From YouTube: Bay Area Rust Meetup January 2016
Description
Bay Area Rust meetup for January 2016. Topics TBD.
A: Hello, everybody, welcome again. Oh, sorry, my computer's... hi everybody, welcome again to another Rust meetup. It's wonderful to see you all. It's been a really long time coming, and I'm glad that we've had so many people show up here. So tonight's event is all about some of the really neat concurrency and parallelism libraries that have started to come out of the community. But, you know, we're back, yay; sorry that we had such a long break. I've already got things lined up for the next couple of months.

Niko will be talking about his library Rayon for doing very simple threading, Aaron will be talking about Crossbeam for writing concurrent data structures, and then Huon will close out the evening with simple_parallel. I also just want to mention that I've already announced the next Rust meetup in February. You should sign up; please sign up for the wait list. We had 50 people signed up for the meetup before I even announced it.
B: Hello, everybody. You can see I'm calling in from Boston; it's a little bit chillier here than I think it probably is over there. So, right, I'm going to talk to you about Rayon, so I guess we might as well bring up the slides. Hopefully you can see these and they look good; yell loudly if something's wrong.

Right, so I'm going to talk about Rayon, which is a library for doing data parallelism. Basically, what data parallelism means, at least to me, is that you have a lot of data and you'd like to process it in parallel. This is a little bit disjoint from task parallelism, which is more like: I have a lot of different things I want to do, and I want to do them in parallel, but they're not operating on the same data. Data parallelism is more: I have a big array I want to subdivide, or something like that.

So the goal of Rayon really is to be able to take sequential code that you wrote and make it run in parallel very easily, with sort of minimal effort and minimal risk that it's going to do the wrong thing. So here's an example of a piece of sequential code that iterates over a list (a slice or a vector or whatever) of stores, and for each store it calls compute_price, which is going to do some computation, and then I'm going to sum up the results, written as the conventional iteration.

So it seems pretty clear what it's going to do. But if I happen to know that all these computations for computing the price can be done in parallel, independently of one another, I'd like it if all it took to make this run in parallel was to change this .iter() to something like .par_iter().

All right, so I'd like to be able to just write par_iter, and now I would have the same semantics, but it would run in parallel. Well, let's say it would potentially run in parallel: the work would divide, and the reason I say potentially is that it'll depend a lot on the computer you're using.
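To make the shape concrete, here is a minimal sketch of the example from the slides. `Store` and `compute_price` are hypothetical stand-ins (the talk doesn't show their definitions); `par_iter` itself is Rayon's actual parallel-iterator entry point.

```rust
use rayon::prelude::*;

struct Store;

// Stand-in for the per-store computation from the slide; assumed to be
// independent and side-effect free, so the calls can run in parallel.
fn compute_price(_store: &Store) -> f64 {
    0.0
}

fn total_price(stores: &[Store]) -> f64 {
    // Sequential version: stores.iter().map(compute_price).sum()
    // Parallel version: only the iterator constructor changes.
    stores.par_iter().map(compute_price).sum()
}
```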
The idea, I guess, is that it should really enable parallelism very easily, with minimal work, but give you some ability to customize and control it later when you need it, if you want to get the maximum performance. Another important point is that you should be able to do parallelism as a kind of implementation detail, and what I mean by that is:

If you have some function that's doing sequential work, I'd like to be able to convert it to do the work in parallel, and none of the callers of that function ever have to know; the signature of the function doesn't change, it's just something that happens inside. It just gets done faster. And finally, it's really important to guarantee data-race freedom. What I mean by this is: I said we want to make parallelism easy, and I showed you an example where it was very ergonomic.

You didn't have to type a lot. But that's not the only factor in making things easy. You also want to know that when you do this, your program is not going to suddenly do the wrong thing, or hit undefined behavior, or compute crazy results. So I'd like it that if you say that something is parallelizable and you're wrong, you get a compilation error. That would be the goal.

So basically, if you put all that together, what it means is you should be able to sprinkle on parallelism hints and suggestions willy-nilly, without a lot of fear that something is going to go amiss and your program is going to start misbehaving. As for where Rayon's current status is, I would say it's still very experimental; I don't think people should build production systems on it. However, it's pretty promising, and I think it delivers on these goals pretty well.

Basically, there are some caveats and things, but it's able to do all these things that I'm talking about, as we'll see in this talk. So I showed you this parallel iterator API, which is something that's in the Rayon library. But it's not the foundation of Rayon; it's actually just a kind of layer built on top of a more primitive API, which is what I'm going to spend most of my time talking about, and that API is called join.

So the goal is, you should be able to use join; this is kind of the thing that you sprinkle around wherever parallelism is possible, and then you let the library and runtime essentially decide when it's profitable to use it. Join is really well suited to divide-and-conquer kinds of algorithms; you're probably familiar with this term.

But basically the idea is, you have some big problem you have to solve, and it's kind of intimidating and you're not sure how to solve it, but you can break it up into two smaller problems, and then you can recursively try to solve those. And if you keep doing this, eventually you get down to some really tiny problem that's really easy. The idea is that when you're doing this kind of divide-and-conquer step, each time that you divide up into two problems, you can call join to process those two problems in parallel.

So let me show you an example. This is quicksort, my favorite sort, probably everybody's favorite sort. I imagine most of you know quicksort, or maybe more accurately, once knew quicksort; since then I've just used it as a library. That's kind of how I was for a long time, until I wrote the blog post that I'm basing this on. So let me just briefly review how quicksort works real fast. It's a classic divide-and-conquer kind of algorithm.

If we pick three as the pivot, we would rearrange everything, so we have one, three, two on the left and then twenty-two and six on the right, and we have this midpoint that gets returned, which is basically where the dividing line was, such that everything to the left was less and everything to the right was greater. And now this is what we can use to divide up our problem. So we can call split_at_mut, which is a method that takes one slice and gives you back two, split around some midpoint, and we'll end up here with basically two slices: one pointing at the left half, everything that was less than or equal to the pivot, and one pointing at the right half, everything that was greater than the pivot. And now I can recursively process them.

So when I'm done, first I'll have sorted the left half and then I'll sort the right half. Okay, pretty easy. So if I wanted to parallelize this, well, I can just do what I said and insert a call to rayon::join exactly at the point where I have two subproblems. So this is the parallel quicksort. It looks pretty similar, except here I'm calling join, and so you see that it took very minimal work to do this.
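As a rough sketch (assuming a conventional Lomuto-style `partition` helper, which the talk doesn't show), the parallel version differs from the sequential one in just the last line:

```rust
fn quick_sort<T: PartialOrd + Send>(v: &mut [T]) {
    if v.len() <= 1 {
        return;
    }
    let mid = partition(v);
    let (lo, rest) = v.split_at_mut(mid);
    let hi = &mut rest[1..]; // skip the pivot, already in its final place
    // Sequential version: quick_sort(lo); quick_sort(hi);
    rayon::join(|| quick_sort(lo), || quick_sort(hi));
}

// Lomuto partition around the last element; returns the pivot's index.
fn partition<T: PartialOrd>(v: &mut [T]) -> usize {
    let pivot = v.len() - 1;
    let mut i = 0;
    for j in 0..pivot {
        if v[j] <= v[pivot] {
            v.swap(i, j);
            i += 1;
        }
    }
    v.swap(i, pivot);
    i
}
```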
But what's even more exciting, and I think very cool about this, is that we didn't change the signature at all. And we're actually doing a not completely trivial thing to parallelize here, because we have mutation going on, and we're also potentially working with stack-allocated data. These things are usually a little bit tricky to get right, because if you start up a thread, it runs asynchronously

with respect to the stack frame that started it. So working with stack-allocated data can be a little risky: if you return or do something wrong, you might pop that data while the other thread is still active. But the reason that all this works with Rayon is basically that, because join always waits until both closures have finished before it returns, you know that it can have safe access to stack-allocated resources just fine; the stack frame isn't going to return until both threads have done so.

The library kind of handles that aspect for you, and that makes sense: if you're taking a sequential algorithm, you were going to do both those things anyway before you returned, so you might as well wait for the threads to finish. It's the same algorithm from the caller's point of view. And in terms of mutation,

this works out very well because of Rust's basic type system, which guarantees that if you have a mutable reference to something, you're the only one that has access to that data. So it's okay to give it to another thread and let the other thread mutate it; no one else can be reading it. We'll see a little more on that in a bit.

I want to start out, though, by focusing instead on how Rayon decides when to run things in parallel at all. The technique that I'm using is something called work stealing, which I did not invent at all. It's been around a long time, and I think it was invented, at least the first time I know of it, by Cilk.

Cilk was a project at MIT from the 90s, and that Cilk is actually now available as part of Intel's C compiler, and there have been a number of projects that have used the technique in the meantime. So it's a really time-honored, well-respected technique, and the basic idea is like this. Traditionally, when you parallelize, you might take a bunch of work and assign it to different threads up front, but work stealing is more of a dynamic load-balancing idea. So I have a bunch of threads in a pool.

In this case, let's say I have two threads, and they're going to find work for themselves. Let me show you; it's easier to illustrate it than to describe it. So we start out with two threads here, and thread B is basically idle initially; it has no work to do. But thread A starts out with the job of the big problem I started with. In this case that's quicksort; that's what QS stands for, calling quicksort on a slice.

Let's say it's 22 elements long to start. Then we saw what quicksort does: once it starts working on a particular problem, it subdivides the problem into two subproblems and then it calls join on both of them. And what join is going to do, basically, is take these two closures and start executing the first closure; but just before it does that, it's going to post a little note into a thread-local queue with the second closure.

So this queue is basically tracking work that we plan to do later but haven't gotten to yet. So we're starting in on 0 to 15, and then we've put 15 to 22 in the queue so we can pick it up later. And when we process 0 to 15, we'll again subdivide it into two tasks, and we'll put one in the queue (that's the 1 to 15) and we'll start in on 0 to 1 right away.

This is the one-length array, so it's going to finish, and when it finishes, what it's going to do is go over to this queue and take off the most recent thing that it put in. So it's actually a double-ended queue, or a deque; you can use it like a stack or a queue. It's a minor point, but that means we're going to say: okay, well, we finished 0 to 1.

So what thread B is going to do: whenever a thread is idle, it tries to find work for itself, and it does that by looking at all the queues from all the other threads. So now thread B might come over and say, you know, I've got nothing to do, and thread A has this 15-to-22 task just hanging out in its queue; maybe I'll be helpful and I'll

take that work on. And we call that stealing. So thread B is going to steal this task and execute it. It's called stealing, but it's kind of a weird sort of stealing; it's like if a thief came into your house and saw that you haven't finished your dishes and started doing them for you. It's pretty nice; it's a nice kind of stealing. So thread B is going to execute 15 to 22, and it'll do the same thing.

It will subdivide this up into two problems, post one of them into its local queue, and start in on the first one, and it'll do that again; it basically works exactly the same way. So now, when it finishes a problem, it'll take something off of its queue, just like thread A did. And so let's say now it's processing there. Now, if we come back to thread A for a second:

That means thread A is idle, and when threads are idle, they go look for other work to do. So now we can go look at thread B's queue, and we can say: hey, here's some work that thread B had started but hasn't finished yet, 18 to 22; I could steal that back. And so now thread A is going to do 18 to 22 while thread B is over there, and you can kind of imagine this proceeds like this until all the work is done. So everybody kind of just does the work they have.

They create new work and throw it on their queue, and hopefully somebody else takes it before they can even get to it. So it's kind of like, if that thief came in and was washing my dishes, and I see that they haven't gotten to a particular pan, I could take it back and start working on it; everyone's working together to finish all the problems. What's nice about this is you get a kind of dynamic load-balancing effect.

Quicksort is not the best example of that, but even here you can see that the tasks don't all take the same time. Some of them are for big pieces of the array; some of them are for smaller pieces of the array. And if you have tasks that take a variable amount of time, which is often the case, then it's really hard to pre-allocate those to two threads, because you might end up giving one thread a lot of really big, expensive tasks and another thread

a bunch of really easy tasks, and then the other thread will finish really fast, and you just won't get a very efficient use of your resources. But with work stealing, that's not a big problem, because if one thread finishes fast, it'll just go take work from the other threads and try to take care of that

instead. So here's a big wall of code. This is kind of the header of join, in a little bit of pseudocode, and the reason I'm showing you all this code is that I want to highlight some of the Rust features that let you write something like join as a library in Rust and have it work pretty efficiently, which I think is pretty cool. So if you remember, join takes two closures, and here are the arguments to join: oper_a and oper_b; those are the two operations.

Closures in Rust are just generic types that implement a callable trait. So here A and B are the types corresponding to the closures, and here is where we say that they must implement the FnOnce trait, which means that they can be called at most once.
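That signature looks roughly like this (this matches the shape of Rayon's actual `join`; the `Send` bounds are the ones discussed a little later in the talk):

```rust
// Two distinct generic closure types, each callable at most once
// (FnOnce), each returning its own result type.
pub fn join<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
    RA: Send,
    RB: Send,
{
    // body sketched below
    unimplemented!()
}
```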
The reason that these are actually two distinct types, and why that's really nice, is that every time you call join, you will get a custom version of join: the compiler will monomorphize it, meaning it will create a custom version of join specialized to those particular closures. That means that LLVM can basically inline and do optimization and hopefully, in general, reduce the overhead of join in the sequential case. There's no virtual dispatch, there's only static dispatch, and so forth.

Now, this is the kind of pseudocode for what I just described. So, if you remember, we always push something onto the queue and then execute the other task. So here's the push-onto-the-queue part, and the thing I want to highlight here is that we don't do any allocation when we push onto the queue, at least not in the common case. For the oper_b closure that we're pushing onto the queue, we can just push a reference to the closure that lives on the stack.

So we don't even have to allocate any memory to do that; it's just a few writes. And then we can call the first closure and then execute the second closure, and each of these calls, as I said, is statically dispatched, which means they can be inlined very easily by LLVM and analyzed very easily. And then finally, here, this is just a little loop that says: well, if somebody stole my work, then I'll help out until it's ready.
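In (heavily simplified) pseudocode, the body does something like the following. The queue and job types here are hypothetical stand-ins for Rayon's internals, not its real API, and this is not compilable as-is:

```rust
// Pseudocode mirroring the steps just described.
fn join_body<A, B, RA, RB>(oper_a: A, oper_b: B) -> (RA, RB)
where
    A: FnOnce() -> RA + Send,
    B: FnOnce() -> RB + Send,
{
    // 1. Post a *reference* to oper_b into this thread's deque: just a
    //    few writes, no heap allocation in the common case, because the
    //    closure itself lives in this stack frame.
    let job_b = StackJob::new(oper_b);
    local_deque().push(job_b.as_ref());

    // 2. Run the first closure inline; statically dispatched, so LLVM
    //    can inline and optimize it.
    let ra = oper_a();

    // 3. Pop oper_b back off our own deque and run it inline too; if
    //    another thread stole it, help execute other work until the
    //    thief is done.
    let rb = match local_deque().pop() {
        Some(job) => job.run(),
        None => {
            while !job_b.is_complete() {
                steal_and_run_other_work();
            }
            job_b.into_result()
        }
    };
    (ra, rb)
}
```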
Okay, the last thing on this slide that I just sort of glossed over is this Send trait. If you are familiar with Rust, you'll know it; or if you've used multiple threads in Rust, you've probably encountered it. Send is basically the trait that identifies data types which are safe to send to another thread.

So it's things which can be safely transported between threads and won't cause any data races. And when I say that the closures have to be Send, what I mean is that all the data that you use from your closures (so in the case of quicksort, that would be the array that we're sorting, and so forth) has to be sendable to another thread. That makes sense, because join might cause those two closures to execute in parallel, or it might not.

But in the case that it does, the data had better be safe to go to another thread. Now, as it happens, in Rust most types actually are Send, and almost everything is sendable by default. The only exception is basically these three types, and things that use these three types. If you use the Rc type, reference-counted data, that's not thread-safe, because the reference count doesn't use the correct atomic operations to increment it, which means if two different threads increment the ref count at the same time,

they might mess it up, basically. But you can use the Arc version, which is the thread-safe version, and it will work just the same. And the next two, RefCell and Cell, are basically the ways to introduce mutation into aliased data; those are a little bit tricky.
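A minimal illustration of the Rc/Arc difference (the first function is rejected by the compiler, which is the point; the error text is approximate):

```rust
use std::rc::Rc;
use std::sync::Arc;

fn not_thread_safe() {
    let data = Rc::new(vec![1, 2, 3]);
    let data2 = data.clone();
    // error: `Rc<Vec<i32>>` cannot be sent between threads safely
    rayon::join(move || data.len(), move || data2.len());
}

fn thread_safe() {
    // Arc bumps its count with atomic operations, so it is Send.
    let data = Arc::new(vec![1, 2, 3]);
    let data2 = data.clone();
    rayon::join(move || data.len(), move || data2.len());
}
```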
Suffice to say, there are thread-safe equivalents like Mutex and the atomic types, but you have to be a little bit careful. So if you find that you're getting compilation errors due to the fact that you have Cell and RefCell, what that's kind of telling you is that this code is not going to be trivial to parallelize. This gets back to the safety guarantee I was talking about in the first place: if you're just adding parallelization and you wind up getting errors, you're actually getting pretty useful information.

It's saying that there's some shared state that gets mutated, and you want to think about it. You could add a Mutex or some other mechanism, but at that point it's no longer guaranteed that your parallel version is going to behave the same as your sequential version, so you have to think about it a little bit. So that's one aspect of safety, basically: we want to make sure that you don't transport types that are not thread-safe across threads. And then the other is:

You want to make sure that you don't transport the same data, especially mutable data, across threads. So this is a version of quicksort which is almost the same as the correct version, except that instead of sorting the low half and the high half on the two different threads, it actually sorts the low half on both threads.

It's kind of funny: I've been using this exact thing as an example for a while, and I always thought, well, I don't know if people really make such a simple mistake. But then, when developing Rayon, I actually made several mistakes exactly like this, several times, and the compiler caught them for me, and I was very happy. So what will happen, basically, if you do something like this, where you accidentally use the same state on both threads, is that the compiler will complain, and that's the message you get. What it's telling

you is basically that if you want to mutate state in Rust, you have to have unique access to it. So here I have two closures; they both have access to the same data, so neither one has unique access. And what's cool is that this rule wasn't designed with parallelism in mind, but it fits parallelism very well. The compiler doesn't necessarily know that these closures are running in parallel, but it knows that it needs to keep mutable access unique, and so it just kind of falls out from that.
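Sketched with the same hypothetical `partition` helper as before, the buggy version and the error it produces look roughly like this:

```rust
fn quick_sort_buggy<T: PartialOrd + Send>(v: &mut [T]) {
    if v.len() <= 1 {
        return;
    }
    let mid = partition(v);
    let (lo, _rest) = v.split_at_mut(mid);
    // Oops: both closures capture `lo` mutably. Rust rejects this:
    // error[E0524]: two closures require unique access to `lo`
    //               at the same time
    rayon::join(|| quick_sort_buggy(lo), || quick_sort_buggy(lo));
}
```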
So that's how join works, and I've kind of delved into it in some detail. Now, this API that I showed you in the beginning was a parallel iterator, and I'm not going to go into it in this talk because I don't want to take a long time. But basically, you can build up the parallel iterator in almost entirely safe code using join; and in fact, it was while building up the parallel iterators that I made that mistake of passing the same half to both closures, at least when building Rayon.

So that's pretty cool: join is actually flexible enough to be used to build up abstractions on top that are even easier to use and apply to other situations. And I have written up the details of how this works, at least at a certain level, in a blog post that I'll show a link to later.

So if you're interested, you can go check it out, but the basic idea is that it will divide up the work that the iterator is going to do recursively, in the same fashion. Right, so in conclusion, I guess there's a couple of points I wanted to kind of bring across, and the point that I really don't want to get across here is "you should go use Rayon."

What I do want to get across is that I think there are some goals and lessons to be learned from Rayon that I think we should try to incorporate, and I hope we will end up with a package much like Rayon which is very widely used. It doesn't have to be Rayon; I'd be perfectly happy if somebody else made a better one, but I think these are some really important things about it.

Rust gives you the kind of guarantees you want, but it's also great that when you're writing sequential code, you're kind of writing parallel-ready code without even realizing it, so to speak, just because that's the way that you normally work with code in Rust. And the fact that we have these lightweight closures that can be stack-allocated, that have static linking and easy inlining, also means you can do this kind of thing just with the compiler, as a library.

These links are more for people who come and look at the video later, but this is where the Rayon project is on GitHub, and this really long URL that I had to put in a small font is actually the blog post that I was talking about, that kind of goes over everything I said here but in somewhat more detail. So that's all I have to say. Thank you very much. If you have some questions, I would love to answer them, I guess.
C: [question not transcribed]

B: Well, you start out with kind of: the queue has the really big problems at the top, and they get smaller and smaller as they go down. And so when a thread goes to its own queue, it always pulls from the bottom. So it takes the smallest task, basically the most recently pushed thing, which means it's the most likely to have the same cache behavior as the thing you just did; you get kind of maximum locality that way.

But when you steal, you steal from the other side, which means you're stealing the work that is most remote from what the processor is doing right now, and thus the one that kind of represents a big chunk, like the right half of the array; that'll be kind of a whole different set of cache lines. You would get the minimum benefit, basically, from being executed by the original processor. However, I am not sure; most of the kind of papers and things that I saw that did measurements were, I think, pretty old, so I'm not sure. Modern architectures are different, and I'm not sure if anyone's done any recent studies, but I know that the ones that I saw showed quite good cache behavior.

D: [question not transcribed]

B: By default, yes. I have some code paths in there, not very well tested, that let you create additional thread pools in a kind of dynamically scoped fashion, so you can kind of enter into a thread pool, and then the work that executes within there will go in that separate thread pool. But in general, it's probably not a good idea: unless you have a specific reason that there should be more than one thread pool, it's usually a good idea to share them, because you've only got so many cores, essentially.

E: So you were talking a bit about the safety of Rayon, or the part having to do with data races, but you mentioned in passing that if you use something like a mutex, you can accidentally reveal the fact that these things are running in parallel. Have you thought about a way to rule that out, similar to Send?

B: I haven't, mostly because I think it's really useful. There are a lot of algorithms where you actually want to communicate between the parallel workers. A simple example is a kind of branch-and-bound search, or something like this, where you're kind of searching through a space and you want to know what's the best result that anyone has found so far. If you're running sequentially, that's just going to be, you know, the current best found so far; but if you do have the opportunity to parallelize, you'd like to know if someone else found a better result somewhere else, so that you can stop searching unproductive avenues. This is used to solve the Traveling Salesman Problem and stuff like that. So I'm not sure it's a problem, basically, that you can reveal the parallel execution.

If you're using a mutex, you're already in a parallel mindset; you've already delimited your transactional boundaries where you acquire the lock and unlock it. So I'd be surprised if it actually led to bugs in code. But it does seem like we'd want something like Send, except... I don't know, I haven't thought about it that hard, but it could be.

E: [follow-up not transcribed]

F: [question not transcribed]

B: This kind of comes back, to some extent, to the shared thread pool question that was raised earlier. But if you have multiple threads with different priorities and they're each starting up parallel tasks, we don't give one thread pool higher priority; the tasks in the thread pool all have equal priority, wherever they came from. It's an interesting idea, potentially.

The challenge is that they can get mixed: there's one thread pool, which is basically a totally different set of threads than the ones that are normally used in your application. So you can have tasks intermingled from different application threads, which may have different priorities. I guess they could set and unset the priority as they go.
E: Can everyone hear me? Cool. Hey everybody, I'm Aaron Turon, another member of the Rust team here at Mozilla, and I'm excited tonight to tell you about a library I've been working on called Crossbeam. So Crossbeam is targeted at sort of a different layer of the stack than the library that we just saw. Niko's library, Rayon, is all about introducing parallelism into your program, and it relies on some underlying data structures, like work-stealing queues. Crossbeam is targeted at those kinds of data structures.

Basically, it's a library for high-performance concurrent data structures, and in particular the sort of big-ticket item is lock-free data structures, which I'll also spend a lot of time telling you about tonight. But I'm also interested in userspace synchronization, which is closely related: things like semaphores, barriers, latches and so on. And then there's a sort of central topic that ties all this together, which is concurrent memory management; a lot of work on lock-free data structures ends up there.

This is the amount of time, sort of per operation, in a multi-threaded test case. So this is basically just showing that the overhead of the memory management scheme I'm going to show you today is quite good, and in fact it's pretty easy to do better than Java. The Java ConcurrentLinkedQueue here is highly tuned; the stuff in Crossbeam is, like, textbook algorithms with very little effort put into them. Okay, oh yeah.

Yeah, I mean, well, we'll get into more detail on this. I should say, feel free to ask questions throughout. This is definitely going to be a little bit more technical than the previous talk. Okay, so I realize that probably a lot of you don't know a lot about this low-level area of concurrency, so I want to start by talking about some of the important trade-offs in that space. And so, of course, we have to talk about the cache.

That's always a huge determiner of performance, and the cache story gets a lot more complicated once you start talking about having multiple cores. So this diagram shows a fairly typical architecture where you have a number of different cores on the same die. Each of them has a dedicated L1 cache, and then they have a shared L2 cache.

There's a protocol for keeping these caches coherent. A typical one is the so-called MESI protocol for cache coherence, and the idea is pretty simple. Basically, each cache line has a state at any given time. Modified just means that it's dirty, basically; it needs to be written back to RAM. Exclusive means it hasn't yet been changed, but this core is claiming it is the only one with a valid cache line for this address. Shared means perhaps multiple cores have this cache line out, but none of them is in a position to write. (And Invalid means the line no longer holds valid data.)

So what you want is to be able to access shared data structures where, if you're accessing different parts of the data structure (say you have a shared hash map, and one core is looking up one key and another core is looking up a different key, and those are in different buckets), those cores should not have to talk to each other. They should not be invalidating each other's cache lines.

Sometimes, when you're working in this space, you actually don't want cache locality, because it's easy to have packed into a data structure some data that is relevant to one core and some data that's relevant to another. They're sitting on the same cache line, and now the two cores are basically ping-ponging the data back and forth every time they want to write. This is called false sharing. Okay, so that's like some sense of the kind of space.
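Here is a sketch of what false sharing looks like in Rust, and the usual fix of padding hot fields onto their own cache lines. (Modern crossbeam ships a `CachePadded` type for exactly this purpose; the 64-byte line size is an assumption that holds on typical x86 parts.)

```rust
use std::sync::atomic::AtomicUsize;

// Two counters, each written by a different core. Packed like this,
// they likely share one cache line, so the cores ping-pong it.
struct Counters {
    a: AtomicUsize,
    b: AtomicUsize,
}

// The fix: force each hot field onto its own 64-byte cache line.
#[repr(align(64))]
struct CachePadded<T>(T);

struct PaddedCounters {
    a: CachePadded<AtomicUsize>,
    b: CachePadded<AtomicUsize>,
}
```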
With locks, we still end up invalidating these cache lines. Now, there is a sort of finer-grained approach where, instead of having a global lock around the hash table, maybe you just have locks around each bucket. But that's still not as good as you might hope for, because again, when you're reading from the hash table, you still have to acquire those locks; that still involves a write, and that can still lead to cache misses. Okay, so does that roughly make sense? The details here aren't super important.

I just want to give you a flavor of the kinds of things you're thinking about. Okay, so all of those concerns lead you in the direction of something called lock-free data structures. I mean, if you think about it, locks fundamentally can't achieve the goals that we set out in the beginning: they can't let you avoid doing a write when all you want to do is read. So we need some other way to control concurrent access to a data structure that doesn't involve locks.

So I want to teach you briefly how to write a lock-free data structure, and I'm going to use the sort of "hello world" of lock-free data structures, which is a stack. And it's not nearly as scary as you might think; it's actually pretty simple. So the representation we'll use for this stack is just a simple linked list: the stack has a head pointer pointing to the current head, and then the nodes form a singly linked list all the way to the end.

So, can everybody read that? It's kind of small; my apologies if you can't. So here's the code for pushing onto such a stack. We allocate a new node; that's not real interesting. We turn that into a raw pointer (whatever those are; just some details in Rust), and then we enter into a loop, and the idea with this loop is we're going to try to optimistically sort of install this node onto the stack, and we're doing so in a way that might be racing with other threads.
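Here is a sketch of the push code being described, using std's `AtomicPtr` and the era-appropriate `compare_and_swap` (which returns the previous value, as the talk describes). The names are mine, not necessarily the slide's:

```rust
use std::ptr;
use std::sync::atomic::{AtomicPtr, Ordering};

struct Node<T> {
    data: T,
    next: *mut Node<T>,
}

pub struct Stack<T> {
    head: AtomicPtr<Node<T>>,
}

impl<T> Stack<T> {
    pub fn push(&self, data: T) {
        // Allocate the new node and turn it into a raw pointer.
        let n = Box::into_raw(Box::new(Node { data, next: ptr::null_mut() }));
        loop {
            // Snapshot the current head and link our node in front of it.
            let head = self.head.load(Ordering::Relaxed);
            unsafe { (*n).next = head; }
            // Try to install the node atomically. compare_and_swap
            // returns the value head actually had, so equality means
            // we won the race.
            if self.head.compare_and_swap(head, n, Ordering::Release) == head {
                return;
            }
        }
    }
}
```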
We might lose that race, and so then we have to come back around the loop and try again. And what I mean by optimistic here is: we're going to take a snapshot of the current head. So we load the value of the current head of the stack (that's some pointer), and then we go ahead and take our node and say its next pointer is whatever that snapshot was, and then we try to install that node at the front of the stack.

Basically, we use an operation called compare-and-swap. Compare-and-swap is like the basic building block of all atomicity on most processors, and what it says in this case is: I think the current value of self.head is head, and I want to update it to n, and that update should take place in a single atomic step,

as far as all other cores are concerned. So I'm not grabbing any lock here; I'm just doing the change in place. But if, atomically, head has some other value than the one I expected, then this just won't do anything, and the result I get back is essentially what value head actually had. So I ask: is it the value I thought; is it equal to head? If so, good, I succeeded; I can leave the loop; I've actually installed my pointer. If not, then I lost the race with some other thread.

They got there first; I have to try again. Okay. So let me show that to you more graphically. Say this is the current state of the stack: we have a head pointer, we've got two nodes. A thread comes in and it wants to push this node with the value seven. So it allocates that node, it reads the current value of head and links it into the allocated node, but it hasn't yet changed the head pointer. Meanwhile, some other thread comes along and pushes a different node.

So now we have our local node that has seven, but the actual head pointer has changed. At this point, if we do our compare-and-swap, it's going to fail, because we think that we're pointing to the node that contains three, but actually that's changed. That's good, because if we actually succeeded, we would have just dropped the node with five on the floor; that's what we're trying to avoid. So then we have to retry: we get a fresh snapshot, we re-point our next pointer, we do another CAS, and this time we succeed.

J: [question not transcribed]

E: Right, so if we look back at the representation, basically everything you need to declare is in this atomic pointer aspect. That's enough to tell Rust and LLVM: you know, I'm going to be modifying this in a sort of concurrent way, in a racy way, and that's what makes what we're doing here not a data race, but actually a synchronization race, which is a good kind of race.

So let's look real quick, then, at pop, which is pretty similar. So in this case we don't need to do any allocation up front; we're just trying to pop a node off. So we enter right into our retry loop. We grab a current snapshot. If we see that the head is actually pointing to null, then we have nothing to do: we have observed a moment in time where the stack is empty, and so we just return that the stack was empty.

There is nothing to pop. Otherwise, we take a look at the node we just snapshotted, what we think is the current head; we look at what its next pointer is, and then we go back around and try to change head from the one we read to its next pointer. And the same story happens as before.
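Continuing the sketch above, the pop side looks roughly like this, with the leak the next paragraph talks about left in deliberately:

```rust
impl<T> Stack<T> {
    pub fn pop(&self) -> Option<T> {
        loop {
            // Snapshot the current head.
            let head = self.head.load(Ordering::Acquire);
            if head.is_null() {
                // Observed a moment where the stack was empty.
                return None;
            }
            // The dangerous dereference discussed below: nothing stops
            // another thread from popping and freeing `head` first.
            let next = unsafe { (*head).next };
            if self.head.compare_and_swap(head, next, Ordering::Release) == head {
                // We won the race: read the data out of the node...
                // ...and never free the node itself. This leaks.
                return Some(unsafe { ptr::read(&(*head).data) });
            }
        }
    }
}
```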
If some other thread pushed or popped a node in the meantime, this compare-and-swap will fail and we'll try the whole thing again. And for people who know Rust deeply, you understand what the pointer-read stuff is doing here; for people who don't, don't worry about it. The key point, though, is: this implementation leaks nodes. Okay, you have to understand some of the subtleties of Rust to actually be able to see that this is happening, but effectively we're using these raw pointers, these unsafe pointers in Rust, and those don't have any automatic dropping going on.

We have to actually get those into some owned value in order for them to be deallocated. So here we're just reading the data out of the node and then not doing anything. Okay, we successfully unlink the node, but we never actually free its memory; we have a leak. Okay. Now, if this were Java, we'd be done, because the garbage collector has our backs, and that's great. But things are not so nice in Rust. So, you know, you might think: well, okay, let's just deallocate it here.

It's too good to be true, right? So the problem is, and this is quite subtle, but if we go back to this pop implementation, there's something very interesting going on here. So we take this snapshot of the current head. Now we have a node in our hand, and the moment after that snapshot, that node may or may not be in the stack. We have no idea. This is totally concurrent; we haven't acquired any locks.

No other thread even knows we're here. So we have this node in our hand, but a moment after, we really have no clue what's going on with this node; yet we're about to follow its pointer and get a field out of it. If that node was popped in the meantime and someone freed it: segfault. So basically the problem is there are these aliases to the nodes floating around, because other threads are taking snapshots and following those pointers, and we have no clue

this is happening. So we don't know when we can actually safely deallocate memory, and that's the fundamental memory management problem here. Now, in the GC world, again, everything is great, because, you know, sort of concurrently, at some point the GC will run and will clean up everything; it will realize there are no snapshots left, so it's okay to actually deallocate this. But we have to find some other way to actually achieve this goal in Rust. Makes sense? Any questions? Cool, okay. So how do we do it?

Okay, so there are basically two main observations to make at this point. So the first I've already sort of mentioned: the basic problem here is, yeah, we know that we're the only thread that actually unlinked this node from the data structure, but other threads could have stale snapshots of that node, and they might want to actually dereference those snapshots. So that's problem number one, or observation number one. But there's another observation, which is that once we've removed the node from the data structure, no new snapshots are going to be made.

The only way to get a snapshot is to start from the head of the data structure and actually follow a pointer that reaches this piece of data, but we just removed the key pointer from head. So there's no way another thread could create a new snapshot; it's just a matter of waiting until all the snapshots that are in flight right now have gone away. We just need some way to figure that out. Okay, but the problem is: the whole point of this exercise,

the whole reason we're doing lock-free programming in the first place, is to avoid coordination between threads. But now we need to coordinate between threads to figure out when it's okay to actually deallocate these snapshotted nodes. So that's like the nut that we really need to get at. Okay. Here comes our salvation. So there's this awesome idea called epoch-based memory reclamation, and this is the memory management strategy that Crossbeam implements. I'm just going to give you the high-level intuition about how this works.

I won't go super into detail, but basically the idea is that we're going to have these grand epochs of time that all threads more or less agree on, and as the epochs progress, snapshots in old epochs, we know, are stale and are sort of no longer a concern. Basically, we have some way of telling that all threads have moved their epoch forward, and now it's safe to remove this node, because no new snapshots could have been made. That's like the very vague intuition.

So the way we do this is: there are three epochs, ever. Three is a very interesting number; you'll see why three a little bit later, but just trust me for now. We have a global counter that says what the epoch is (zero, one, or two), and then each thread has a local snapshot of which epoch it is in, and a flag that basically says: is the thread doing something with the data structure right now, like is it in the middle of a push or a pop? And for each epoch we have a sort of purgatory.

So this is like a place to put nodes that are no longer reachable from the data structure in this epoch, but might still have some outstanding snapshots, so we can't actually free them yet. We know we're going to free them later, but we need to wait until the epochs have progressed far enough before we can do that.
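A sketch of the bookkeeping just described, with illustrative field names that are not Crossbeam's actual internals:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize};
use std::sync::Mutex;

struct GlobalEpoch {
    // The global epoch: 0, 1, or 2, advancing modulo 3.
    epoch: AtomicUsize,
    // One "purgatory" per epoch: nodes unlinked in that epoch that
    // may still have outstanding snapshots, so can't be freed yet.
    garbage: [Mutex<Vec<*mut u8>>; 3],
}

struct ThreadEpoch {
    // Is this thread in the middle of an operation right now?
    active: AtomicBool,
    // This thread's snapshot of the global epoch, taken when pinning.
    epoch: AtomicUsize,
}
```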
Okay. If we want to do an operation like push or pop, we do this pinning operation that sort of enters the epoch system. So we set our thread active; we say, hey, we're doing an operation on the data structure. We take a look at the current global epoch and we copy it into our thread-local snapshot. At that point, we get a very strong invariant that says: any data that is reachable right now from the data structure will remain allocated until this thread becomes inactive.

On the other hand, if we get into a state where we've popped a node and we actually want to deallocate it, we don't deallocate it right now; we put it into the purgatory for this epoch, and then it'll get cleaned up later. Okay, and that's basically the story, from the local thread's perspective, for doing an operation. Of course, eventually you want to actually free some data; that is the point. And the way you do that is you look at all of the threads that are involved with this data structure.

For any of the threads that are active, you check their current epoch counter. If all of them are at the current global epoch, then you attempt to move that global epoch forward by one, using a compare-and-swap to make sure, you know, you settle races to collect; and if you're successful, then you can reclaim data from two epochs ago.

Okay, and this is where the three epochs come in. So the tricky thing is, you can end up with active threads at any time in one of two epochs: basically, the current one or the one just behind. And the reason is, once we do this compare-and-swap that's being described on this slide, we still have a bunch of active threads that are in the old epoch, but some new threads could come in and be in the new epoch

we just moved to. But we know that no active threads will be in the epoch from three steps ago, and we can sort of clean out their data. Okay, like I said, I'm giving you a high-level description; don't worry if you don't 100% follow it. It's kind of subtle stuff, but at the end of the day, it's actually not that hard to implement. Okay, so the algorithm I just described has some really neat properties with respect to the cache concerns

I was telling you about at the beginning. So I was saying, like, the whole game here is we want to coordinate without a lot of cost; we want to coordinate in a lightweight way. And so the thing is, we don't have to collect data at every operation; we can batch it up and deallocate it occasionally, just like we would do a GC when we're sort of running low in the nursery or something.

Moreover, because we're doing this infrequently, the global epoch is usually staying the same, which means all those different L1 caches can have the cache line with that global epoch in the shared mode; they're not invalidating each other. When you do that read of the global epoch, you're getting a cache hit most of the time; it's pretty cheap. Your thread-local epoch is usually in the exclusive mode, because other threads aren't reading it most of the time.

So that's cheap to write to; that's not a cache miss. So the only time you incur real coordination costs, like cache misses, is when you actually do a garbage collection, basically, and those are infrequent. We win. Okay, and moreover, you can apply a single epoch management scheme to any number of lock-free data structures, so you don't even have to do this in a per-data-structure way; you can sort of amortize it over everybody, which is how Crossbeam is set up.

Alright, any questions at this point? [question not transcribed] So there are lots of ways you could do that. Crossbeam tweaks a few details from what I showed you: in Crossbeam, each thread actually has a local purgatory, and basically, when that purgatory gets too big, it says, oh, I should probably try to free this data. But, like, you can imagine many, many schemes for doing this, and it's basically orthogonal to the main other questions. Yeah.

[question not transcribed]

Yeah, you don't have to say ahead of time. Basically, there's a thread-local variable for enrollment, and the Crossbeam API just takes care of the rest. So it's pretty straightforward: basically, once a thread has enrolled once, then it's sort of visible for epoch management forever thereafter, and the enrollment is something that the clients don't have to worry about at all; like, you just use the data structure as normal and everything's taken care of for you.

Okay, let's talk some Rust. So the other interesting thing about this library, or about the memory management in particular, is: how can we take all those ideas and wrap them up in a nice Rust API, ideally with as little unsafe code as possible? And this is one of the parts that ended up being sort of most exciting to me.

The first piece is a pinning operation: when you pin the epoch, you get back a Guard value. Okay, and as we'll see later, various other APIs take a guard as an argument, as kind of evidence that you've correctly entered the epoch system, so you can't get it wrong: you can't interact with the data structure unless you've advertised that you're interacting with the data structure in the right way. Okay, so that's part one. Part two is: we have some pointer types that are sort of akin to the ones in the standard library, but they're meant for this style of memory management. So we have Owned<T>; that's basically like Box<T>.

It's just a piece of heap-allocated, owned data. We have Shared, which looks just like a shared reference, except its memory is managed by this epoch scheme. And we have Atomic<T>, which is just like the AtomicPtr that we saw earlier. But the really cool thing (this is the most exciting thing to me) is: Owned and Shared implement Deref safely. So they're just totally normal pointers.

If you have a Shared pointer with a lifetime and you're in that lifetime, you can dereference it without writing unsafe code, even though that memory is undergoing all this concurrent memory management. So here's how it works. The Atomic type, so this is the type that you use for doing these kinds of atomic updates, has a bunch of operations: load, store, compare-and-swap, and these interact with the guards that I mentioned before in a sort of interesting way.

So if you load a piece of atomic data... oh, I didn't mention this before: there's this whole business about ordering, like you saw Relaxed and Release before. Ignore all of that; it's a whole other dimension that you just shouldn't care about right now. So, putting that aside: if you want to load a piece of data out of a pointer that's managed by this system, you have to have a guard.

You have to have proof that you're in the epoch scheme; that's this borrowed Guard argument. And what you get back when you load is a Shared pointer to the data that you were trying to load, but that Shared pointer has a lifetime that's tied to the borrowed guard that you passed in. So this is basically saying: I'll give you a pointer back, like a snapshot that you might want to dereference later, and that snapshot is valid for as long as the guard is. And that's exactly what we wanted.

Because, as I said before, the key invariant is: once you've entered an epoch, once you've pinned it, nothing you can see could be deallocated until you exit. And that's exactly what this captures. It's saying: for as long as that guard is valid, until the point you've exited, it is safe to read this snapshot that you've just taken out. Okay, and then for storing, you don't actually need this guard, because you're not getting back something that you are going to dereference.
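Putting those pieces together, usage looks something like this. This is modeled on the 2016-era `crossbeam::mem::epoch` module described in Aaron's blog post; treat the exact paths and signatures as approximate, since the modern crossbeam-epoch API differs:

```rust
use crossbeam::mem::epoch::{self, Atomic};
use std::sync::atomic::Ordering::Acquire;

struct Node<T> {
    data: T,
    next: Atomic<Node<T>>,
}

fn peek_front<T>(head: &Atomic<Node<T>>) -> bool {
    // Pin the current epoch: the Guard is our proof of participation.
    let guard = epoch::pin();
    // load() borrows the guard, so the Shared<'a, Node<T>> it returns
    // can be dereferenced safely for as long as the guard lives.
    head.load(Acquire, &guard).is_some()
}
```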
G: [question not transcribed]

A: You might get into this later, and I'm not sure if you'll cover this or not, but does having an outstanding loan prevent epochs from advancing?

E: Yes. As long as you are holding on to a guard, no memory is being freed. But the thing is, lock-free data structures are non-blocking data structures; they're data structures where you expect to have a quick interaction and then you're out, you're done. There are these tight little retry loops, so most of the time you're sort of going in and out of epochs very quickly, and so you expect not to hold up the process of collection very often.

It shouldn't take that long for all active threads to cycle out of the operation they're doing. You could get it wrong, obviously, but if you do, you have bigger problems, basically. Okay, so there's one more operation, and this is the one place that you get unsafety in Crossbeam, which is: at some point you have to say, I believe that this piece of data is ripe for deallocation.

The thing is, you know, when you're using these pointers, you can create a big nest of aliased data. There's no way the library can know when you've actually unlinked a piece of data from the data structure and it should go into the purgatory. You have to say when that is, and you can get that wrong. The name here, unlinked, is meant to imply that you're making an assertion when you call this thing. This is safe to call when you have a shared pointer

that you know is no longer reachable from the data structure. But that's generally not very hard to figure out: there's almost always an obvious point where you've now successfully popped, and that's when you unlink, as we'll see on the next slide. Okay, so here is the pop code I showed you before, but now using Crossbeam. It looks pretty similar, except the first thing we do, before we enter the retry loop, is we actually pin the epoch, and we get a guard back.

[question not transcribed]

E: Yeah, and I haven't exhaustively tested to see which one is faster. It probably depends on exactly what you're doing, but either is valid; this is sort of the simplest thing to do. So here's the interesting thing: now we're coming in, we're taking this same snapshot of head that we took in the original algorithm, so either we'll get that there was some head, or it was empty. The empty case is the same.

If there was some head, now we have a snapshot in our hands, and just as before, we can dereference right through that snapshot, totally safely, no problem. So we pull the next pointer out of that snapshot; that's fine. We try to do our compare-and-swap, or cas_shared here; and by the way, the API is slightly different from the one I showed you before: here it just returns a boolean, was it successful or not. So if it's successful, then we enter a bit of unsafety, which is just to mark

the node as unlinked and then extract the data, and that's it. Okay, so a key observation here is: I walked you through this complicated description of this algorithm, but as a user of the library, you don't have to know any of that stuff. All you need is this epoch pin, the guard, and unlinked, and that's basically it.
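For reference, the Crossbeam version of pop looks roughly like this, adapted from the "Lock-freedom without garbage collection" blog post this part of the talk is based on, with the same hedge about exact API details as above:

```rust
use std::ptr;
use crossbeam::mem::epoch::{self, Atomic};
use std::sync::atomic::Ordering::{Acquire, Relaxed, Release};

pub struct Stack<T> {
    head: Atomic<Node<T>>,
}

struct Node<T> {
    data: T,
    next: Atomic<Node<T>>,
}

impl<T> Stack<T> {
    pub fn pop(&self) -> Option<T> {
        // Pin once, before the retry loop.
        let guard = epoch::pin();
        loop {
            match self.head.load(Acquire, &guard) {
                None => return None, // observed an empty stack
                Some(head) => {
                    // Safe dereference through the snapshot.
                    let next = head.next.load(Relaxed, &guard);
                    // cas_shared returns a bool: did we win the race?
                    if self.head.cas_shared(Some(head), next, Release) {
                        unsafe {
                            // Assert the node is unreachable, then move
                            // the data out of it.
                            guard.unlinked(head);
                            return Some(ptr::read(&(*head).data));
                        }
                    }
                }
            }
        }
    }
}
```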
E
The
code
that
you're
writing
looks
as
if
you
were
programming
in
a
garbage
collected
language
right,
you
can
take
an
algorithm
from
Java,
poured
it
over
to
rust,
sprinkle
in
a
few
on
links
at
key
points
and
you're
good
to
go,
and
you
get
performance,
that's
better
than
Java
when
you
do
so
right.
So
that's
that's
the
sort
of
essence
of
the
memory
management
story
in
and
cross
beam.
Okay,
I'm
going
to
wrap
up
here.
E
So
let
me
just
say
a
few
things
about
what
else
is
going
on
in
crossbeam,
so
the
bulk
of
the
implementation
right
now
is
the
the
epoch
algorithm,
an
API
that
I
showed
you
there's
also
two
multi
producer,
multi
consumer
queues.
One
of
them
has
a
blocking
mode
as
you.
If
you
want
to
block
when
no
data
on
the
queue
is
available.
There's
also
work
stealing
duck
thanks
to
Alex
crane
and
there's
a
scope
thread,
abstraction
which
is
kind
of
random
thing.
It's
a
little
different
than
the
rest
of
these
okay.
E
So
that's
that's
what's
in
it
today,
but
what
I
would
like
to
do
over
time
is
like
I,
said
at
the
beginning,
really
make
crossbeam.
You
know
a
strong
utility
library
that
offers
a
number
of
concurrent
data
structures
in
this
vein,
so
the
most
important
one
to
tackle,
of
course,
is
concurrent
maps
in
current
hash
maps.
In
particular
that's
a
massive
implementation
challenge,
but
in
rust
it's
also
very
interesting,
API
challenge
for
a
fun
exercise,
think
about
taking
the
current
sequential
API
for
hash
map
and
think
about
how
you'd
have
to
change
it.
E
If
you
want
to
concurrent
access,
it's
less
obvious
than
you
might
think,
there's
lots
of
subtlety
there,
so
I'm
really
eager
to
experiment
in
this
space.
But
there
are
lots
of
other
kinds
of
collections
sets
in
bags
that
you'd
like
to
have
as
well
synchronizers
as
I
mentioned
before
and
then
sort
of
looking
far
into
the
future
going
higher
level
and
actually
offering
ways
of
like
doing
composable
concurrency.
E
So
in
my
PhD
work
I
did
something
called
reagents,
which
is
basically
these
kinds
of
block
free
data
structures,
but
with
a
way
that
you
can
compose
their
operations
together,
so
say:
atomically
either
do
this
or
that-
and
I
would
love
to
do
that
someday
and
cross
beam,
but
there's
a
lot
of
work
before
we
get
to
that
stage.
Yeah.
So
that's
that's!
Basically
it
I'm
happy
to
take
more
questions.
E
E
F
D
E
Same
way,
garbage
collection
does
it's
beautiful,
so
basically
so
the
way
that
right.
So
let
me,
let
me
make
sure
everybody
understands
the
the
problem
here
right.
So
the
issue
is,
if
you
take
a
snapshot,
you
have.
You
have
a
pointer
in
hand
now
you're
going
to
do
a
compare
and
swap
based
on
that
point
or
later,
but
it
might
be
in
the
meantime
that
it's
been
deallocated
reused
for
a
different
purpose
and
relinked
back
into
the
data
structure.
Now
your
operation
succeeds
and
it
should
have
failed
and
you've
ended
up
with
something
focus
right.
E
This
can't
happen
with
the
epoch
scheme,
because
that
you
just
read
that
pointer
out
of
the
data
structure
and
you
have
an
epoch
pinned
and
the
basic
invariant
says
that
data
cannot
be
freed
until
you
exit
the
epoch.
So
it's
guaranteed
that
that
address,
won't,
be
reused
or
recycled
or
put
back
into
the
data
structure
until
you've
exited,
at
which
point
it
doesn't
matter,
you're,
not
you're,
no
longer
about
to
do
this.
Cass
and
the
same
thing
happens
in
garbage
collected
languages
like
Java,
basically
they're.
E
Yes,
correct
and
I
yeah
I
think
there's
actually
no
way
to
safely
do
so
and
I,
don't
think
you
really
want
to
it
doesn't
fit
what
you,
what
you're
generally
doing
with
these
data
structures.
J
A
E
That's a great question. I think the cost of fiddling with the epochs is basically trivial; in that case you're just incrementing a counter (yeah, yeah, exactly), and an uncontended CAS is, you know, pretty fast on modern Intel chips. So that's one potential source of overhead. The other is that you have this purgatory: instead of just freeing memory directly, you're actually accumulating memory to free later, and there's some allocation involved there.
E
But generally, if you're following this scheme I mentioned earlier, where you do the freeing every time you get up to a certain size, you just keep reusing that same purgatory list over and over, so the allocation is amortized. So I think it's not too bad, but it depends; it basically comes down to how fast an uncontended CAS is on your architecture.
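Here's a hypothetical sketch of that amortized scheme (the threshold, names and layout are made up for illustration; Crossbeam's real internals differ):

```rust
// A per-thread purgatory: nodes that have been unlinked from a data
// structure but may still be visible through someone's snapshot.
struct Purgatory {
    garbage: Vec<*mut u8>,
}

const COLLECT_THRESHOLD: usize = 64; // assumed tuning constant

impl Purgatory {
    fn defer_free(&mut self, node: *mut u8) {
        self.garbage.push(node);
        if self.garbage.len() >= COLLECT_THRESHOLD {
            self.collect();
        }
    }

    fn collect(&mut self) {
        // Epoch check elided: this only runs once no thread is still pinned
        // in the epoch where these nodes were unlinked. drain() keeps the
        // Vec's capacity, so the allocation cost is amortized over many frees.
        for _node in self.garbage.drain(..) {
            // unsafe { deallocate(_node) } (elided)
        }
    }
}
```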
E
So where was that? Ah yeah, it was way earlier. Okay, yeah, so phasers are another one where I'm very inspired by java.util.concurrent; I basically spent my PhD worshiping java.util.concurrent. It's an abstraction there that is sort of similar to a barrier, but it allows a much greater range of operations.
E
You can talk about threads coming and going: basically choosing to participate in the barrier and then changing their mind, so that their count doesn't matter anymore. This kind of thing is actually used to implement some kinds of work-stealing schemes and other kinds of parallelism support. So it's a building block; it's not that important.
E
Neither; so in Crossbeam there's a per-thread purgatory. Well, there's primarily a per-thread purgatory, which is shared amongst all data structures for that thread. There also has to be a global purgatory as well, because a thread can be in the process of shutting down and basically fail to advance the epoch far enough to be able to actually reclaim that data; to avoid leaking, it has to flush to the global purgatory.
D
E
J
I
F
H
H
E
So funnily enough, it's called Magic Move, in Keynote. Basically you have two slides next to each other that are mostly the same, and some things are different, and it automatically animates between them. It's wonderful.
H
Like, there's just a method that's called compare_and_swap, and you said in the beginning that your intention was to create a performant set of primitives for, you know, parallel operations and data structures. So to what extent does the compare_and_swap method actually use the underlying instruction set, like cmpxchg for x86 or something like that, and just build on top of that? Because I think x86 has things like cmpxchg, and all these others are hand-over-hand locking or skip lists and so on.
E
Okay, so for the first part of that question: compare_and_swap here is just a direct map to the basic hardware instruction, so there's nothing really interesting to say there, I think. For more specialized operations (I mean, fetch-and-increment is also a really important one) there's no fundamental reason that they can't be incorporated. I haven't really thought about incorporating blocking into this space; I don't think there's anything too difficult there.
E
Basically, I don't think the Crossbeam API would have much interesting to say there. The key thing at the end of the day is to know what memory potentially has these concurrent snapshots that you have to worry about, and to know when you can mark it unlinked, when you know for sure that it's no longer reachable from the data structure. However you get yourself into that situation, via partial locks, some mixture of compare-and-swaps, or other operations, it doesn't really matter; the algorithm should still work. I see.
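For reference, this is what that direct mapping looks like with std's atomics; compare_exchange (compare_and_swap in the Rust of this talk's era) compiles down to the hardware's CAS instruction, e.g. LOCK CMPXCHG on x86:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

// A CAS retry loop: the usual shape for lock-free updates.
fn increment(counter: &AtomicUsize) {
    let mut current = counter.load(Ordering::Relaxed);
    loop {
        match counter.compare_exchange(
            current,
            current + 1,
            Ordering::SeqCst,  // ordering on success
            Ordering::Relaxed, // ordering on failure
        ) {
            Ok(_) => return,
            // Another thread won the race; retry with the value it wrote.
            Err(actual) => current = actual,
        }
    }
}
```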
H
There's a book on parallel programming by, I think, some guy in Israel, and it has a bit saying that on sequentially consistent architectures like x86, versus ARM or weakly consistent architectures, it's important to use the underlying microcode, or rather the instruction that's responsible for handling parallel access or locking, even in a case where you'd normally use a lock-free data structure for performance reasons. But I read it a long time ago, so I don't really remember the details.
E
E
E
Let me summarize it by saying: in the C and C++ world, with C11 and C++11, there was an attempt to create this memory model that has these kinds of memory orderings, like relaxed, release, acquire, sequentially consistent and so on. The point of this was to not do what you're saying; basically, to let you program at a higher level of abstraction, where you can say: I'm doing a read or a write, and I need this amount of information about what's going to be published to other cores.
E
Basically: is there going to be a global order over all reads and writes for this, or is it okay if there's something more subtle going on? Then the compiler will choose the best underlying operation on that piece of hardware to do it. On x86, for example, sequentially consistent operations require fencing (a memory barrier), but any of the other orderings don't. So it's actually really to your advantage to think carefully about when you can get away with something more relaxed. Right, but whatever you do...
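As a small illustration of picking an ordering, here's the classic publish pattern with std's atomics: the Release store pairs with the Acquire load, and on x86 neither one needs a fence, unlike a SeqCst store:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (data.clone(), ready.clone());
    let producer = thread::spawn(move || {
        d.store(42, Ordering::Relaxed);   // plain write to the payload
        r.store(true, Ordering::Release); // publish: writes above become visible...
    });

    // ...to any thread that Acquire-loads the flag and sees `true`.
    while !ready.load(Ordering::Acquire) {}
    assert_eq!(data.load(Ordering::Relaxed), 42);
    producer.join().unwrap();
}
```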
E
K
A
K
Okay, hi, I'm Huon. I'm a volunteer on the Rust team, and I'll be talking about my library, simple_parallel. There's a link at the bottom (if you want to type it in, type it in quickly) which has got various links and the slides and so on.
K
Alright. So what is simple_parallel? It's designed to sort of be a little bit like Rayon, except worse. It's designed to be a really, really easy way to add a whole pile of parallelism into your code without having to think about it at all. So the main API has got these three functions: for, map and unordered_map, and I guess the best way to understand what they do and how the library works is to look at an example.
K
So for this example I've got a program here. It's kind of long, but the main point is: it takes a list of files as its command-line arguments, which are images, and then it will resize those images to be 400 by 400 pixels and write them out. But of course the image-resizing bit isn't particularly interesting, so let's just focus on the loop in the main function, which is just this bit. So, as I said, it's got this files iterator, which is the list of command-line arguments.
K
It will run through it and resize each of those images. This is very sequential; it's only using one core, and, like, my laptop has eight cores, and even most phones nowadays have more than one core. So this is a waste of resources, and instead we should use parallelism: threading, for example. So I'm using the great scoped_threadpool library here, which is good: it gives you precise control over exactly what you want to run where, but it's kind of annoying to use.
K
So first you have to import the crate, and then you have to spawn a pool, and then you have to avoid the problems with destructors by using this scoped function, and then, finally, you can run over the images and spawn a job to resize each of them. So it's kind of annoying; it's a lot more work than just writing a for loop.
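Those steps look roughly like this with the scoped_threadpool crate; resize_image here is a stand-in for the example's resizing code:

```rust
extern crate scoped_threadpool;

use scoped_threadpool::Pool;

fn resize_all(files: &[String]) {
    // Spawn a pool of worker threads.
    let mut pool = Pool::new(8);
    // The scoped call joins every job before returning, which is what
    // makes borrowing `files` from the caller's stack safe.
    pool.scoped(|scope| {
        for file in files {
            scope.execute(move || resize_image(file));
        }
    });
}

fn resize_image(_file: &str) {
    // resizing elided
}
```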
K
So what if, instead of just using normal threading, you could use magic threading? So here's simple_parallel: it's got this for function, and it's basically literally just a for loop, with an iterator and then a closure. That closure will be run on each of the elements of the iterator and do its work, preferably in parallel if you've got multiple cores. Obviously I keep saying it's fast, but is it actually fast? So I ran it on a directory of images.
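The magic-threading version then looks roughly like this. In code the function is spelled for_, since for is a keyword; treat the exact signature as an assumption based on the talk:

```rust
extern crate simple_parallel;

fn resize_all(files: &[String]) {
    // Basically a for loop: each closure call may run on a pool thread.
    simple_parallel::for_(files.iter(), |file| {
        resize_image(file);
    });
}

fn resize_image(_file: &str) {
    // resizing elided
}
```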
K
The sequential version took 41 seconds; scoped_threadpool and simple_parallel both took about 10 seconds. So we're seeing it's more than four times faster on a machine with four physical cores, or eight with hyperthreading. That's about what you'd expect for a computationally intense task like this.
K
So how does it work? This library is designed to be safe and easy to use, and so it tries to imitate a for loop as much as possible, which is why it takes an IntoIterator instead of just the Iterator trait. This means you can just pass in a vector, and it will automatically be converted into an iterator.
K
These iterators will then yield this Item type, which has to be Send, which makes sense, since it gets passed across threads; that's how the parallelism works. Then the function closure is a function that takes the item from the iterator, and it has to be Sync, which means it can be called concurrently on multiple threads. So that signature basically describes exactly what is going on in terms of parallelism and what safety constraints you need to satisfy.
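Here's a sketch of what a signature with those bounds could look like; it encodes the constraints just described, but it isn't the crate's verbatim declaration:

```rust
// Sketch only: the bounds tell the parallelism story.
fn for_<I, F>(iter: I, f: F)
where
    I: IntoIterator,
    I::Item: Send,         // items get handed off to pool threads
    F: Fn(I::Item) + Sync, // one closure, callable concurrently from many threads
{
    // Distribution over a thread pool elided; a sequential stand-in:
    for item in iter {
        f(item);
    }
}
```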
K
Of course, this makes more sense with a diagram. So here we've got the iterator values, which are the red squares, some long sequence of things; then at the bottom we've got a thread pool, in this case just two threads, and each of those has a reference to our closure, our Fn closure.
K
So, to start off, first we'll take the first two elements of the iterator and pass them off to the thread pool; the functions will then be called concurrently, running in parallel. When one finishes, the next element will be pushed onto the thread pool, and so on and so forth until the iterator has been exhausted. At this point the for function will return and we'll have finished our for loop, having hopefully run the jobs very efficiently and quickly. And so now: sharing and safety.
K
Like Rayon, and like Crossbeam's scoped threads, this is designed to offer flexible sharing, so you can have references that point into your main thread's stack and manipulate them freely without worrying about problems. So in this case we've got our array, data, of ten elements, and we run a for loop across those ten elements, adding this outside variable to each of them. This would result in data having one, two, three, four, five, six, seven, eight, nine, ten in it after the for loop returns. So this is perfectly safe.
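In code, the example described here looks something like this (simple_parallel-style API, with the same for_ spelling assumed as before):

```rust
extern crate simple_parallel;

fn main() {
    let outside = 1;
    let mut data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    // Each closure call receives a unique &mut element, so nothing aliases.
    simple_parallel::for_(data.iter_mut(), |elem| *elem += outside);
    assert_eq!(data, [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]);
}
```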
K
The fact that we're using an iterator, and that simple_parallel is fairly careful, means that each of those elements won't alias: there are no data races, and there's no mutation of variables that shouldn't be mutated. This particular example wouldn't be very efficient because of false sharing and so on, but you know it's safe, so that's nice!
K
What about if we made a mistake, if we'd flipped around outside and elem? So instead of adding outside to each of the elements of the array, we'd be incrementing outside each time. If it ran properly, this would theoretically result in outside having the value 46, or 36, something like that.
K
But it shouldn't run, because it's a data race: there's no synchronization in the mutation of outside. So if we're running this in parallel, we've got multiple threads modifying this one memory location with no synchronization, which is literally the definition of a data race. This is undefined behavior in Rust and can lead to arbitrary problems, let alone not giving the expected answer.
K
Fortunately, simple_parallel protects against this problem. This is the error the compiler emits: it's saying that you can't assign to the captured data, because we've got an Fn closure, which is telling us that we've got a problem here. We can't mutate things we've captured; we can only mutate the things that come in through the closure argument, except if we've got a mutex or, like, an atomic variable. Of course, if you add your own synchronization, then it's fine.
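And one way to make the flipped version legal, as mentioned, is to add your own synchronization; here's a sketch with an atomic (a Mutex would work too), again assuming the for_ spelling:

```rust
extern crate simple_parallel;

use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let outside = AtomicUsize::new(1);
    let data = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
    // Synchronized mutation of captured state is allowed.
    simple_parallel::for_(data.iter(), |elem| {
        outside.fetch_add(*elem, Ordering::SeqCst);
    });
    assert_eq!(outside.load(Ordering::SeqCst), 46);
}
```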
K
But iterators can also do more than just a for loop. So instead of giving a useful error message about any failures to resize an image, how about we just tell the user how many errors occurred? In this case we map the resizing over the iterator, then we filter out the ones that are errors and count them. This will again run sequentially, and it takes about 40 seconds.
K
It's not particularly different to the for-loop version, but with simple_parallel we can parallelize the map. We do the same setup with Crossbeam, but then in the middle we've got the map: it takes files and a closure, and returns an iterator that we can then filter and count, and we'll get the same answer, except faster.
K
The reason we have to have the Crossbeam scope is for exactly the same reason that std::thread::scoped had to be deprecated and eventually removed: we can't rely on destructors to run, which would allow us to sort of leak references and have them still accessible even after the data they point to has been destroyed. It's somewhat annoying, and it makes the syntax here obviously much worse, but those are the pains we take for safety.
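Put together, the parallel version has roughly this shape. A Pool::map that takes a Crossbeam scope matches the talk's description, but treat the exact signatures here as assumptions:

```rust
extern crate crossbeam;
extern crate simple_parallel;

fn count_errors(files: &[String]) -> usize {
    let mut pool = simple_parallel::Pool::new(8);
    // The crossbeam scope guarantees the pool's borrows can't outlive it,
    // which is the promise std::thread::scoped could no longer make.
    crossbeam::scope(|scope| {
        pool.map(scope, files, |file| resize_image(file))
            .filter(|result| result.is_err())
            .count()
    })
}

fn resize_image(_file: &str) -> Result<(), String> {
    // resizing elided
    Ok(())
}
```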
K
One subtlety with this function is that it will yield the elements in the order of the input iterator. So if we have files A, B and C and we map over them, we'll get the resize results (the result of the resizing) for A, then B, then C. But it might be that, say, A is much larger than B and C, so its resizing takes a lot longer, and it might be more useful to get the errors for B and C before A finishes. That's what unordered_map does.
K
K
This is fine for many examples, such as this one where we're just counting, and so in this case using unordered_map might be slightly nicer than using map, although both of them take about 10 seconds. So I've sort of given a high-level overview of the API, and I'll now summarize the ugly bits, the bad bits and the good bits of the API, so I can end on a positive note. The ugly bits: really, the internals are pretty ugly.
K
You don't want to look at them at all; there are channels going everywhere and threads being spawned when possibly they shouldn't be. The API it exposes is nice, but it results in quite a lot of performance overhead. And then there's confusion with panics: if a job panics, if one of the closures you pass to for or map panics, then you'll usually end up with double panics and your application will abort.
K
K
You've got about half the work going to one thread and half going to the other. But if you're using just an Iterator bound, you can only steal elements off the front, and so this is much slower, with much, much more contention, if you're trying to do it concurrently. However, specialization should allow us to pull back some of that overhead for iterators and types where you can take work off more efficiently.
K
So if you've got a tree of work, where one job will generate two, and each of those will generate two more, and so on and so forth, like the quicksort example that Nico was using, then simple_parallel probably isn't the right library: there's far too much overhead for this, and it doesn't use fancy work-stealing data structures internally and so on. However, Rayon exists, so you don't need simple_parallel for these things, which means it's not such a downside.
K
However, if you've got a flat, embarrassingly parallel workload, then simple_parallel is great, especially when those work pieces are quite large. As I said, there's quite a lot of overhead, but if you're doing something like resizing an image or downloading something off the internet, then you'll easily be able to parallelize it with simple_parallel and get big benefits from it.
K
And, of course, as the name suggests, it's simple: you can almost use it by doing text substitution. Almost; well, adding a few parentheses as well. But that's the high-level overview of simple_parallel: easy parallelization without much control, designed for sort of big blocks, big chunks of work and so on, not fine-grained work. I think that's all the stuff I've got. Are there any questions? There's the link as well to the slides and the code and so on. All right.
A
Any questions? I don't see any. Thank you, Huon, thank you very much, and thank you all for coming tonight. It was great having you all, and I will see you in February. I think we have the room for a bit longer, so come and mingle. Sorry, internet people.