From YouTube: RustFest Rome 2018 - Igor Matuszewski: Caging the SpiderMonkey - Ensuring safe JS bindings in Servo
Description
This talk will focus on some of the challenges encountered while integrating the SpiderMonkey JavaScript engine with the Servo web browser engine (written in C++ and Rust, respectively). We will explore how Rust's rich type system made it possible to enforce many Servo–SpiderMonkey interface rules and safety considerations at compile time, and how a custom compiler plugin was developed to guard against project-specific errors at the language level.
https://rome.rustfest.eu/sessions/caging-the-spidermonkey
https://media.ccc.de/v/rustfest-rome-2-caging-the-spidermonkey
Sorry about the technical difficulties. My name is Igor Matuszewski and I'm a maintainer of the RLS. However, today I'd like to talk about something different, which is the Servo–SpiderMonkey integration that I did as part of my bachelor thesis group project.
So this is the official poster, pretty neat, made by a friend of mine. Here I'd like to thank my colleagues who worked with me on this project, their names are listed below, and also Josh Matthews from the Servo team, who was kind enough to be our mentor.
The aim of this talk is twofold: the first goal is to provide some insight into SpiderMonkey internals and their challenges, and the second is to explore how Rust actually made the integration safer in the process. So first I will introduce Servo and SpiderMonkey, what they are and how garbage collection comes into play. Then we'll delve deeper into garbage collection concerns and the types that encapsulate them, and finally we'll talk briefly about a compiler plugin that helps us verify our custom logic.
So chances are you've probably heard about Servo. This is an experimental browser engine written in Rust, and it actually managed to bring some key tech components back to Firefox, most notably Stylo, which is a CSS engine written in Rust. But you probably have not heard about SpiderMonkey, which is a JavaScript engine written in C++ that right now powers both Servo and Firefox itself.
So the main reason we use Rust is because it's memory safe. The natural question, then, is: do we sacrifice that memory safety when we integrate with a C++ library? Well, it turns out we don't have to. It is possible to encode many of the integration invariants within Rust's type system alone.
Now, let's take a high-level overview of how the integration works. JavaScript was designed to interact with the web browser (although nowadays it doesn't seem to care, but I digress). This means that web browsers expose a JavaScript interface that can be used to interact with them. So let's take a look at a simple example of what the flow of information may look like: imagine Servo is given an HTML document that it is about to process. What it does first is extract the script.
Thus we must employ some form of automatic memory management. This means that we don't have to worry about the borrow checker, right? We don't have to annotate our objects with lifetimes, so it's all good. But as it turns out, maybe not so much, because it is costly: it's more complicated to do, it incurs some runtime overhead, and it sacrifices determinism in the process. In general there are many ways to do garbage collection, and the different algorithms can differ on various axes. We'll go into that in a moment.
There are, most notably, two ways to attain and identify those rooted objects. The first is using stack maps, where the compiler emits specific metadata about each stack frame; then, during the collection phase, the collector can walk the stack and identify any GC pointer there. The other is when the runtime maintains a dynamic collection of active roots, so whenever we want to root an object, we have to add a root to that collection.
SpiderMonkey uses the latter, which is a global stack of pointers to the rooted objects, and to maintain it, it introduces a stack-allocated wrapper value (JS::Rooted<T> in the C++ API) that is capable of rooting some set of primitive GC-managed pointers. Some of them are listed here. It basically follows the RAII pattern: on construction it adds the pointer that it wraps to the collection, and on destruction it pops it back off. It is common to implement this root collection as a linked list.
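The RAII discipline described here can be sketched in Rust against a hypothetical global root stack. The names below are illustrative stand-ins: the real JS::Rooted<T> is a C++ template talking to SpiderMonkey's runtime, and a GC pointer is modeled here as a plain usize.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for SpiderMonkey's global stack of active roots.
    static ROOT_STACK: RefCell<Vec<usize>> = RefCell::new(Vec::new());
}

/// Stack-allocated wrapper that roots a GC-managed pointer (modeled as usize).
struct Rooted {
    ptr: usize,
}

impl Rooted {
    fn new(ptr: usize) -> Rooted {
        // On construction, push the wrapped pointer onto the root stack.
        ROOT_STACK.with(|s| s.borrow_mut().push(ptr));
        Rooted { ptr }
    }
}

impl Drop for Rooted {
    fn drop(&mut self) {
        // On destruction, pop it back off; LIFO order matches stack discipline.
        let popped = ROOT_STACK.with(|s| s.borrow_mut().pop());
        debug_assert_eq!(popped, Some(self.ptr));
    }
}

fn live_root_count() -> usize {
    ROOT_STACK.with(|s| s.borrow().len())
}
```

Because construction and destruction mirror each other, the set of live roots always matches the set of live wrapper values on the stack.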
However, it is sometimes undesirable to use that type for performance reasons, or it's just outright impossible to use, and to work around that another type is used; in this example, this type is called AutoGCRooter. In spirit it is very similar to the previous type that I mentioned, and it uses the same global stack approach; however, it can trace different things.
It is internally tagged: imagine it's like a Rust enum that executes a different trace function depending on the value of the tag. So, for example, it is capable of tracing a fixed-length array of objects, and it does that by using the type AutoArrayRooter, which is a wrapper around AutoGCRooter with an array tag. And we can see how this is better than just using the Rooted wrappers from before.
Imagine we have a collection of multiple elements: previously, we would have to instantiate the wrapper type for every element and for the structure itself. While each of those is a constant-time operation, it still stacks up. In this case we only add ourselves to the stack once, which is actually more performant, and with this we are able to trace through the entire collection.
It is convenient because in C++ what you can do is derive from the CustomAutoRooter base class and override the trace function yourself, and with this you instantly get registered in the infrastructure while still providing your own custom logic through this polymorphic function. Being able to trace custom objects is very useful, especially for Servo, which may not have access to the typical C++ amenities that exist on the SpiderMonkey side. So this will be our goal.
We want to create custom rooters in Rust but still hook into the dynamic tracing infrastructure that lives over in C++, in SpiderMonkey. But this is not as easy as it sounds. We can actually very easily interact with a C FFI from Rust, but not so much when it comes to inherent C++ semantics such as virtual dispatch. However, Servo already uses bindgen to create Rust bindings to SpiderMonkey, and bindgen is capable of understanding some of the C++, so we may actually use that to try and emulate simple C++ polymorphism in Rust.
So our goal is to create an object in Rust and set it up so that the virtual function table is as expected by the C++ side. Here we have a couple of definitions: on the left side there's a simplified definition of the CustomAutoRooter class, and on the right side the corresponding struct definition that's generated by bindgen. We can also see that bindgen was capable of converting the implicit virtual function table.
That vtable is implicit in C++; here we can see the trace function, which corresponds to a simple function pointer in the vtable struct created by bindgen. So, to actually implement this, we define a special trait that aims to act just sort of like the CustomAutoRooter base class, and to do this we use a trick with an associated constant in a trait. The basic idea is to create a constant vtable with explicitly instantiated pointers, and in this case we initialize the pointer to our Rust method.
That method is still unsafe and extern "C" because it directly interfaces with C. What it does is it receives the implicit this pointer, and what it does next is call the regular Rust trait trace function; you can see here that this is just a regular self reference, as you would expect in any Rust trait.
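A minimal sketch of the trick just described: the trait's associated constant holds an explicitly built vtable, and its entry is an unsafe extern "C" trampoline that recovers the implicit this pointer and forwards to a regular Rust method. The type and field names here are made up for illustration; in Servo the actual layouts come from bindgen.

```rust
use std::cell::Cell;

/// Hand-written stand-in for the bindgen-generated vtable struct.
#[repr(C)]
struct VTable {
    trace: unsafe extern "C" fn(this: *mut RustRooter),
}

/// Our Rust object, laid out with the vtable pointer first, as C++ expects.
#[repr(C)]
struct RustRooter {
    vtable: *const VTable,
    traced: Cell<bool>,
}

trait CustomTrace {
    /// The associated constant: a vtable with explicitly instantiated pointers.
    const VTABLE: VTable;
    fn trace(&self);
}

/// Unsafe and extern "C" because it interfaces with C; it receives the
/// implicit `this` pointer and forwards to the regular Rust trait method.
unsafe extern "C" fn trace_trampoline(this: *mut RustRooter) {
    (*this).trace();
}

impl CustomTrace for RustRooter {
    const VTABLE: VTable = VTable { trace: trace_trampoline };
    fn trace(&self) {
        // Custom tracing logic would live here; we just record the call.
        self.traced.set(true);
    }
}
```

Calling through the function pointer stored in the vtable, the way the C++ side would, ends up in the safe Rust trace method with a plain &self.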
But okay, we fed our function pointers to the vtables, and this actually works, I can assure you. But we actually need to root the data; that's the whole point of it, right? There's one problem: we cannot directly translate the C++ semantics, because C++ has constructors while Rust does not. Here is how it differs: in C++, when constructors are called,
it first allocates the memory where the object will be stored, and then it calls the constructor with the implicit this pointer pointing to that region of memory where the object will reside. In Rust we cannot do that: we can only directly initialize values. There is no way to insert any special hook at the point where the constructor actually executes, and that hook is what would allow us to automatically insert the pointer to ourselves, to directly root our value, in the first place.
If we get that wrong and create a dangling pointer, then when the collector goes through the roots it will encounter a dangling pointer and surely crash, so we will violate memory safety. Here you can see that during construction we first use an unsafe function that registers the underlying raw pointer on the root stack, and then we create a safe reference structure that internally borrows the underlying data. You can imagine drop is very similar: on drop, we call an unsafe function that deregisters our pointer from the root stack.
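The construction/drop sequence just described can be sketched as a two-step guard, loosely modeled on rust-mozjs's RootedGuard but simplified: step one registers the raw pointer (an unsafe call in the real bindings), step two hands back a safe structure that borrows the underlying storage, and drop mirrors the registration. The root stack here is a toy stand-in.

```rust
use std::cell::RefCell;

thread_local! {
    // Stand-in for the runtime's root stack of raw pointers.
    static ROOTS: RefCell<Vec<*const ()>> = RefCell::new(Vec::new());
}

/// Stack-allocated storage for the value being rooted.
struct Rooted<T> {
    value: T,
}

/// Safe wrapper created after registration; because it borrows the storage,
/// the storage can neither move nor die while the guard is alive.
struct RootedGuard<'a, T> {
    root: &'a mut Rooted<T>,
}

impl<'a, T> RootedGuard<'a, T> {
    fn new(root: &'a mut Rooted<T>) -> RootedGuard<'a, T> {
        let raw = &*root as *const Rooted<T> as *const ();
        // Step 1 (an unsafe call in the real bindings): register the pointer.
        ROOTS.with(|r| r.borrow_mut().push(raw));
        // Step 2: the safe reference structure borrowing the underlying data.
        RootedGuard { root }
    }

    fn get(&self) -> &T {
        &self.root.value
    }
}

impl<'a, T> Drop for RootedGuard<'a, T> {
    fn drop(&mut self) {
        // Mirror image of construction: deregister from the root stack.
        ROOTS.with(|r| { r.borrow_mut().pop(); });
    }
}
```

The borrow is what makes this sound without Pin: while the guard lives, the borrow checker forbids moving the registered storage.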
So, immovable types: this is actually a precise, nice use case for pinned types, but the problem is that we did this when the Pin type was not yet available, so we used a simple workaround instead. Originally, Servo wanted to use the custom auto-rooter infrastructure to support another type called SequenceRooter. However, the type on its own was not pretty: on the C++ side, to root a generic global collection, it used twelve different template instantiations.
So that's not really that strong, right? But thanks to how Rust traits compose, this is equivalent to just two trait implementations: the implementation of tracing for Option<T> and for Vec<T>, where you can actually implement the tracing logic in terms of the generic element. So at this point, SequenceRooter basically boils down to a simple type alias.
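The composition argument can be shown with a toy trace trait. This is a sketch under simplifying assumptions: the real trait in Servo is wired to SpiderMonkey's tracer rather than a log, and GC things are real pointers rather than ids.

```rust
/// Toy tracing trait; the real one receives the GC's tracer, not a log.
trait Trace {
    fn trace(&self, log: &mut Vec<u32>);
}

/// A leaf GC thing, modeled as a plain id.
struct GcThing(u32);

impl Trace for GcThing {
    fn trace(&self, log: &mut Vec<u32>) {
        log.push(self.0);
    }
}

// The two generic impls mentioned above. Tracing composes structurally, so
// any nesting of Option and Vec over traceable elements is itself traceable,
// replacing the C++ side's many template instantiations.
impl<T: Trace> Trace for Option<T> {
    fn trace(&self, log: &mut Vec<u32>) {
        if let Some(inner) = self {
            inner.trace(log);
        }
    }
}

impl<T: Trace> Trace for Vec<T> {
    fn trace(&self, log: &mut Vec<u32>) {
        for inner in self {
            inner.trace(log);
        }
    }
}
```

With these two impls, a type like Vec<Option<GcThing>> is traceable for free, which is why the dedicated sequence-rooting type reduces to an alias.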
Now, let's talk about write barriers. So what is a write barrier in general? Barriers are a piece of logic that is executed to synchronize some internal state, and in our case write barriers are exactly this, but executed at the moment of a write to an address that contains a GC-managed reference. They are needed to maintain specific invariants for different kinds of garbage collection.
In fact, we need write pre-barriers for incremental marking and write post-barriers for generational collection, a post-barrier meaning we execute the logic after we do the write. But currently Servo does not support incremental marking, so we'll focus on the latter, which is generational collection. But first a bit more about that: in practice it has been observed that most objects that are created are short-lived.
So what we can actually do with this knowledge is optimize our collection. In SpiderMonkey, we split the heap in two: a nursery heap and a tenured heap. The nursery heap is where the freshly allocated objects end up, while the tenured heap is where objects are moved when they survive a collector pass. This allows us to improve the collection phase by scanning only the nursery heap. However, there's a slight problem with that: what if a tenured object points to an object in the nursery heap?
So the solution is that we want to keep track of the values in the nursery that are pointed to from the tenured heap. In Rust we'll do it as follows: we'll create a GCMethods trait that aims to encapsulate all those write-barrier concerns, and we will then implement it for every raw GC-managed pointer in this case.
It's worth noting that we have a post-barrier method which takes the address where the reference was changed, the value of the previous reference, as well as the new one, which allows us to successfully track the changes. For example, in the case of a regular JSObject, we execute the specific SpiderMonkey C API just like below. And because of our write barrier, we require every mutation to go through our specific set method, called, well, set. Essentially, the post-barrier records the change of value at a given address.
As you can see here, we invoke the post-barrier method for the GC-managed value. Now, for the logic to be sound, we also need to do the same during drop execution. When the Heap reference goes out of scope, SpiderMonkey will not know that the value was invalidated and no longer points to anything. So what we need to do is explicitly inform SpiderMonkey that the value was set to, let's say, a null pointer. This clears the reference to any valid pointer that we may have previously pointed to.
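The shape of this API can be modeled as below. It mirrors the GCMethods/Heap design from rust-mozjs in simplified form, with the SpiderMonkey C API call replaced by a recording stand-in for the store buffer, and a GC pointer modeled as a usize.

```rust
use std::cell::{Cell, RefCell};

thread_local! {
    // Stand-in for SpiderMonkey's store buffer: (address, new value) records.
    static STORE_BUFFER: RefCell<Vec<(usize, usize)>> = RefCell::new(Vec::new());
}

/// Encapsulates the write-barrier concerns for a raw GC-managed pointer type.
trait GCMethods: Copy {
    /// A "null" value used to clear the reference on drop.
    fn initial() -> Self;
    /// Called after a write at `addr` changed `prev` into `next`.
    unsafe fn post_barrier(addr: *mut Self, prev: Self, next: Self);
}

// The real impls exist per pointer type (e.g. *mut JSObject) and call the
// SpiderMonkey C API; here we just record the change.
impl GCMethods for usize {
    fn initial() -> usize { 0 }
    unsafe fn post_barrier(addr: *mut usize, _prev: usize, next: usize) {
        STORE_BUFFER.with(|b| b.borrow_mut().push((addr as usize, next)));
    }
}

/// Barriered heap slot: every mutation must go through `set`.
struct Heap<T: GCMethods> {
    value: Cell<T>,
}

impl<T: GCMethods> Heap<T> {
    fn new() -> Heap<T> {
        Heap { value: Cell::new(T::initial()) }
    }

    fn set(&self, next: T) {
        let prev = self.value.get();
        self.value.set(next);
        // Record the change of value at this address.
        unsafe { T::post_barrier(self.value.as_ptr(), prev, next) };
    }

    fn get(&self) -> T {
        self.value.get()
    }
}

impl<T: GCMethods> Drop for Heap<T> {
    fn drop(&mut self) {
        // Explicitly tell the runtime this slot now points at nothing.
        let prev = self.value.get();
        unsafe { T::post_barrier(self.value.as_ptr(), prev, T::initial()) };
    }
}
```

Note how drop reuses the same barrier, writing the null value, which is exactly the "inform SpiderMonkey the value was set to a null pointer" step from above.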
Similarly to rooting, SpiderMonkey requires barriered references to be immovable. Now, this imposes some constraints; for example, this simple constructor-like function is memory-unsafe, so let's go through why. First, we stack-allocate our Heap wrapper. Then we call set, which invokes a post-barrier taking the address of the stack-allocated Heap wrapper. But then we return this wrapper value by move, and that's problematic, because when we move a value in Rust we don't call the drop method, and so we won't reset it: we won't inform SpiderMonkey that the reference was invalidated.
Moreover, we won't invoke any construction logic when we move the value out, so there will be a dangling wrapper type at a different address that may still point to a valid address in the nursery. So that's bad and unsafe. But an obvious improvement might be to box all the things, that is, to heap-allocate first, and in this case it works, because the memory location of the heap allocation won't change, and here we can actually safely move the owning pointer around.
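The address-stability argument can be checked directly: a barrier records an address, and only the boxed variant keeps that address valid across a move. This is a toy demonstration with no GC involved; the functions and recorded addresses are purely illustrative.

```rust
/// Returns the address a barrier would have recorded, plus the moved value.
fn stack_version() -> (usize, u32) {
    let v: u32 = 7;
    let recorded = &v as *const u32 as usize; // address of the dying stack slot
    (recorded, v) // `v` is returned by move, so `recorded` is now stale
}

fn boxed_version() -> (usize, Box<u32>) {
    let v = Box::new(7u32);
    let recorded = &*v as *const u32 as usize; // address of the heap allocation
    (recorded, v) // moving the owning Box does not move the allocation
}
```

In the boxed variant the recorded address still matches the value's address after the move, which is exactly why Servo heap-allocates the barriered wrapper before handing it out.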
Okay, so at the end let's briefly talk about the compiler plugin that was developed. This is used in a different context: to root JS-typed DOM objects. Now the types change, but the core principle stays the same: we have typed GC-managed pointers and dynamic roots, as I was mentioning previously.
Needless to say, these GC-managed values still need to be rooted correctly to be used. Now, to verify that, we use a custom plugin. This uses the plugin registrar feature, which allows us to register our own custom attributes, lints, and warnings. To perform the analysis we introduce two annotations: the must_root annotation and the allow_unrooted_interior annotation.
So, as I said before, the goal is to verify that the objects are rooted correctly. Now, this can be done by imposing a certain set of rules. First of all, any type that has internal data marked with must_root must also be marked as must_root itself; in other words, the must_root attribute infects the outer type.
Secondly, only objects that are marked with allow_unrooted_interior are safe to contain must_root-marked member data. We also disallow creating objects marked with must_root on the stack, because you can imagine a scenario where we store an unrooted reference to a GC-managed object on the stack, but then the garbage collector does its pass, invalidating the underlying object. And similarly, we also disallow accepting must_root-annotated objects as function argument types, because we'd like to have a guarantee that the function's arguments will be valid and rooted throughout the entire function call.
So, simplifying, we do it as follows: we invoke a special declare_lint macro with a lint identifier and a default severity. Then we need to implement a couple of internal traits. Now, the interesting trait is LateLintPass: it actually defines a set of verification logic that is executed for every item in the abstract syntax tree of our code. In this case we'd like to verify data definitions, so we'll use the check-struct-definition and check-enum functions, and we'd also like to verify function bodies and their argument types.
So you implement the check-function callback, and you can implement any other callback for any other item, but these are by default implemented as no-ops, so you can implement only those that you need. Simplifying a great deal here, imagine that in the check-function method we want to verify the argument types of the function.
Here we need to pass the lint identifier, the message, and also the span where the error occurs. So consider this very simplistic example, where we reuse the plugin that's used in Servo: we define a very simple structure that's marked with must_root; however, down below we define a function that accepts an argument of a type that should be invalid, and in this case we'd like to pick up the error and proceed to emit the appropriate diagnostic.
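Roughly the shape of the test case being described, using the attribute names from the talk. This is illustrative only: it compiles only with Servo's lint plugin loaded, and JSObject stands in for a GC-managed type from the SpiderMonkey bindings.

```rust
#[must_root]
struct Foo {
    ptr: *mut JSObject, // GC-managed pointer, so Foo itself must be rooted
}

// The lint rejects this signature: a must_root type appears as a function
// argument type, so there is no guarantee the argument stays rooted for
// the whole call.
fn process(foo: Foo) {
    // ...
}
```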