►
From YouTube: RustConf 2017 - Menhir and Friends: the State of the Art of Parsing in Rust by Naomi Testard
Description
Menhir and Friends: the State of the Art of Parsing in Rust by Naomi Testard
In this talk we present Menhir, a very powerful LR parsers generator, and how to use it to write Rust parsers (and lexers). We also show some details of the internals of Menhir’s backend for Rust, as well as a short comparative survey of other similar parsing tools and techniques for Rust out here.
A
Thank
you
so
I'm
Naomi
and
I'm
going
to
talk
about
the
minion
passage
in
the
retro
forests,
but
rust
passing
in
general
and,
more
especially
the
minion.
A
lot
passage
in
return.
So
short,
disclaimer
first
I
put
a
lot
of
things
into
a
description
of
my
conference
in
the
CFP,
because
I
really
really
wanted
my
talk
to
be
accepted,
but
eventually
I
just
got
assigned
half
an
hour
slot,
so
I
will
definitely
not
have
a
time
to
cover
everything,
but
whereas
in
the
resume
so
sorry
about
that,
so
they
start
with
the
beaning.
A
A
It
can
also
produce
code
passes.
It
is
it's
developed
I'd
be
in
here,
which
is
a
French
National,
Research
Institute,
where
I
work
and
it's
very
widely
used
in
the
auto
modes.
I
could
have
liked
at
the
time
to
take
a
bit
much
with
me
and
what
is
a
low
passing,
but
I
would
not
have
a
time
for
that.
So,
let's
just
recap
the
basic.
So
if
you
know
yak
a
low
passing
is
basically
the
kind
of
drama
that
yak
indels.
A
When
you
say
to
the
methane,
which
is
a
finish
date
to
the
methane,
which
is
augmented
with
a
stack
of
states
for
the
contexts-
and
it
has
a
very
useful
property
of
a
run
time-
complexity
being
lying
on
the
size
of
the
inputs
well,
so
a
low
passing
is
better
explain
with
an
example.
So
this
is
an
example
of
a
mania
pacify.
So
if
you
know
yak,
it
would
be,
it
should
be
very
familiar.
It's
really
the
same
as
a
jack.
Fine.
A
A
When
we
include
the
double
%
just
like
Jack
for
absolutely
no
reason,
and
then
we
have
a
list
of
syntax
rules
which
look
roughly
like
BNF
Graham
all
definitions,
so
both
this
kind
of
passwords
kind
of
easy
to
write.
If
you
already
have
a
be
enough
definition
for
your
grandma,
you
just
add
semantic
actions
which
in
our
case,
contain
rows
code.
A
So
this
is
a
valid
mini
trespasser
which
can
be
compiled
down
to
rest
code.
My
work
has
been
to
port
minier
to
rusts,
which
has
been
relatively
easy,
because
many
was
already
supposed
to
support
several
languages.
It
has
a
back
in
fact,
two
buttons
for
each
mo
one,
which
produces
tables
that
encode
with
transition
tables
of
the
automation
and
another
that
encodes
the
automaton
as
a
nest
of
recursive
functions
and
a
bacon
fat.
A
Because,
yes,
by
the
way
many
often
produce
proofed
passes,
which
mean
many
is
not
proved
in
itself.
But
there's
a
technique
here
in
which
a
certified
tool,
a
tool
which
is
itself
verified,
is
used
afterwards
to
check
that
the
produced
automaton
is
indeed
matches
in
the
same
language,
but
the
input
from
our
and
that's
a
technique
which
is
used
in
the
concert
certified
compiler
for
the
C
language,
which
is
many
of
this
technique
to
verify
the
parser
and
it
works
so
many
ways
a
previously
is
of
software.
A
So
what
I've
done
is
essentially
that
I
wrote
a
table
based
back
in
forest
table
base,
because
it's
easier,
maybe
in
the
future
I,
will
try
to
write
a
code
base
back-end.
But
for
now
it's
a
table
base
back
and
it's
pretty
fast
enough.
That's
how
it
looks
it's
ugly.
I
am
itted
most
of
it,
but
you
can
recognize
the
structure
of
Finland
state
machine
table.
A
A
The
matches,
actually,
what
was
matches
actually
do
is
that
they
have
to
unwrap
the
semantic
values
from
the
stack,
because
the
stack
contains
the
data
of
several
types,
so
they
are
tagged
and
those
type
checks
doesn't
actually
have
or
actually
completely
useless.
So
that's
why
the
uppercase
is
unreachable.
Most
modern
was
not
even
modern.
A
Most
parser
generator
just
use
unsafe
code
to
extract
the
data,
but
for
now
it
uses
this
type
check,
which
was
pretty
useful
during
the
development
fan
fan
sorry,
what
triggered
the
original
authors
of
many
years
interest
into
LR
parsers
was
that
we
discovered,
but
those
type
in
dynamic,
type
checks
or
unsafe
code
could
be
eliminated
and
replaced
with
JD
T's.
So
that's
why
they
wrote.
Many
are
in
the
first
place,
but
at
the
time
Oh
camel
did
not
support
cg
ADT's.
Now
it
does,
but
the
backend
of
mini
would
be
too
complicated
to
rewrite.
A
Okay,
so
how
do
we
use
that
code?
Now?
The
packaging
was
really
the
biggest
problem
here,
because
we
have
to
distribute
something
which
is
a
mix
of
camel
and
roast
code,
which
in
return
is
the
generator,
is
written.
Oh
camel
the
runtime
library,
yeah
I
forgot
to
mention
that,
but
on
this
example,
you
can
see
that
the
code
that
will
actually
interoperate
the
table
that'll
actually
execute.
The
automaton
is
not
in
regenerated
output.
It
lives
in
a
separate
crate
which
is
referenced
on
the
top,
which
is
many
runtime
and
it
has
been
to
be.
A
A
The
first
option
was
to
use
a
pam,
which
is
vo,
camel
package
manager,
which
is
obviously
not
a
good
option,
because
when
your
programmer
and
you're
writing
Russ
code-
and
you
want
to
use
manure-
you
shouldn't
have
to
know
anything
about
Oh
camel.
You
shouldn't
have
to
know
how
to
use
Obama
whatever,
and
neither
if
you
are
just
a
rest
user
and
you
use
a
library
and
this
library
itself
uses
many
years.
You
shouldn't
have
to
use
to
install
thermal
package
manager
just
because
some
dependency
of
dependency,
one
set
the
other
option,
is
Jacobean.
A
Jacobean
is
a
feature
of
cargo
that
allows
to
exactly
to
install
packages
by
containing
a
single
binary,
but
this
is
not
sufficient
because
we
still
need
to
install
the
runtime
with
it.
Plus
a
package
can
not
depend
on
the
Kalka
beam,
so
we
would
have
exactly
the
same
problem
if
you
just
use
a
crate
who
itself
depends
on
the
create
using
many,
you
would
have
to
talk
abou
install
yourself,
because
the
the
cargo
packages
cannot
register
this
dependency
option.
3
was
just
a
regular
package,
but
the
main
problem
with
that
is
right
virgin.
A
So
a
cargo
package
which,
in
its
bull
script,
build
Asura
scripts,
will
invoke
the
economic
file
to
build
and
install
the
generator.
But
saga
will
will
run
this
may
file
with
a
prefix,
which
is
a
completely
obscure
location.
So
many
will
be
installed
in
a
completely
obscure
location,
which
makes
invoking
the
generator
binary
very
complicated.
So
the
idea
was
to
write
a
wrapper
crate
which
no
statically
at
compile
time.
A
The
prefix
in
which
one
agenda
generator
is
installed
so
that
it
can
help
you
invoke
it
latigo
that
works
in
practice
in
print
tips
when
you
want
to
use
mean
even
your
package,
you
start
by
stating
that
you're
going
to
use
a
custom,
bull,
scripts
and
but
but
called
sample
scripts
will
use.
Many
has
a
dependency.
So
that's
what
that's
why
we
build
dependency
section
of
the
cargo
terminal
file
is
for
is
for
registering
dependencies
that
are
needed
to
run
the
build
scripts,
not
to
run
your
package
itself
and
in
bad
ball
scripts.
A
You
use
the
menu
crate,
which
is
this
wrapper
library,
and
you
ask
it
to
process
your
paso
file.
That
is,
you
do
not
explicitly
invoke
the
generator
on
the
command
line.
You
do
not
explicitly
invoke
the
generator
binary,
as
you
would
do
with
most
parser
generator.
You
go
through
a
library
which
does
vets
on
the
file
system.
It
looks
like
this.
That
is
when
Iago
will
build
the
dependencies
of
your
package
and
will
build
many.
A
A
Then
it's
all
good,
only
the
generator
will
produce
open.
The
parcel
I
would
put
file
that
will
be
put
in
your
packages
on
out
there,
which
you
can
reference
using
the
environment
viable
outside
here,
so
that
you
can
include
the
generated
file
in
your
code,
like
you,
do
with
most
rust
crates
that
generate
code
and
we'll
see
later
how
how
to
actually
invoke
the
pass
or
embrace
arose
a
virus
code
for
now
there's
a
detail
that
still
has
to
be
covered
regarding
packaging,
which
is
the
runtime.
A
A
It's
quite
complicated
from
the
point
of
view
of
the
point
of
view
of
the
maintaining
of
the
package.
But
what
is
me
it's
slightly?
Ok
from
the
point
of
view
of
the
person
who
died,
what
he
uses,
many
in
its
Ross
project,
it's
a
bit
complicated,
but
once
you
know
how
it
works,
it's
quite
easy
to
get
along
with
it,
and
that
was
my
biggest
concern.
A
It's
completely
transparent
for
the
indirect
use
of
Minea,
but
is
someone
who
does
not
use
many
abut,
uses
in
interest
product
or
create
which
itself
uses
many
for
this
programmer,
it's
completely
transparent.
Alright,
he
needs
to
do
is
have
the
optimal
binary
to
lie
somewhere
in
the
path.
Just
you
just
have
to
install
it
with
your
distribution
back
in
manager
and
that's
all
when
Cal
go
and
also
all
the
rest's,
ok,
so
well
now,
now
we
have
our
pastarino
sub
module
and
we
need
to
invoke
it.
A
This
is
the
lexer
interface
that
minier
expects
vs
the
interface
but
mini
expects
from
elixir.
It
looks
weird
it
does
not
replace
an
iterator,
it's
not
like
an
infinite
iterator.
You
see
that
the
input
function
can
even
return
an
error
either
written
token,
a
company
with
its
location
but
never
say
stop.
A
You
just
give
it
as
an
argument,
an
iterator
of
the
lexus,
and
it
will
convert
it
to
the
right
interface
and
if
the
iterator
was
ever
to
return
known
to
indicate
the
end
of
the
stream,
the
iteration
alexa
would
return
Alexa
guerra
because
since
many
does
not
need
an
x-ray,
an
extra
token
to
detect
the
end
of
a
stream
if
it
reaches
the
end
of
a
stream,
while
it
still
needs
an
input
it.
So
now
you
actually
needed
that
input,
and
then
it's
proved.
The
end
is
pretty
simple.
A
A
This
is
a
very
simple
lecture
with
only
five
token
types
and
identify
you
have
to
can
type
any
identifier
which
is
made
after
of
alphabetical,
alphanumeric
characters,
V
lambda
keywords,
the
opening
or
closing
parentheses
and
the
dots
and
then
I
was
slightly
few
boilerplate
code.
You
just
have
to
add
native
ones
at
the
end
of
the
token,
because
in
your
case,
the
grammar
in
the
example
I've
taken,
the
grammar
would
still
be
ambiguous
without
a
new
F
token.
At
the
end,
so
we
explicitly
edit
and
the
enumerate
is
to
use
the
numbers.
A
A
A
So
let's
try
to
to
make
a
comparative
list
of
differences
between
a
lot
passing
and
combine.
It
was
the
biggest
pros
of
arrow.
Passing.
Is
that
you
can
it's
hard
to
make
mistake
a
lot
passing
is
by
design
on
those
only
grammar
battle
and
ambiguity
and
ambiguous.
So
it
will
check,
during
the
construction
of
the
passer
yokomo,
to
check
that
it
doesn't
contain
any
conflict,
whereas
we've
combined
it
us
it's
easy
to
accidentally
make
an
enum
be
used.
A
Numbers
use
drama
and
you
will
only
notice
it
when
you
debug
some
weird
example
that
does
not
pass
the
way
you
think
it
should
pass
and
that's
the
way
you
will
notice
that
your
grammar
is
ambiguous
and
some
time
you
will
have
to
rewrite
the
entire
grammar
in
order
to
get
rid
of
it
immediately.
So
that's
the
big
advantage
of
a
lot
passing
other
combined
users,
the
other
being
that
they
are
very
fast.
A
A
On
the
other
hand,
the
biggest
disadvantage
of
a
low
passing
is
that
combined.
It's
also
much
much
easier
to
write
like
at
all
passes
are
pretty
easy
to
write
if
you
use
to
writing,
be
enough
Rommels
of
this
kind
of
thing,
but
still
and
even
more
per
second
main
source
of
quite
easy
to
debug,
whereas
in
other
browsers,
when
you
have
a
conflict,
it's
usually
requires
kind
of
a
it
requires
that
you
kind
of
understand
or
a
lot
passing
works
under
the
hood.
A
In
order
to
be
able
to
debug
the
conflicts,
mininum
tries
to
make
it
easier
by
explaining
the
conflicts
in
terms
of
the
drama
or
even
in
terms
of
automaton,
but
it
still
requires
a
bit
of
thinking
to
correct
them
properly
and
last
and
more
importance.
It's
pretty
hard
with
a
lot
passing
to
properly
until
runtime
syntax
errors,
but
is
the
parser
is
correct,
but
you
feel
it
with
certain
inputs.
For
example,
compiler
compiles
a
program
which
is
syntactically
wrong,
you're,
provided
in
an
input
which
is
malformed.
A
Many
introduces
a
flag
which
is
list
errors
by
the
way.
Another
nd
feature
of
the
cargo
package
for
mania
is
a
link,
binary
function
that
you
can
use
in
your
build
dot
array
scripts
and
which
will
create
a
sim
link
to
the
generator
binary
at
the
root
of
your
sorcery,
which
is
cool
during
the
development
of
your
parser,
because
you
can
iterate
quickly
try
new
flags
make
change
to
the
drama
without
having
to
edit
the
build
scripts
and
run
the
whole
cargo
process
at
each
other.
A
So
if
you
use
this,
this
errors,
flag
mania
will
generate
a
file
containing
the
description
of
all
the
possible
states
in
which
anuraag
could
ever
happen.
It
looks
like
that.
Well,
this
is
a
single
entry
in
that
file
and
I
definitely
don't
have
time
to
explain
in
details
what
this
by
descriptions
work.
But
to
me
shortly,
Asian
tree
contains
a
description
of
an
inputs,
that
of
a
kind
of
input
that
could
they
could
lead
to
a
certain
class
of
errors
and
the
user
can
write
the
user
here.
A
I
mean
the
user
of
many
of
the
programmer
will
write.
A
mini
parser
can
write
inverse
file,
detailed
error
messages
when
you
give
that
file
back
to
many
with
another
flag,
which
is
compile
errors.
You
can
also
invocate
through
the
wrapper
library,
with
the
compile
errors
function
and
it
will
produce
another
rest
code
file.
A
A
It's
it's
also
just
like
when
your
food
at
a
1
parser
and
the
big
advantage
it
has
over
menu,
is
that
is
way
way
much
better
integrated
with
the
US
environment.
It
has
a
nicer
syntax,
which
looks
more
closely
to
which
you
know
resembles
more
closely
for
our
syntax,
but
many
other
syntax,
which
is
basically
the
same
as
Jack.
It
gives
nicer
messages
which
integrates
nicely
with
Iago.
A
It
has
make
rows
which
allows
to
abstract
away
repetitive
fragments
of
drama
and
which,
as
far
as
I
understand,
have
more
or
less
the
same
goal
in
expressivity
as
many
as
parameterize
non-terminal,
but
those
only
mostly
work
in
the
current
implementation
of
a
res
back-end.
That
is
very
work,
but
they
require
a
lot
and
a
lot
of
type
of
notations
which
make
it
a
bit
painful
to
use
where
la,
whereas
la
la
pub
as
type
inference
which
we
like
in
many.
A
Ok,
so
there
are
a
few
other
features,
but
I
didn't
have
time
to
to
dig
into,
but
but
I
will
mention
anyway,
and
they
mostly
don't
work
at
the
moment.
But
that's
where
we
are
looking
at
right
now
mini
has
an
accumulation
of
manure
as
an
incremental
API.
That
is,
instead
of
exposing
just
a
single
function
that
you
call
and
either
it
gives
you
a
successful
results
either
in
general.
A
Along
with
a
low
passes,
this
incremental
API
is
implemented
in
the
arrests
back
end
right
now,
but
it's
not
exposed
because
it
exposes
unsafe
internals
of
the
backends.
And
if
you
were
to
use
both
functions
incorrectly,
you
could
trigger
undefined
behavior.
Make
the
pass
afresh
so
this
way
this
should
work
in
one
in
akumal.
A
Again
they
use
g-e-t
to
make
sure
that
these
interfaces,
and
all
correctly,
but
in
rust
we
still
have
to
figure
out
a
way
to
make
it
work
safely
and
the
library's
many
has
Louis
to
write
a
portion
of
drama
definitions
once
and
for
all
in
the
library
that
you
can
reuse
in
all
of
your
passes.
It
should
work,
but
in
sporting,
because
the
standard
library
of
Munir
contains
semantic
actions,
wasn't
written
in
a
camel
and
will
not
work
with
them
any
other
forest
passer
and
there's
probably
a
whole
lot
of
other
issues,
notably
lexical
conventions.
A
If
you
know
the
Yak
syntax,
you
know
that
we
type
annotations
for
terminals
and
non-terminals
output
inside
angular
brackets
and
within
thing.
The
funny
thing
is
that
the
lecture
of
Mineo
is
completely
stupid
and
it's
relies
on
the
assumption
that
a
valid
Oh
camel
type
shall
not
contain
any
angular
brackets.
So
in
order
to
pass
a
type
annotation,
it
will
just
take
the
first
closing
brackets,
but
Russ
types
can
contain
angular
brackets
because
of
the
generic
types.
A
So
if
you
have
a
type
like
that,
Munir
will
interpret
box
expert
without
the
closing
bracket
as
being
the
type
and
it
will
fail.
So
you
can
work
it
out
by
using
your
type,
a
yes
but
it's
cumbersome.
We
have
also
changed
warnings
because
many
expects
non
terminals
to
be
not
to
be
capitalized,
but
each
of
those
end
up
with
a
name
in
the
produce
rest
code
that
rust
expects
to
be
capitalized.
So
either
you
have
many
warning
evil.
You
have
a
rest
warning
depending
on
how
you
name
your
new
names.
A
The
user
is
tagging
of
types
of
styles.
The
finger
was
talking
about.
We
use
less
dynamic
type
checks
when
retrieving
and
pushing
values
back
and
forth
a
lot
of
stack.
This
should
be
done
in
unsafe
or
with
some
kind
of
of
typing
I.
Don't
know
how
exactly
we
could
use
the
rest
itís
time
to
do
that,
but
maybe
it's
possible.