Description
Fast, Safe, Pure-Rust Elliptic Curve Cryptography by Isis Lovecruft & Henry De Valence
This talk discusses the design and implementation of curve25519-dalek, a pure-Rust implementation of operations on the elliptic curve known as Curve25519. We will discuss the goals of the library and give a brief overview of the implementation strategy. We will also discuss features of the Rust language that allow us to achieve competitive performance without sacrificing safety or readability, and future features that could allow us to achieve more safety and more performance. Finally, we will discuss how -dalek makes it easy to implement complex cryptographic primitives, such as zero-knowledge proofs.
A: So, what is Curve25519? In order to talk about our library, it's necessary to situate it within the stack it sits in. You have your application at the top, and your application is using some sort of cryptographic protocol; that could be, for example, a signature, a key exchange, or a zero-knowledge proof. Underneath that you have an abstraction layer, normally called a group in cryptography.
This also results in super excessive copy-pasta: cryptographers have this thing where they tend to literally copy-paste each other's code around. This is exacerbated by a lot of cryptographers somehow thinking it's appropriate to distribute a tarball of their code, unsigned, inside another tarball of a benchmarking library, and that's how you're actually supposed to get it as an end user. It's just mind-boggling to me. So, anyway.
This leads to large, monolithic code bases which are idiosyncratic, incompatible with one another in really hard-to-debug ways, and often highly specialized to perform only the single protocol they implement, which is usually signatures or key exchange. There's just no consideration that there's this whole rest of the field of cryptography, and that you might want to do something other than these two protocols. Worse, some of the bugs I've personally seen in major, widely used cryptographic libraries (I'm not going to name any names) include using C pointer arithmetic to index into an array. Just as a recap: in C, array indexing works both ways, so taking the sixth element of the array can be written either as a[6] or as 6[a].
I've seen code overflowing signed integers in C and expecting the behavior to be the same across different platforms and compilers. This is canonical undefined behavior: just don't do this. And I've seen code using basically untyped integer arrays (in Rust it would be a [u8; 32]) without using the type system at all, declaring that this is the canonical representation of multiple things in the library, multiple things which are mathematically fundamentally incompatible.
For example, when you have an elliptic curve point, you can usually compress it to 32 bytes by taking either the x or the y coordinate, so that could be a [u8; 32]. A scalar, that is, a number, could also be 32 bytes. These are things that are not mathematically compatible; they shouldn't be interchangeable, and your type system should be protecting you against making errors like this.
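As a sketch of what that protection can look like (hypothetical types, not the library's actual API), you can wrap the same 32 bytes in two distinct newtypes so the compiler rejects mixing them up:

```rust
// Hypothetical newtypes: both wrap 32 bytes, but they are distinct types.
pub struct CompressedPoint(pub [u8; 32]);
pub struct Scalar(pub [u8; 32]);

// A function like this can no longer be called with its arguments swapped,
// even though both arguments are "just 32 bytes" underneath.
pub fn scalar_mul_bytes(s: &Scalar, p: &CompressedPoint) -> ([u8; 32], [u8; 32]) {
    (s.0, p.0)
}
```

Trying to pass a `Scalar` where a `CompressedPoint` is expected then becomes a compile-time error rather than a silent mix-up.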
Or how about this one: using pointer arithmetic to determine both the size and the location of a write buffer. And there are still more bugs; I could keep going with a lot of horror stories of things that I've seen. So we didn't want to do this in C, obviously, so we started working in Rust. The design goals of our library were that it should be usable for other cryptographers to implement their protocols, and it should be fast to write: essentially the same as writing a Sage script.
It should be safe, and by that we mean multiple kinds of safe: memory-safe and type-safe, and Rust has extra nice things there, like the underflow and overflow protections you get if you build in debug mode. It should be readable, which is a huge thing, because if you're copy-pasting around all these assembly files, and each cryptographer is making all these tiny tweaks and changes, and they're tarballs, and there's no git history, then there's no way to know why someone changed something.
You just have this blob of unreadable code that takes forever to work through, and it's not very explicit. Readability also implies that it should be auditable, which is a huge thing for security-critical code. All of these are things that we get from a higher-level, memory-safe, strongly-typed, polymorphic programming language. Okay, so with that, I'm going to turn it over to Henry, who will start to explain some of the low-level field arithmetic in Rust.
B: So, as you saw in one of the previous slides, there's this table of the different pieces, and since that's kind of large, as an example we're just going to go through one thing that we do. As part of this, we have to implement field arithmetic for integers mod p, where p is 2^255 - 19, and just as a worked example, let's see how that works.
B
So
we're
trying
to
do
this
only
using
the
operations
that
we
have
available
on
our
CPU,
so
in
order
to
figure
out
how
we're
gonna
do
this,
you
need
to
answer
two
questions.
First,
what
are
our
actual
primitive
operations
and
also
what
does
the
multiplication
look
like?
So
when
you
do
a
multiplication
you're
using
a
fixed
size,
primitive
type,
but
when
you
multiply
numbers
or
integers,
they
will
get
bigger.
So
how
does
that
get
handled?
Basically,
there's
four
possibilities.
The first is wrapping arithmetic, where the result wraps around modulo how big a number you can fit into that type; that's what Rust does in release mode. The second is checked or panicking arithmetic, where overflow is detected and reported; that's what Rust's debug builds do for the built-in operators. For some things you might want the third, saturating arithmetic, where if the result gets too big it just clamps to the highest value. And then the fourth thing you can do is widening arithmetic, where the result of the multiplication is the next biggest type. In Rust you have intrinsics for one, two, and three.
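Those first three correspond to the wrapping_*, checked_*, and saturating_* method families on Rust's integer types. A small sketch to show the behaviors side by side (illustrative, not slide code):

```rust
fn main() {
    let a: u8 = 200;
    let b: u8 = 2;
    // 1) wrapping: result is taken modulo 2^8.
    assert_eq!(a.wrapping_mul(b), 144); // 400 mod 256
    // 2) checked: overflow is reported as None.
    assert_eq!(a.checked_mul(b), None);
    // 3) saturating: result clamps to the type's maximum.
    assert_eq!(a.saturating_mul(b), 255);
    // 4) widening: cast up first, and the product is exact.
    assert_eq!((a as u16) * (b as u16), 400);
}
```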
B
So
if
you
explicitly
want
one
of
those
you
can
pick
it
I'm
not
aware
of
an
intrinsic
for
the
fourth
one,
but
you
can
just
write
it
like
this,
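For 64-bit limbs, a widening multiply can be written in one line by casting up to u128 (a sketch of the idea; the library's helper is spelled similarly but may differ in detail):

```rust
/// Widening multiplication: multiply two u64s into a full 128-bit product.
#[inline(always)]
fn m(x: u64, y: u64) -> u128 {
    (x as u128) * (y as u128)
}

fn main() {
    // The product of two maximal u64s does not fit in 64 bits,
    // but the widening result is exact.
    assert_eq!(m(u64::MAX, u64::MAX), u128::MAX - 2 * (u64::MAX as u128));
}
```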
Now you might want to know: what does this actually turn into? Let's suppose that we're on x86-64. There's a really cool tool that you can use for this:
Godbolt's Compiler Explorer. In the top window I've just put an example of a thing that does a widening multiplication of two u64s to produce a u128 output, and the bottom shows the actual assembly this will produce. There are two windows because you can see that LLVM will give you a nicer instruction on newer processors than on older processors.
B
There's
this
mul
instruction,
where
the
inputs
and
outputs
go
into
like
fixed,
predetermined
registers,
so
you'd
have
to
do
a
bunch
of
like
moving
things
around
and
then
on
newer
things
you
can
pick
where
they
go,
but
the
point
is
that
you
can
just
sanity
check
that
this.
This
really
does
turn
into
something
reasonable.
B
So
suppose
that
we
have
this
ability
to
do
multiplication
of
two
64-bit
numbers
into
128-bit
product.
How
are
we
going
to
implement
multiplication?
So
if
you
look
at
the
original
paper,
they
suggest
using
a
radix
to
to
the
51
representation.
So
what
does
that
mean?
It
means
that
you're
going
to
write
numbers
where
the
in
base
2
to
the
50.
Why
so
you're
gonna
get
five
coefficients?
You
might
wonder:
where
does
the
51
come
from?
B
Where
does
the
five
come
from,
so
these
things
are
going
to
be
basically
256
bits
wide,
and
so
you
could
break
it
up
into
four
times
64.
But
if
you
think
about
the
discussion
in
the
previous
talk
about
instructions
per
clock
and
out
of
order
execution,
it's
actually
much
better.
If
not
all
of
the
operations
depend
on
each
other
right.
If
you
have
things
that
are
full
width
and
every
time
that
you
do
an
operation,
there
will
be
a
dependency
between
them
and
it'll
it'll
be
slower.
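The radix 2^51 representation itself can be sketched like this (an illustrative decomposition, not the library's deserialization code): a 255-bit number given as four little-endian 64-bit words becomes five limbs of 51 bits each.

```rust
// Mask selecting the low 51 bits of a word.
const LOW_51_BITS: u64 = (1 << 51) - 1;

// Split four little-endian 64-bit words into five radix-2^51 limbs,
// so limb i holds bits [51*i, 51*i + 51) of the number.
fn to_limbs(words: [u64; 4]) -> [u64; 5] {
    [
        words[0] & LOW_51_BITS,
        ((words[0] >> 51) | (words[1] << 13)) & LOW_51_BITS,
        ((words[1] >> 38) | (words[2] << 26)) & LOW_51_BITS,
        ((words[2] >> 25) | (words[3] << 39)) & LOW_51_BITS,
        words[3] >> 12,
    ]
}
```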
Okay, so now you'll notice that our numbers got bigger when we multiplied them, but we're supposed to be working mod p, so we would like to reduce this back to the original size of the inputs. How do we do that? Notice that this prime has a special form: since it's 2^255 - 19, you know that 2^255 is 19 mod p. The reason is that mod p, p is zero, so 0 = 2^255 - 19, and you bring the 19 over. So why is this useful?
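Because anything at position 2^255 and above can be folded back into the low bits after multiplying by 19. Here is a toy analogue of that folding, assuming the smaller prime 2^61 - 19 so that a whole element fits in a u128 (the real code applies the same identity limb by limb, not like this):

```rust
// Toy modulus with the same shape as 2^255 - 19.
const Q: u128 = (1 << 61) - 19;
const LOW_61_BITS: u128 = (1 << 61) - 1;

// Multiply mod Q by folding the high part: 2^61 is congruent to 19 (mod Q).
fn mul_mod_q(a: u128, b: u128) -> u128 {
    let wide = a * b; // a, b < 2^61, so the product fits in 122 bits
    // Split into the low 61 bits plus (high part) * 2^61, then fold:
    let folded = (wide & LOW_61_BITS) + (wide >> 61) * 19;
    // One more fold and a final conditional subtraction finish the job.
    let mut r = (folded & LOW_61_BITS) + (folded >> 61) * 19;
    if r >= Q {
        r -= Q;
    }
    r
}
```

For instance, (Q - 1) * (Q - 1) mod Q comes out to 1, since Q - 1 is congruent to -1.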
Similarly, for this 2^306 term: since 2^306 = 2^255 times 2^51, you can write it as 19 times 2^51, and that simplifies into this nice thing. So you can get a basically pretty fast inline reduction, and when you combine that with the formulas on the previous slide, you get this picture, where the triangle below gets folded up into the upper part. This technique for doing really fast reduction mod p: you can actually trace its lineage all the way back to the 15th century.
B
If
you're
curious,
it
coincides
with
like
the
development
of
early
capitalism
in
Venice.
So,
unfortunately,
we've
now
moved
on
to
Lake
capitalism
and
things
are
not
looking
up,
but
why
don't
we
just
write
this
in
rust,
so
I
put
I
put
some
rust
code
on
the
on
the
on
the
slide:
we're
implementing
Mull
there's
some
weird
lifetime
stuff.
That's
one
of
the
things
that
we'll
get
to
later,
so
just
disregard
that
for
now
I
just
put
it
in
so
that
because
it's
the
real
code
and
I'm
gonna
define
this
little
helper
function.
B
That's
like
my
own
little
intrinsic
for
doing
widening
multiplication.
It's
in
line
always
so
we'll
just
disappear,
and
we
start
off
so
remember
in
the
in
the
previous
slide.
We
had
a
19
times
some
stuff,
but
that
stuff
is
going
to
be
you
128
and
it's
better.
You
know.
Instead
of
trying
to
do
128
bit
multiplication,
you
could
just
do
a
multiplication
by
19
beforehand
and
then
you
just
write
down
that
formula.
B
Now
we
have
this
problem,
which
is
that
these
see
is
that
we're
getting
our
128
bits
Y
right
there,
you
128,
and
not
you
64
s,
and
remember
that
our
original
goal
is
to
try
to
get
to
back
to
a
like
you,
64
5.
So
we
have
to
then
sort
of
reduce
these
these
see
eyes,
and
we
can
do
that
so
I've
written
this
formula.
But
basically
the
idea
is
that
you
take
this
128
bit
value.
Then you can do whatever you want with that information. Once we've done this first pass, we've fixed all of the c_i to lie in u64s, but now they're maybe not as small as we'd like, so we can just do another carry pass, and that's what the reduce function on the 64-bit field element type does. It does essentially the same thing, but it's less complicated because you don't have to change types, and it also gets inlined. So actually, that's our implementation.
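A carry pass over radix-2^51 limbs can be sketched like this (illustrative code, not the library's exact reduce function): each limb keeps its low 51 bits and pushes the excess into the next limb, and the carry out of the top limb is worth 2^255, so it comes back in multiplied by 19.

```rust
const LOW_51_BITS: u64 = (1 << 51) - 1;

// One carry pass. Assumes the limbs are small enough that the additions
// below cannot overflow a u64 (the real code checks this with assertions).
fn carry_pass(mut limbs: [u64; 5]) -> [u64; 5] {
    let mut carry = 0u64;
    for limb in limbs.iter_mut() {
        let v = *limb + carry;
        *limb = v & LOW_51_BITS;
        carry = v >> 51;
    }
    // Fold the top carry back around: 2^255 is 19 (mod p).
    limbs[0] += carry * 19;
    limbs
}
```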
It's not the simplest thing, but it's not that complicated. In our actual code there's a bunch of debug assertions to make sure that all the values are the right size, that there's no possibility that we're violating some precondition, and that none of the intermediate values can overflow. But that's the actual code that we have, and then we kind of just hand it to LLVM and see what happens.
A: It turns out it's actually really, really fast. ed25519-donna is an implementation with optimized assembly; it's what Tor currently uses by default, and as you can see here in red, our performance is comparable to donna, and slightly better for things like verification. We also threw in a more general comparison point, so that you understand what the numbers are kind of supposed to be.
ring is also a Rust library, but a higher-level library than ours, for implementing protocols, and ring does that by wrapping BoringSSL's implementations, which are assembly implementations, in Rust. So it's pretty, pretty fast. So now, let's talk about certain things about Rust that we really like, and other things that we think could be a little bit better.
So, obviously, Rust's code generation is done by LLVM, and as Henry just showed, it's really good at generating code. It's not just good at generating fast code; it's good at generating safe code. Historically, there's been a worry that an optimizer could, in theory, break the constant-time properties of an implementation. So what does this mean?
People have essentially said in the past that you can't use compilers to write cryptography, that you have to write handwritten assembly, because that's the only way to control what a chipset is going to do. Not only is that not entirely true, there are also chipsets that do weird things, which I'll get into later.
This is all kinds of insane. So what does it mean to say that code is constant time? There are these things called side channels, and a side channel is essentially a mechanism by which an adversary can learn some sort of internal program state. For cryptography this is especially insidious, because learning even a few bits of a secret can often lead to full key recovery attacks.
For example, suppose secret data determines which branch of an if-statement gets executed, the if-branch is a really small piece of code, and the then-branch is this huge chunk of code. You can basically load your own giant static string into the cache, wait a little while and just chill, and then try to access your giant string again and time how long it takes. As you saw in the previous talk, hitting different layers of the cache, or hitting memory instead, results in different timings. So that's a timing side channel. There are other side channels too:
you can do differential power analysis, where you build up profiles of a particular device or chip and how much power it draws as it's doing certain operations. But basically, this is all bad. You want to write code that does exactly the same thing, with respect to secrets, all the time.
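The usual trick is to replace secret-dependent branches with arithmetic on masks, so the same instructions execute regardless of the secret. A minimal sketch of the style (illustrative; the exact helpers in real libraries differ):

```rust
// Branchless conditional select: returns a if choice == 1, b if choice == 0.
// choice must be exactly 0 or 1; negating it gives an all-ones or all-zeros
// mask, so both inputs are always read and no secret-dependent branch exists.
fn ct_select(a: u64, b: u64, choice: u64) -> u64 {
    let mask = choice.wrapping_neg(); // 1 -> 0xffff...f, 0 -> 0x0000...0
    (a & mask) | (b & !mask)
}
```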
To prevent this, we have put a lot of effort in. As you saw with propagating the carries in Henry's code, there are all these crazy shifting operators everywhere, and crypto code just tends to look weirder because of this. As for LLVM's optimizer, it turns out that it's not breaking our code, as far as we can tell, for x86, and we've sat there and geeked out over the assembly for hours and hours and hours, way longer than I would like. In the future,
we're hoping that there will be some way to do an LLVM pass or a sanitizer where we can statically analyze the output assembly. For instance, there's a trick for Valgrind that Adam Langley made, where you just mark secret data as uninitialized, and then if you ever try to index based on that data, or branch on it, the whole thing just blows up, and you know that you've done something wrong.
There are also chips whose mul instruction will return early if you're multiplying something by zero, so I actually just can't guarantee that constant-time crypto works at all on your MacBook. Don't do that. Filippo Valsorda had a recent blog post about this which is really interesting; I think it made the top of both the golang subreddit and the Rust one.
Another thing we would like: the basic idea would be to statically track the sizes of the intermediate values of these field elements and use specialization to insert reductions when necessary. It would automatically detect, like: oh hey, you have three bits of carry space and you've done two adds already, so let's do a reduction now, instead of forcing users to know when to do it by hand.
B: So, on implementing these proofs: for people who care, these are Schnorr-style proofs. There's a lot of boilerplate when your expressions get more complicated, so we made a crate that has an experimental zero-knowledge proof compiler implemented in Rust macros. Not procedural macros, just ordinary macros, because it's worse that way. As a user it's kind of nice, because you just write this, and that corresponds to the example above. And what does that expand into?
A: The problem is that if we're working in a cyclic group, x being bigger than y is the same thing as x being smaller than y if it wraps around the group, so you have to do what's called a range proof, which is saying: I know an x, and it's between y and z. There are ways to do this in zero knowledge, without giving away any information about x other than that it lies in the correct range.
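To see why plain comparisons aren't meaningful once values wrap, here's a tiny illustration using integers mod 7 as a stand-in for a cyclic group:

```rust
fn main() {
    let m = 7i64;
    // 5 and -2 are the same element mod 7...
    let x = -2i64;
    let rep = x.rem_euclid(m);
    assert_eq!(rep, 5);
    // ...but one representative is "bigger" than 3 and the other "smaller",
    // so "x > y" is not well-defined without a range constraint on x.
    assert!(rep > 3);
    assert!(x < 3);
}
```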
So verification essentially amounts to checking a ring signature on each digit's proof, and if each digit is in the correct range, the whole number is in the range. We implemented a recent construction due to Back and Maxwell, which determined that if you do Borromean ring signatures over a ternary system, you can share data between the digits, and this ends up being a lot more efficient. The name "Borromean ring signature" is actually a pretty cool name. Borromean rings are, if you imagine something like the sign for the Olympics:
it's this mathematical thing where you have three rings that are interlinked, and if you cut one, the whole thing falls apart. So the signature scheme is named after these interlinked rings, because each of the rings in the scheme has some of the data of the other ones, or is dependent on the other ones. Computers don't really like ternary systems; neither, for that matter, do human brains.
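The per-digit decomposition those signatures work over can be sketched as follows (an illustrative helper, not code from the construction itself): the value being range-proved is split into base-3 digits, and each digit then gets its own ring signature.

```rust
// Decompose x into n base-3 digits, least significant first.
fn base3_digits(mut x: u64, n: usize) -> Vec<u64> {
    let mut digits = Vec::with_capacity(n);
    for _ in 0..n {
        digits.push(x % 3);
        x /= 3;
    }
    digits
}

fn main() {
    // 11 = 2 * 1 + 0 * 3 + 1 * 9
    assert_eq!(base3_digits(11, 3), vec![2, 0, 1]);
}
```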