From YouTube: IETF114-ANRW-20220726-1900
Description: ANRW meeting session at IETF 114, 2022/07/26 19:00
https://datatracker.ietf.org/meeting/114/proceedings/
A: All right, it looks like the rate of people joining has slowed.
A: All right, in that case, welcome everybody to the second of the Applied Networking Research Workshop sessions for 2022.
A: So I'd like to begin with the usual reminders. The Applied Networking Research Workshop is organized by ACM SIGCOMM and by the IRTF, and it's co-located with the IETF meeting, as you all know. As a result, that means we follow the IETF's intellectual property rights disclosure rules.
A: So by participating in this meeting, you agree to follow those rules, which I hope you are all familiar with. If you're speaking at the microphones and you know of any patents or intellectual property that covers the topic you're discussing, then you agree to disclose that. The links on the slide point to the detailed policies.
A: In addition, this session is being recorded and will be available online on YouTube after the session. If you speak at the microphones in the room, or if you join the session remotely, then you agree to being recorded and to that recording being made publicly available.
A: Any personal information you give to the IETF or the IRTF will be handled in accordance with the privacy policy, and there's a link to that on the slide. That mostly applies to your registration details and to your participation in the IRTF.
A: And a final reminder before we start: because of the risk of COVID, as a safety measure we are requiring all in-person participants in this meeting, and in the other IETF-controlled areas of the meeting venue, to wear N95 or equivalent masks at all times while they're in this meeting or in the IETF-controlled areas.
A: The only exception to that is for the speakers in the session who are actively giving their talks at the front microphone.
A: The goal of the ANRW is to provide a forum for researchers, vendors, network operators, and the standards community to discuss interesting new results in applied networking research.
A: One of the goals is to try and find inspiration from the topics and problems being discussed in the IETF, and the other is to try and connect the research community with the IETF, to help bring new ideas into the IETF community and into the standards community. The goal is to encourage this two-way exchange of ideas.
A: The ANRW is very much there to encourage, hopefully, the submission of new research results that could form the basis of engineering work in the IETF, but also to help better specify Internet protocols, change operational practices, and possibly influence future research and experimentation in the IETF and the IRTF.
A: As we saw in the talks this morning, and as we've seen in previous editions of this workshop, we've had a lot of talks about how to specify new protocols, about new protocol designs, improvements to protocols, and so on.
A: Something we haven't discussed so much in the ANRW, which I think is one of the goals of this session, is to start to think about how we describe and specify protocols: how we can ensure that the protocols we're specifying are consistent and correct, how we can verify that the specifications we are writing, as researchers or as engineers in the IETF, are correct, and how we can validate whether the implementations match those specifications.
A: I think one of the key goals of writing a specification is to communicate, and to communicate in a way that lets us know that what is written in the spec is correct: check that the specification is self-consistent, check that the implementations match that specification, and then everything should interwork.
A: What I'm hoping we can start to do in this session is to discuss whether this is the right way of writing protocol specifications, and to discuss to what extent formal methods, structured specification languages, natural language processing, or other techniques can help to describe network protocols.
A: The goal is very much not to think about how we change how RFCs are prepared. There was what I hope was a light-hearted comment in the chat earlier about whether this is about using Markdown or XML for writing RFCs; the goal of today is not to start a conversation about the RFC series.
A: I'd like to use this session, hopefully, to start a conversation about how we specify protocols, and to begin to connect the academic researchers who are studying protocol specification with the engineers in the IETF, and the researchers in the IETF and the IRTF, who specify protocol standards and design and develop new protocols.
A: So, our agenda for today: we just finished the introduction and status update, and then we have three invited talks.
A: The first talk will be by Max von Hippel from Northeastern University, who will be talking about automated attack synthesis by extracting finite state machines from protocol specifications.
A: The second talk will be about some work on tools for disambiguating RFCs, and then finally Chris Wood from Cloudflare will be talking about some of the work that's been happening in the Crypto Forum Research Group, and some of the successes and limitations of the way that group has been specifying cryptographic standards.
A: And then we have, I hope, some time for discussion before we close.
A: There's a note that there's a HedgeDoc for notes; the link is on the slide here. If you have questions or comments, either put them in the chat or put them in the HedgeDoc. I guess it's probably better to put them in the chat. There's also a section in the HedgeDoc, in the notes: if you would like us to get in touch about any of this work, or if you're interested in following up on any of these ideas, then please put your name in there and we'll try and follow up afterwards. We'll talk more about this in the discussion session at the end.
A: So with that, if Max is present in the room, could he come up? I'd like to hand over to him for the first talk.
D: Okay, and, Colin, can you.
A: Yep, I'm just trying to find the right button in Meetecho.
D: This is old news to those of you who build protocols, so everybody in this room, but the Internet runs on protocols such as TCP, UDP, etc., and these protocols create a sort of shared language for computers to talk to one another. Here we have a picture of the Internet, or specifically the ARPANET, in December 1969.
D: I'm a formal methods researcher, and in formal methods we call these things Kripke structures, but to normal software engineers, protocols are described by finite state machines. A finite state machine is actually a completely unambiguous specification of the language of a protocol. Essentially, when I say that a protocol ensures that two computers speak the same language, I'm saying something both informal and formal: the formal statement is that they run either the same state machine, or state machines that accept the same language.
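As a minimal sketch of this idea (the state and event names below are illustrative, loosely modeled on the TCP handshake; they are not taken from the talk), a finite state machine can be written as a transition table, and a machine "accepts" exactly the event sequences that the table allows:

```python
# Tiny FSM sketch: a transition table keyed by (state, event).
# States and events are illustrative, loosely based on TCP establishment.
TRANSITIONS = {
    ("CLOSED", "passive_open"): "LISTEN",
    ("CLOSED", "active_open/send_SYN"): "SYN_SENT",
    ("LISTEN", "recv_SYN/send_SYNACK"): "SYN_RECEIVED",
    ("SYN_SENT", "recv_SYNACK/send_ACK"): "ESTABLISHED",
    ("SYN_RECEIVED", "recv_ACK"): "ESTABLISHED",
}

def run(start, events):
    """Run the FSM; return the final state, or None if a sequence is rejected."""
    state = start
    for ev in events:
        state = TRANSITIONS.get((state, ev))
        if state is None:
            return None
    return state

# One "word" in the protocol's language, as seen by the active-open side:
print(run("CLOSED", ["active_open/send_SYN", "recv_SYNACK/send_ACK"]))  # ESTABLISHED
```

Two machines speak the same language, in the speaker's sense, exactly when `run` agrees on every event sequence.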
D: In our group at Northeastern, we specialize in studying these finite state machines, either using formal methods or other approaches, and trying to find flaws in them. So we take a protocol (we define a protocol to be whatever is in its RFC), we then formalize it using a finite state machine that we write in some language like Promela or Uppaal or something like this, and then we have a tool that generates attacks against that protocol expressed as an FSM.
D: So there are kind of a lot of hidden steps in between there, going from specification to FSM to attacks, and I'd like to disambiguate that a little bit. When we talk about protocols, we first and foremost have to talk about RFC specifications, which is obviously especially salient in this context. These are written by the IETF, in English prose, which I think is the topic of today's conversation, and although they may seem technical to somebody who doesn't write these documents for a living,
D: the reality is that they really could be quite a bit more technical than they are. So, for example, they could be written in Coq or Agda; they could be written in a programming language if we wanted them to be, and there are advantages and disadvantages to their current presentation.
D: Then a programmer reads these RFC specifications and implements them. So, on the left, we have a tiny little screenshot of the DCCP RFC, and on the right we have a tiny little screenshot of DCCP implemented in the Linux kernel, and there are all sorts of decisions that the implementer has to make when implementing the RFC, and that doesn't just include decisions about how to implement some functionality, like "oh, this list needs to be sorted."
D: So when the implementer reads the RFC specification and decides what she is going to implement, implicitly she's creating a mental model in her head, and to me, as a formal methods person, I view that mental model as a finite state machine, but it doesn't really matter; you could view it as any isomorphic thing. The point is that, obviously, a human being reading English text is going to imagine what that text is supposed to communicate, and that thing that she or he imagines is some abstract expression of what they intend to implement.
D: They then go and implement it, and hopefully there are no errors in that process either. So bugs in a protocol implementation can arise from any one of the steps in this process. First of all, there could be fundamental design flaws in the specification: it could be that we specified something that was bad, it was a bad idea, and the idea needs to be reworked.
D: It's also entirely possible that the mental model, the finite state machine, is wrong; in other words, that the programmer or implementer made some mistake when she read the specification and misinterpreted something. Now, we could blame that on the programmer or blame it on the author; it doesn't really matter mathematically.
D: The second half will be doing something useful with that. So, in order to show that the extracted finite state machine is actually good for something, I'm going to do something with it, and by "something" I mean attack synthesis: I'm going to automatically generate attacks. If you're in this room and you're thinking "well, that sounds cool, but I don't care about security," I'll tell you right now: you can view these attacks as bugs, so you could also view it as "I'm going to automatically find bugs."
D: When we try to extract a finite state machine from a Request For Comments document, there are a number of critical challenges that we face. First and foremost, there's simply no one-to-one mapping between the text and what we call the canonical finite state machine. In fact, it takes human expertise to read an RFC and interpret what it says, or should say, for a number of reasons that we'll get into. Second, and related to the first point, RFCs contain omissions, mistakes, and ambiguities.
D: Here's an example from the DCCP RFC, where a sentence, if interpreted mathematically, would imply the existence of a transition from PARTOPEN to OPEN upon receipt of a DCCP-Close packet. But of course that would be nonsensical: if you receive a DCCP-Close packet, you should close your connection, not go to OPEN, and if you read the rest of the document, it becomes clear that that transition should not exist.
D: Another problem is that off-the-shelf natural language processing tools are simply not suitable. Generally speaking, they are trained on newswire text, or books, or things like this that are non-technical in nature, and so they face the exact same problem that I faced when I tried to describe my research to my girlfriend, who's not a programmer: I use all sorts of lingo that is kind of domain-specific and doesn't mean anything to people in other fields, like, in her case, biology.
D: So we need something that's trained on relevant text, such as computer networking textbooks, or Stack Exchange and Stack Overflow questions, etc. Finally, there's a ton of variation in the language and structure of different RFCs, and this is one of the, I think, interesting results we'll show today. Spoiler alert: the DCCP RFC is more complicated than the TCP one, at least in terms of its English text.
D: So we have a multi-step approach. The first step is that we create a technical language embedding. Essentially, what we're doing here is that, rather than impose our own concept of what the most natural vector space is for the text that describes an RFC to live in, we are allowing a machine learning model to come up with an ideal vector space that it wants to learn. That embedding is going to capture all sorts of interesting things about the text, such as the punctuation of sentences, or the use of mathematical symbols, etc.
D: Second, we have zero-shot protocol information extraction. That's where we take the text that is now projected into the vector space of our technical language embedding, and we automatically extract an intermediary representation, which I'll get into more later, which contains sufficient information for us to then, in a subsequent step, extract a finite state machine. So I want to stress this point: we don't go end to end, in one step, from English to FSM.
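The staged structure described here (text, then embedding, then intermediary representation, then FSM, rather than one end-to-end step) can be sketched as a composition of stages. Everything below is a toy stand-in to show the shape of the pipeline, not the authors' actual models:

```python
# Toy stand-ins for the three stages of the pipeline described in the talk.
# None of this is the real system; each function mocks one stage's role.

def embed(sentence):
    # Stage 1 stand-in: a real system maps tokens into a learned vector
    # space (e.g. a domain-trained BERT); here we just tokenize.
    return sentence.lower().replace(",", "").split()

def extract_ir(tokens):
    # Stage 2 stand-in: produce an intermediary representation tagging
    # source state, target state, and triggering event.
    if "enter" in tokens and "from" in tokens:
        return {
            "source": tokens[tokens.index("from") + 1],
            "target": tokens[tokens.index("enter") + 1],
            "event": tokens[tokens.index("receiving") + 1] if "receiving" in tokens else None,
        }
    return None

def ir_to_fsm(irs):
    # Stage 3 stand-in: turn tagged blocks into FSM edges.
    return {(ir["source"], ir["event"], ir["target"]) for ir in irs if ir}

sentence = "Enter ESTABLISHED from SYN_SENT upon receiving SYNACK"
print(ir_to_fsm([extract_ir(embed(sentence))]))
# {('syn_sent', 'synack', 'established')}
```

The point the speaker is making survives even in the toy: the middle representation is explicit, so each stage can be inspected and improved independently.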
D: Finally, we do something useful with that FSM; in our case, attack synthesis. I'd like to stress that you can do other things if you care about other things. So if you're thinking "well, I don't care about attacks, I'm not a security person, but I care about bugs, I care about correctness monitors, whatever," well, boy, do I have other tools for you. There are other things you can do with these, but what we did was attack synthesis because, frankly, my dissertation is going to be about attack synthesis, so I applied attack synthesis.
D: Let's talk about step one. In step one, we want to learn a technical language embedding. As I said, I don't want to impose my own personal moral and ethical beliefs about the natural vector space of RFC documents; rather, I'm going to let the computer decide. We use something called BERT. The idea of BERT, loosely speaking, is that it takes a contextualized approach to understanding what words mean, so the word "reset" could mean something different in different contexts, because it is informed by the words around it.
D: We train on roughly 9,000 documents and roughly 475 million words, so our BERT model is trained on quite a lot of data, including Stack Exchange, Stack Overflow, networking textbooks, and all sorts of other things.
D: Next, we want to do protocol information extraction. Just to remind you, in case you missed it: I now have some text that is projected into my vector space, and I want to translate it into an XML document (I'll show you what that looks like momentarily) which somehow captures enough data for me to then be able to extract a finite state machine in a straightforward manner. So, put differently, I essentially want to read the text.
D: We have reference tags that reference previously defined states, transitions, events, etc.; state machine tags that tag essentially the structure of the finite state machine in the document, any references to transitions or other logic that relate to the state machine; and then, finally, control flow tags that essentially capture logical structure in the FSM. When you read these documents, a lot of the time they have kind of complicated if-else structure, often related to the indentation of different blocks of text, and our NLP captures that using these control flow tags.
D: We have a transition block that contains an entire state machine transition, including the source state and the target state, and then we have an event, and we know exactly what occurs on that event. So, from the perspective of a formal methods person, I now have the source state, the target state, and the label on the transition in my Kripke structure; but from the perspective of a software engineer, I have, you know, an edge in the directed graph that is my finite state machine.
D: In order to go from text to this intermediary representation without requiring a bunch of grad students to spend countless hours annotating things, we use a machine learning approach. In fact, we use two different machine learning approaches, and we compare them. The first one is a linear conditional random fields model, which starts out by splitting text into chunks.
D: It then extracts features from the embedding that the chunks are in, and feeds those into a linear conditional random fields model, which builds on the Markovian assumption that the next state is predicated only on the previous one.
D: So we evaluate these against each other, and also in contrast to a completely rule-based heuristic program, which we don't expect to do very well, because it doesn't learn anything. The rule-based system in fact performs poorly, as we expected, achieving below 35% token-level accuracy and similarly bad span-level accuracy. In contrast, the linear CRF and neural CRF models perform quite a bit better, with the neural one slightly outperforming the linear one, as you'll actually see in almost all the metrics.
D: So, now that we've gone from text to embedding to intermediary representation, we have these RFC documents expressed in XML like this; you can imagine a big, long document, entirely structured with XML tags that say what's happening where, and now I need to extract a finite state machine from that. I want to be very clear: this itself is not a finite state machine. A finite state machine would consist of things like actual transitions. I need to interpret the XML and extract the FSM.
D: But if you want to improve on our work, a good place to start would be the FSM extraction algorithm, because at the moment we're not using machine learning for that step, and I think a fruitful thing to do would be to try that. Once we've extracted the FSM, we want to know: how good is it? I mean, it's probably not exactly what was written in the text, so how close is it?
D: So we compare against one that I wrote by hand, and, to get back to the point about why all this is necessary, I will tell you it took quite a long time to write that by hand, because there are all sorts of subtleties, when you get into a model checking environment, about the ordering of events and the semantics of the channels that connect the two processes, which are not captured by just the diagram that you find in your average RFC document. So we expect to have 20 transitions in our TCP FSM.
D: We also hand-annotate, or hand-create, an XML for the TCP RFC, and we call that our gold XML; the FSM from that is then the gold FSM. That one should have perfect annotations, so any bugs are related either to ambiguities in the text or to problems with our extraction algorithm. Comparing these, we find that the gold FSM does quite well.
D: We define a transition as being approximately correct if its source state and target state are both correct and at least one of the events on the label is correct, but it might have, say, the wrong order of events; maybe you receive SYN and then send ACK instead of the opposite, or something. In the case of gold, we find slightly fewer transitions than we expect to, but almost all the ones we get are approximately correct. With the linear and neural models, there's quite a bit of noise.
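The "approximately correct" criterion just described can be written down directly. This is a paraphrase of the definition in the talk, using a made-up transition encoding of (source, target, list of label events):

```python
def approximately_correct(extracted, reference):
    """A transition is approximately correct if its source and target
    states match the reference and at least one event on its label
    matches, even if the events are incomplete or out of order."""
    src, dst, events = extracted
    ref_src, ref_dst, ref_events = reference
    return src == ref_src and dst == ref_dst and bool(set(events) & set(ref_events))

ref = ("LISTEN", "SYN_RECEIVED", ["recv SYN", "send SYNACK"])

# Right endpoints, only one of two label events captured: still counts.
print(approximately_correct(("LISTEN", "SYN_RECEIVED", ["recv SYN"]), ref))  # True
# Wrong target state: does not count.
print(approximately_correct(("LISTEN", "CLOSED", ["recv SYN"]), ref))        # False
```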
D: So, although we find most of the transitions we expect to find, we also get a bunch of incorrect transitions.
D: Comparing to our skyline, another interesting result here is that, in the very particular case of TCP, the neural and linear models extract different intermediary representations, but actually all of that is modulated away by the finite state machine extraction algorithm. So any differences in intermediary representations do not matter, in this particular case, in terms of the extracted FSM, although this doesn't generalize.
D: Looking at DCCP, we find deceptively more transitions than we did with TCP, but we expect substantially more, so fractionally fewer are found. We expect to have 34 total transitions; with the gold, we only get 17. As you can see, the linear and neural models do not perform as well as gold, but again, neural slightly outperforms linear. Okay, so let's talk about why we miss things, because, as you just saw in these graphs, we missed a substantial number of transitions.
D: You would have to search around for a while to try to figure it out. Also, if we wrote a rule to tell the computer to look that far back, it would cause problems when we parse other parts of the document. So this is a good example of why you can have missing transitions when you extract from an RFC.
D: So we do an evaluation and try to figure out why it is that we miss transitions, and we do this by manually inspecting every single missing transition. We find that, in the case of TCP, most of the missing transitions are due to omissions or ambiguities in the text. The thing I just showed you I would consider to be an omission or ambiguity; I think it's ambiguous because you have to search in other parts of the document, paragraphs away, to figure out where the transition is supposed to go.
D: However, there are also a number of extraction errors for TCP. In contrast, with DCCP we also get some prediction errors, which is interesting, but that might just be a scaling aspect, because there are so many more transitions in DCCP. So, who knows: if you added, like, 20 more transitions to TCP, maybe I would get some prediction errors with TCP too; I'm not really sure. It should be noted that we missed very few for TCP, and additionally, this result with gold implies that the DCCP specification is substantially more ambiguous than the TCP one, because the fraction of missing transitions that are due to omissions or ambiguity is substantially larger than the fraction due to extraction errors, in contrast to the same analysis with the gold TCP FSM. In both of these cases, we're using our skyline intermediary representation, so any problems are at the extraction level or in the spec, and again, the neural model is slightly better.
D: Next, we look at incorrect transitions. I've told you about missing transitions, and I've said that, essentially, they can happen for multiple reasons, but they mostly happen because of omissions or ambiguities in the text. Another question would be: what about when we make a mistake and extract a transition that's just completely off? It has the wrong start state, or the wrong target state, or both, or none of the correct events on the transition. In the case of TCP, these are primarily caused by prediction errors, so this would be something we can improve on.
D: The prediction errors are problems with our methodology, so our methodology can be improved to not have so many prediction errors, potentially even just by training on more RFCs, to be honest. In DCCP, however, looking at the gold, we find that a substantial number of incorrectly extracted transitions are due to omissions or ambiguities in the text, and so this again highlights that the DCCP RFC experimentally appears to be more ambiguous than the TCP one, along numerous metrics that we can measure:
D: how many transitions we can extract and, conversely, how many errors we have. And again, this suggests that DCCP is more complex.
D: Okay, so now I've told you, end to end, that we have a way to automatically read an RFC document and extract a finite state machine, and I've shown you that the finite state machine has a number of errors, including missing transitions and also transitions that are wrong. And I'm going to claim, perhaps scandalously, that despite these errors, it is good enough to do something useful. To me, that useful thing is going to be attack synthesis: the automatic generation of attacks against the protocol.
D: But you could also view this as the automatic discovery of potential bugs.
D: So I used an attack synthesis tool called Korg, which I created, which is why I used it. I presented Korg at SAFECOMP 2020, which was in Lisbon, Portugal, but I was over the Internet because of COVID, unfortunately; so hopefully, next time, we'll get to go to Lisbon. Korg takes as input an extracted finite state machine, like the one we just generated, and a linear temporal logic property, which is just a formalization of what it means for the state machine to be correct.
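The specific properties aren't shown in the transcript, but to fix ideas, a correctness property of this kind might be the LTL liveness claim that a handshake, once started, eventually resolves (the state names here are illustrative, not taken from the talk):

```latex
\square \bigl( \mathrm{SYN\_SENT} \;\rightarrow\; \lozenge\,(\mathrm{ESTABLISHED} \lor \mathrm{CLOSED}) \bigr)
```

Read: it is always the case ($\square$) that if a peer is in SYN_SENT, it eventually ($\lozenge$) reaches ESTABLISHED or CLOSED.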
D: It then creates a system in which two peers are communicating over a vulnerable channel: it injects a vulnerability into the channel between them, a man in the middle, and it asks whether, in that threat model, the property can be violated. So we're not asking if the protocol is correct on its own; we kind of assume it is. In fact, we don't even look at properties unless they're satisfied by the model in isolation. But we want to know: under attack, is this still satisfied?
D: The Spin model checker reduces this to a language emptiness problem by intersecting a Büchi automaton with the system; basically, it does fancy formal methods from the '80s, and it says: no, the system is not correct when it's under attack, and here's a counterexample, which is just a sequence of events that the system goes through in which the property is violated. Finally, Korg takes that counterexample and turns it into an attacker program that simply replays that attack. So it's a pretty simple approach.
D: We looked at two case studies: TCP and DCCP. The FSMs shown here were made by me and have been extensively model-checked for a number of properties and shown to satisfy all of them, so these can be viewed as perhaps more canonical than the FSMs you'll find in the RFC documents, or in any number of security papers where they do maybe graph-guided search but don't really care whether the FSMs satisfy formal mathematical properties. For TCP, we use four properties.
D: Likewise, we also have four properties for DCCP. I don't think these are incredibly interesting properties, so I'll kind of skip them, but I'm happy to talk about them afterward if people have questions. We generate a number of attacks. This is kind of a confusing table, so I'm going to try to break it down into sub-points. Essentially, with TCP, there's only one property with which we are successfully able to generate attacks using any of our extracted finite state machines.
D: However, we get some false positives. So, for example, with the third property I wrote for DCCP, using the FSM that was extracted with our linear conditional random fields model, we generate 13 attacks, and all 13 are false positives. So again, there's sort of a noise-to-signal ratio thing here, where I'm generating more attacks with DCCP, but there are more errors.
D: I generate very few with TCP, but there are zero errors. Again, to me this seems to indicate more ambiguity, or more complexity, in the DCCP RFC, because errors in the attack synthesis step are caused by errors in the finite state machine. So this is fairly predictable from the results I showed you up until this point.
D: Okay. When I say the word "attack," it can mean any number of things, and people use this word rather liberally, so I want to give some examples of what I mean by an attack, so you know exactly what I'm saying. One attack would be to inject an ACK to the first peer in a TCP active-passive establishment routine in order to, if I remember correctly, get that peer stuck in SYN-RECEIVED, because it can non-deterministically decide to exit; but if it doesn't decide to exit,
D: for example, if it's configured to have a very long timeout, then it could stay there forever, or essentially forever. That may sound a little silly, but if you talk to Samuel Jero at Lincoln Labs, he'll tell you that he's found TCP implementations in the wild that have this bug, and in fact I found Stack Exchange queries, even Experts Exchange queries, from people who had this problem, where their TCP connections hung forever because they'd configured the timeout incorrectly.
D: Another example, which is much more complicated, and which we also find, would be to elaborately spoof each peer in a DCCP connection in order to guide the other one into CLOSEREQ. This is sort of a silly attack, because I'm not really sure what, like, an elite Russian hacker would want with this; however, it is a violation of a correctness property, because there's no active-active close in DCCP.
D: So if I can induce an active-active close, maybe I can cause a deadlock or something. And I want to point out this interesting point: because our approach is based on model checking, we'll find many, many variants of kind of the same attack, and indeed we find many variants of this attack; I can show you dozens of different ways to do essentially the same thing.
D: I want to conclude with future directions. The research I showed you today was presented at S&P, to a security audience, and there's kind of a certain set of things that interest them, in terms of breaking protocols; but I think you may have different interests, in terms of maybe building protocols that won't be broken in the first place.
D: So it would be nice if we could automatically highlight omissions and ambiguities in RFC text, and I think we're part of the way there, if we can kind of take our tool and make it say, "I'm kind of confused about how to interpret this piece of text; could you clarify it?" That might be a useful functionality for RFC authors, to be able to know when something they wrote is not entirely clear.
D: Another thing I'd like to do is automatically suggest bug fixes. So if I extract the finite state machine, and the RFC author looks at it and says, "yeah, actually, that looks correct, I think that's what I intended to describe," and then I find an attack against it, well, it would be nice if I could use some synthesis techniques from the formal methods community to automatically suggest a patch that would resolve that problem. We actually have an undergrad working on that problem with us right now, which is very exciting.
D: A completely different vein of research that we'd like to undertake is to extract logical properties. Right now, I'm writing those properties by hand. That's not easy: writing properties by hand is maybe not quite as hard as writing models by hand, but they're both much more difficult than they seem, and it's very easy to write a property that, for example, is vacuously true, and then you really can't do anything with it.
D: So there are a lot of logical subtleties in these things. We'd also like to offer support for secure protocols; there are a lot of kind of interesting problems there, like capturing the arithmetic that goes into the cryptography in such a protocol, and capturing secrecy requirements, and we might have to take a different approach in terms of the back end, for example, maybe relying on ProVerif or Tamarin instead of Korg.
D: Finally, I think it would be really nice if we could take all of these different ideas and turn them into some sort of cohesive software suite for an RFC-author-in-the-loop. So an RFC author, or authors, are writing an RFC; it would be nice if they could run our code and be told, "hey, this seems ambiguous."
D
"Maybe you could clarify this," and they could have a feedback loop to do so until they get an unambiguous piece of text. You could even imagine perhaps mixing formal language with the English, so that if the RFC author actually likes the way they describe something in English, they could just give little formal hints about how to change it. And that's it; with that, I'll take any questions. Thank you very much.
A
All right, thank you very much, excellent talk. I see a couple of people in the queue. Who's first? Yoshifumi Nishida.
E
...Web Services. So I think this is very interesting research, thank you so much. This might be part of a huge direction, but one of the complicated things for TCP is congestion control and loss recovery, and I'm wondering whether you can apply this technology to congestion control and loss recovery, and how promising that is. That's what I'm curious about.
D
Yeah, that's a fantastic question. My current project is congestion control; however, I have not been applying NLP to the problem. I've been manually modeling a congestion control algorithm of interest. I think that congestion control is pretty complicated, and it would be fantastic if we could partially automate the modeling, but I'm trying to get an intuition for the modeling to begin with, by doing it by hand, before we consider automating it. But I mean, you basically predicted the next year of my life, so yeah, that's what I'm doing. Thanks.
A
All right, thank you. Michael?
B
You said you analyze the text to get the finite state machines. Most of the transport RFCs contain a graphical representation of the finite state machine. So did you also consider using that, or is that too hard, or too weak, or...?
D
That's a great question. So I did write a little toy script that could read the ASCII diagrams in RFCs, and got it to work on quite a few of them, but there was nothing very scientific about it; I just wrote a little heuristic Python script. I've spoken to Maria about this, and Maria did the natural language processing, for the most part, on this project.
D
I did all the formal methods. Maria says that there is a field of NLP that deals specifically with diagrams and figures, and it could be applied to this type of problem, which is an interesting approach. I will tell you, however, that the FSM diagrams are missing a bunch of information. For example, if I recall correctly, I think that the TCP FSM diagram is missing a core component of the active close routine, and in the DCCP one...
D
I think that there was, again... oh, you know what, I think in DCCP there's a transition that's shown in the diagram but not in the text. There are a bunch of places where the diagrams are missing things. There's also this problem when protocols have multiple diagrams that have to be somehow merged. If you look at LTP, which is a pretty interesting protocol that's used in space exploration, it has, I think, seven or eight FSM diagrams in it that all interact in, I would say, mathematically ambiguous ways.
D
Even DCCP has multiple FSM diagrams, because it has one for the state changes and then another one for the communication. So diagrams are interesting. I'm not personally working on that problem; I do think it merits attention, but I don't think it is a golden ticket to automatically getting good FSMs from RFC documents, no.
D
Yeah, that's a good point. We mentioned these things in our paper, but I'd also be happy to summarize some of the differences in an email to whoever would find that interesting. So maybe offline you can help me figure out who would like to know about the things I found in the RFCs that maybe could be improved.
G
Great work, very interesting. I have just two quick questions. The first one: you've shown two different methods, one with linear, one with... I don't remember, sorry. Anyway, I wanted to ask whether they are wrong in the same way, or whether they could actually be used together to cross-validate each other and say, okay...
D
Yeah, that's a good question. I haven't thought about that; Maria probably has. Yeah, I'm sure that you could use them together. You could do some sort of intersectional approach, potentially, where you have a higher layer that takes as input both models and then finds where they agree. That sounds interesting. I haven't tried that, but I think it's a good idea.
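The "higher layer that finds where they agree" could be sketched as intersecting two transition tables. This is purely illustrative of the idea, not anything the speakers have built: keep the transitions both extraction methods agree on, and report disagreements for a human to review.

```python
# Sketch of cross-validating two independently extracted FSMs
# (e.g. outputs of two different NLP extraction methods).
# The model contents below are illustrative.

def cross_validate(model_a, model_b):
    """Split two transition tables into agreed and disputed entries."""
    agreed = {k: v for k, v in model_a.items() if model_b.get(k) == v}
    disputed = {k: (model_a.get(k), model_b.get(k))
                for k in set(model_a) | set(model_b)
                if model_a.get(k) != model_b.get(k)}
    return agreed, disputed
```

The agreed set would be the higher-confidence model; the disputed set is exactly what one would hand back to an RFC author.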
G
Thank you. And the second one is whether you are taking into account, not now but maybe in the future, the problem of the length of the fields, because sometimes the same transition is actually governed by some fields. It might be either too short or too long, well, usually too short, and that might also be a problem for the state machine.
D
That's a great question. I would say that prior work by my group, before I started my PhD, specifically by Samuel Jero, who, as I said, is at Lincoln Labs now working with Maria, applied similar techniques to figure out the packet structures of protocols using natural language processing applied to RFCs, and you could potentially combine these approaches.
D
So if you have one approach to find the packet structure and another to find the finite state machine, then you can go from just reasoning about your atomic propositions, like SYN and FIN, to actual full-fledged packets. That sets you up pretty nicely to be able to make arithmetic statements about the contents of those packets, such as their length. So that would be a next step. However, from a formal methods perspective it is complicated, because you get into a state space explosion problem, so you probably wouldn't want to use model checking.
H
So have you tried to actually do the work that you suggest in the future steps, for instance for TCP or for DCCP? Which would be essentially measuring how many amendments you need to actually make in the original RFC until you get to the point where you don't have any missing transitions anymore, and you don't have any false transitions anymore. Because it feels like, with that many errors, you would need to essentially completely rewrite everything.
D
Great question. I have two thoughts on that. First of all, we have not worked on this yet, because I think there's a lot to improve in our methodology before we try to improve the RFCs. I mean, just from an academic perspective: I understand that, as the IETF community, you are more interested in improving the RFCs, but we are academics working on trying to find cool ways to extract information from documents.
D
We're perhaps focused right now on improving our ways of extracting information from documents, and as you can see from the prediction error, there's a lot that we could improve on, by training on more documents and improving our model structure. That being said, I actually think that what we'll eventually find, when we do get to doing this experiment, is that the amount of changes necessary is not that great, I think.
D
For the most part, it would be things like: you have a sentence in which the target state is implicit, and you just add the words "to SYN-RECEIVED" to the end of the sentence, and now it's explicit. Essentially, taking things that to a human are implied and making them explicit, because I'd rather have a slightly redundant, boring spec that's unambiguous than one that's very cleverly written and I get confused by. So I actually do not think it would take that many changes in order to make it unambiguous. Good, thanks. Thank you.
A
All right, thank you. A reminder that we're using the Meetecho queue, so we can handle both the local and remote people in the queue. The last question, I think, is from Jonathan.
I
Yes, yeah. So I spend a lot of my time doing Tamarin modeling, so I'm interested to know why you're using FSMs, because almost all the protocols that I've come across cannot be expressed as an FSM. You need tokens, or they have state, so you're actually very limited, to maybe just TCP. I can't imagine what other ones don't have state.
D
Yeah, this is a fantastic question. So people who work in Tamarin or ProVerif or, dare I say, Cryptol, are interacting with mostly cryptographic protocols, or protocols that use cryptographic primitives, and they care about things like secrecy and privacy. And I care about those things too, but because they care about those things, they need to reason about threat models where secrecy or privacy or confidentiality, et cetera, could be violated, and about attacks against those things, and FSMs are, I would say, probably not the right approach for that.
D
So I think we agree on that front, and I think that maybe a more type-theoretic approach, like what Cryptol does, is quite good there. But there actually are quite a few protocols that have at least components that can be studied using FSMs, mostly communication protocols.
D
Examples include LTP, TCP, DCCP, SCTP, SFTP, FTP, and also some newer protocols I've been looking at that are not related to the IETF, which we'll release in future work. So I would say that many communication protocols that have handshakes, in which you establish some sort of connection, are very well described by FSMs.
D
But another aspect of your question is that we often will take a protocol and break it into chunks and say: this chunk can be studied by model checking, with an FSM representing that component of the protocol, but this other chunk might be best studied in some other way. So the cryptographic aspects are probably not suitable for model checking. Yeah, thanks for the question, though; great question.
A
All right, thank you. A really excellent talk, really excellent questions. There's a bunch of discussion in the chat as well, which you might want to look at in a couple of minutes. So thank you again, Max, for your talk.
A
All right, thank you. So our next speaker today is Jane Yen from the University of Southern California, who is going to talk about tools for disambiguating RFCs.
J
Yes, thank you. Okay. Okay, can everyone hear me? Okay, great. So hi, everyone. Today I want to talk about work I did with my advisors, Barath Raghavan and Ramesh Govindan. The topic is tools for disambiguating RFCs.
J
It also indicates that great human effort is involved. Human effort is involved in various parts of the specification production process. As an example, here is a working group: they need to gather and discuss which and/or what protocol needs to be standardized, and there will be one or more specification authors composing the contents of the spec. Although it might be obvious, the content is full of domain-specific knowledge and requires professionals to verify it.
J
When we speak of concerns about the quality of a spec, it is hard to neglect the impact of ambiguities. With the existence of ambiguities in a spec, different protocol implementers could interpret the content differently and generate multiple versions of the protocol. Here's one example which might cause different interpretations.
J
At first glance, it might not be obvious why the sentence is ambiguous. However, when we focus on the last part of this sentence, it explicitly mentions that the checksum starts with the ICMP Type, but it doesn't mention where the checksum computation should end. So how could this affect a person's interpretation?
J
Here I show the ICMP header to help illustrate how we can come up with at least two different interpretations. One is to checksum only the ICMP header part, and the other is to checksum both the header and the payload. As shown, the two different interpretations would lead to different protocol implementations.
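The two readings can be made concrete. The sketch below implements the standard Internet checksum (the RFC 1071 style one's-complement sum) and computes it both over the ICMP header alone and over header plus payload; the results differ, so implementations following the two interpretations would not interoperate. The sample header bytes are illustrative.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071-style 16-bit one's-complement checksum."""
    if len(data) % 2:          # pad odd-length data with a zero byte
        data += b"\x00"
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:         # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# Illustrative ICMP echo request: type=8, code=0, checksum=0, id=1, seq=1
header = bytes([8, 0, 0, 0, 0, 1, 0, 1])
payload = b"ab"

header_only = internet_checksum(header)
header_and_payload = internet_checksum(header + payload)
```

The two interpretations yield different checksum values, so a sender computing one and a receiver computing the other would reject each other's packets.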
J
Motivated by this question, we presented our work at last year's SIGCOMM and at ANRW. In the following, I will introduce what they are about, respectively.
J
The first one is Sage. In this work, we get to know what ambiguity exists in a number of long-standing protocols. Specifically, we uncovered five instances of ambiguity and six instances of under-specification in the ICMP RFC. With an unambiguous specification, we are able to generate executable code that can interoperate with third-party code, and these results generalize to significant sections of BFD, IGMP, and NTP.
J
So in Sage we apply natural language processing techniques to discover the existence of ambiguities. To do so, we need to understand the semantics of the specification, which is commonly termed semantic parsing, and ultimately we want to leverage the results from semantic parsing to generate low-level executable protocol code.
J
A complete RFC parsing workflow will look like this. In addition to the three components already introduced, a human will be involved in the loop after the disambiguation phase. If more than one representation remains, it means we have discovered ambiguities, so we provide feedback to the user, who is likely the specification author, to resolve the ambiguity by rewriting the sentences.
J
The purpose of doing semantic parsing is to derive a semantic representation. Therefore, what we can expect is to take in a textual sentence and turn it into a chosen intermediate representation, and in our work we use a semantic representation called a logical form. A simple logical form is composed of a predicate with a number of arguments.
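As a toy illustration of what a predicate-with-arguments logical form might look like (the shape here is illustrative, not Sage's actual intermediate representation), a sentence like "set the Type field to 8" could parse into a single predicate carrying its arguments:

```python
from collections import namedtuple

# Toy logical form: one predicate plus named arguments.
# Both the representation and the "parser" below are illustrative.
LogicalForm = namedtuple("LogicalForm", ["predicate", "args"])

def parse_assignment(sentence):
    """Very naive 'semantic parser' for sentences of the form
    'set the <field> field to <value>'."""
    words = sentence.lower().split()
    field = words[words.index("the") + 1]
    value = words[words.index("to") + 1]
    return LogicalForm("assign", {"field": field, "value": int(value)})
```

A real semantic parser handles far more sentence shapes, but the output it targets is of this flavor: a machine-checkable predicate rather than free text.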
J
For example, the NLP parser wouldn't recognize all the nouns or noun phrases, so in this work we use a generic parser called spaCy, and we extend spaCy's term dictionary with networking terms such as "one's complement". As for the semantic parts, a specification might include some idiomatic usage; for example, the equals sign in "0 = echo reply" is not syntax that can be processed by the generic parser.
J
I'm only giving you an idea of what kind of evaluation we've done in Sage. We applied Sage to the ICMP RFC, used generic Linux ping and traceroute on the sender side, and used the packet formulation function to generate a reply packet back to the sender. The tested ping and traceroute programs were able to process our automatically generated reply packets.
J
While Sage takes a significant step towards automatic specification processing, much work remains. In an RFC there are many components: Sage is able to parse packet formats and pseudocode, but there remain other components it doesn't yet support, such as state machines, communication patterns, and architecture.
J
In addition to the limitations shown, there are other challenges that can also be considered from a code generation perspective. Think of Sage not only as a method to understand the level of ambiguity, but also as a method to automate code generation; there is a lot of room to automate the code generation part. For example, Sage mostly parses sentence by sentence to determine a sentence's ambiguity, and generates code according to the order of the descriptions. However, it's unrealistic to assume that all specifications use self-contained sentences every time.
J
Another challenge could be discovering mismatched or mis-captured behavior. For example, we have both text and syntactical components, which might be a diagram. What if the information in the diagram is not consistent with the textual description, or what if a piece of information is illustrated in a diagram but there is no textual explanation?
J
Supposing that we are able to process a single RFC already, what about a protocol describing its functionalities or constraints across multiple RFCs? In some cases a protocol gets revised, and more implementation constraints are added with the publication of another RFC. How do we aggregate the information from multiple RFCs?
J
Then there's another type of challenge, which is aggregating the information of multiple protocols instead of one protocol. In practice, we send packets with a stack of protocols. When we want to generate the code for a stack of protocols, how do we correctly organize and categorize the information from diverse RFCs?
J
That means we will know the protocol implementation should be logically correct, but not how its performance will be. If the spec author suggests any performance-oriented implementation, will we be able to differentiate that, and leave the implementation details as the flexibility, or the freedom, of the code implementer?
J
I have shown this figure previously. It illustrates how we process an RFC document and involve a feedback loop to indicate to the human user where the ambiguity exists. I would like to point out that there is still room to improve the human effort involved, such as how to assist in editing ambiguous sentences, or how to extend Sage to support more protocols.
J
This leads to another two challenges we need to deal with. Considering how we can assist in editing ambiguous sentences, we might also want to ask: can we avoid writing an ambiguous sentence in the first place? As for the other challenge, we have only talked about a subset of specification components; what other protocols are we going to support?
J
As for the second challenge, we are aware that stateful protocols require considerably more complicated operations, and many protocols belong to this group. Unlike stateless protocols, a stateful protocol is required to keep internal state to interpret packet contents given the current state. The ability to keep historical information could allow us to cover significantly more protocols.
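The difference can be seen in a few lines. A stateless responder maps each packet to a reply on its own, while a stateful one must consult stored per-connection state to interpret the very same event. The sketch below is a minimal illustration (not Sage output; the event and state names are made up) of the extra machinery statefulness implies.

```python
# Minimal contrast between a stateless and a stateful handler.

def stateless_reply(packet):
    """An ICMP-echo-like responder: the reply depends only on the packet."""
    return {"type": "echo_reply", "payload": packet["payload"]}

class StatefulHandler:
    """A handshake-like responder: interpreting 'ack' requires the
    history kept in self.state."""
    def __init__(self):
        self.state = "LISTEN"

    def handle(self, event):
        if self.state == "LISTEN" and event == "syn":
            self.state = "SYN_RCVD"
            return "syn_ack"
        if self.state == "SYN_RCVD" and event == "ack":
            self.state = "ESTABLISHED"
            return None
        return "rst"   # the event is meaningless in the current state
```

The same "ack" event produces different behavior depending on stored history, which is exactly what a code generator has to model once it moves beyond stateless protocols.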
J
As one of the options, we might design interfaces similar to Microsoft MakeCode, which leverages a web interface and puzzle-like shapes to generate executable code. But for us, the output becomes English sentences: once we design a parser and parse the information received from the user interface, we leverage that information to compose both easy-to-read English sentences and executable code.
A
One question from me. Both the mechanisms you've developed and the mechanisms in the previous talk are based around natural language processing and so on. How expensive are they to run? What was the sort of runtime for these models?
J
The runtime is long. The ICMP RFC contains seven... oh sorry, I forgot the exact number of messages, but processing that whole RFC runs for about five minutes.
J
Yes. Part of the complexity is due to the NLP parser, which has to analyze how many different ways there are to generate the semantic representation. So, for example, if you have a noun phrase that can be interpreted in multiple ways, then all of them have to get enumerated to generate the representation, and that takes the majority of the time.
F
In the room? Yes, I am, thanks. Well, okay, maybe I'm sidetracking a little bit with a comment directed towards Colin. Colin, you said, well, okay, we are used to doing the specifications in English prose. Well, okay, sometimes we enhance the specifications with some formal stuff.
J
Okay, so for the ASCII art parts, we didn't use the NLP tool to parse that. Instead, we parse the ASCII art of the packet header by writing another small program, and it can automatically extract the structure.
J
The packet header structure, so that's one part of it. And I just want to address another perspective: our current work is trying to extract the essential information out of it, and by looking at that essential information, we are thinking not only of generating the English text or the executable code. We are also thinking of using that information to generate the diagram automatically.
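A small program of the kind described can be surprisingly short. In RFC packet-header diagrams each bit column occupies two characters, so a field of n bits sits between pipes 2n - 1 characters apart. The sketch below is an illustration, not the actual Sage code: it recovers field names and widths from one row of such a diagram.

```python
def parse_header_row(row: str):
    """Parse one '|'-delimited row of an RFC packet-header ASCII diagram.
    Each bit column is two characters wide, so a field of n bits spans
    2*n - 1 characters between its pipes."""
    fields = []
    for segment in row.strip().strip("|").split("|"):
        fields.append((segment.strip(), (len(segment) + 1) // 2))
    return fields

# Illustrative row modeled on the first line of the ICMP header diagram:
row = ("|" + "Type".center(15) + "|" + "Code".center(15)
       + "|" + "Checksum".center(31) + "|")
```

Running the parser on that row recovers the three fields with widths 8, 8, and 16 bits; a fuller parser would also stitch together multi-row fields and the bit-ruler line above the diagram.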
I
Hi, and that was a really interesting talk, thank you. I was just sitting there thinking: wouldn't it be easier for us to formally specify protocols and then use some tool with NLP to generate the prose spec? Then we can be sure that the prose spec actually implements the formal specification, and then we can check the formal specification.
J
Yeah, so I agree that is also one way to do it. The thing is, we have to fill in the gap of how to convert between the formal specification language and the English text, so that part is still missing. And I think there's one problem here: there are too many different kinds of intermediate representations, so a formal specification language is sort of like another intermediate representation.
J
You can always choose different ones and then try to fill in the gap from the English text to that representation. So I would say that is totally doable, and actually a good way to go. But our first work, Sage, just directly chose a simpler way, as a first step, to analyze whether we can find some ambiguities. But for further specification...
J
I agree that writing in a formal specification language and then generating the English text might also be a very doable way, but I think we would start from a smaller scope, from the state machine, with some simpler language, and just generate very crude, simple English text. We have to see whether that is a good approach first; otherwise we are not sure whether we should go from the formal specification language to the English text. Because a formal specification language, I believe, takes more: you need to take some time to learn it, to pick it up, and then to do those things.
A
A
All
right,
thank
you.
The
next
speaker
is
chris
wood.
K
Yeah, hello. Trying to figure out how to control this remotely. All right.
K
All right, thanks. All right, everyone, my name is Chris Wood. I do some work in a number of places in the IETF, in particular in the CFRG; as of recently, I guess, I help out producing technical specifications for that particular group. And what I'm going to try to talk to you about is, I think, a problem that has been emerging in that particular group, and not only in the CFRG, it exists elsewhere, and it's on this topic of, you know, how...
K
How can we improve the quality of specifications in the CFRG and elsewhere in the IETF and IRTF? And I should say, first and foremost, that this is very much based on my own personal experience writing these things, trying to communicate technical things in a way that's clear and concise to people.
K
I've not talked about this with many other people; I've not asked for a review of the slides. So, you know, if you disagree, that's cool, we should talk about it, and this does not reflect consensus of the CFRG in any way, shape or form. Also, I'm not trying to point fingers or blame people; in fact, I work on a lot of documents, so I should just be pointing the fingers at myself.
K
Okay, so I'm going to start with what I think is sort of the problem, or maybe it's not a problem, depending on your perspective, but I'm going to refer to it as a problem.
K
So in theory, the CFRG is chartered to do a lot of things. It's a pretty broad charter. You could summarize it in the sentence that I have quoted here.
K
It's meant to effectively bridge things that happen in academia, in theory, and bring them into practice: documenting things that are written down in cryptography-related papers, published in conferences and whatnot, and presenting them in a way that people can actually use, and to do so via what are referred to as informational RFCs.
K
An informational RFC just contains information; it does not go through the same sort of editorial, or rather review, process, with the same sort of rigor, that other things in the IETF and elsewhere go through. So the bar to publication of an informational RFC has historically been lower than that of a standards-track or proposed-standard document.
K
That's kind of a problem, because in practice the CFRG's specifications, I think, demand a much, much higher bar, given what they're actually being used for. You could go back and look at many of the specifications or RFCs that came from the CFRG, and look at how often they're used and referenced in practice, and it should be very clear that these are very, very important documents: things like HMAC, HKDF, Curve25519, EdDSA, and HPKE, which is a recent one.
K
Moreover, these specifications that are developed have a huge audience. They are meant to be useful to people who are designing protocols and want to be able to consume these cryptographic objects. They're also useful to people reviewing these particular specifications to make sure they're consistent with things that have been analyzed and proved in practice, and there are probably lots of other audiences as well.
K
But the gist is that the stakes are very high, I think, for the things that the CFRG does, despite what the charter lists, so the quality of its outputs is critical. And so...
K
I'm going to be trying to focus on specification quality here. What do I mean by specification quality? I claim that there are at least three fundamental aspects of a technical specification, the first of which is the functional piece of it.
K
What does this thing do? And the CFRG is actually quite nice in that, typically, the things that it works on are much, much simpler compared to the protocols that the IETF standardizes. It was only recently, I guess, that we actually started looking at real protocols that involve sending messages between different parties. Historically, it's just been, you know, specifying an algorithm that has some steps, an API, a syntax, and stuff like that.
K
The second part is what I refer to as the syntax specification: what does its API look like? How would I interact with it? If I were to implement this in a type-safe language, what would that API look like, and how might I enforce it? And the third component is the actual implementation: how would I write code to implement this particular algorithm, and to do so correctly, keeping in mind the different audiences that were listed earlier?
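As a tiny illustration of the "syntax" layer, the sketch below uses Python's standard hmac module as a stand-in for a CFRG primitive. The point is the API shape, not the primitive: verification takes the same inputs as signing and returns a boolean, with constant-time comparison handled internally rather than left to the caller.

```python
import hmac
import hashlib

# Illustrative "easy to use, hard to misuse" wrapper around a MAC.
# hmac and hashlib are Python stdlib; the API shape is the point.

def mac_sign(key: bytes, message: bytes) -> bytes:
    """Produce a tag over the message under the given key."""
    return hmac.new(key, message, hashlib.sha256).digest()

def mac_verify(key: bytes, message: bytes, tag: bytes) -> bool:
    """Check a tag. compare_digest avoids the timing side channel
    that a naive '==' comparison would invite."""
    return hmac.compare_digest(mac_sign(key, message), tag)
```

A spec that pins down this kind of interface, and not just the internal algorithm steps, makes the correct use the obvious one for both implementers and protocol designers.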
K
It's important to make sure that each of these different components of a specification is tailored to the relevant audience. So, for example, the implementation specification should be written such that implementers can actually use it and implement the thing correctly, whereas the functional piece of a particular specification needs to be written such that people who want to just figure out what this thing is doing can do so without having to understand the implementation details. You shouldn't have to be a cryptographer to understand what an RSA signature is, or whatever.
K
These are simple concepts, and people building these protocols in the IETF should not have to be experts to use them. They should be easy to use, hard to misuse, that sort of thing, blah blah blah. Okay. And to figure out whether or not the specification, or the different components of a specification, satisfy these goals, there are questions you can ask yourself as an editor or an author of a document.
K
I think the first and foremost one that comes to mind for me is: is the specification easy to understand and use? Can I just read it and figure out what the thing is doing? Can I reason about the behavior of the cryptographic object? Could I reasonably think about what the API might look like, and what the exceptional cases might be if I try to hold this thing the wrong way?
K
The second most important question is: if I'm reading this thing and I were to try to do a faithful implementation of it, would I produce a consistent and correct implementation that could interoperate with other people's implementations? And that depends on, you know, the behavior being well defined; it depends on the implementation description being clear, and so on. And lots of other questions, I'm sure, would come up to try to figure out whether or not you're on the right track.
K
So I want to look at a couple of examples of what the CFRG has done, and try to reason about these two high-level questions: is the specification easy to use and understand, and will it yield consistent and correct implementations? And I picked on an RFC that gets picked on a lot: RFC 8032, for EdDSA.
K
It has this one component of the draft that specifies how you verify digital signatures. You don't need to read this.
K
In fact, I encourage you not to read it, because you might get bored. But I will just note that in the final piece of this particular body of text, they have something like: you could do, or we recommend you do, this one thing for correctness and security and safety, but maybe, if you're feeling up for it, maybe you do this other thing instead. And it turns out that has been the cause of a tremendous amount of pain in practice. It's been the cause of security problems in practice.
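The flavor of the problem can be shown with a deliberately toy example (small numbers standing in for the group order; this is not real EdDSA): if one implementation rejects a scalar that is not in canonical range while another silently reduces it, the two disagree about whether the very same signature bytes are valid.

```python
# Toy model of divergent signature validation. L plays the role of the
# group order; the "signature" is just a scalar. Not real cryptography.
L = 13

def strict_valid(s, expected):
    """Reject non-canonical encodings outright: s must lie in [0, L)."""
    return 0 <= s < L and s == expected

def lenient_valid(s, expected):
    """Silently reduce s mod L before comparing."""
    return s % L == expected
```

Both verifiers accept a canonical signature, but a non-canonical encoding of the same value splits them, which is exactly the "people just don't agree on what signatures are valid" situation described next.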
K
People just don't agree on what signatures are valid, and that's probably not a desirable property for a digital signature scheme. So I would claim, to answer the first question: yeah, the specification is easy to use and understand; it's actually a really well written technical specification. But was it written in such a way that it yields consistent and correct implementations? Unfortunately not. And that's just my own personal opinion; like I said, if you disagree with any of this, that's fine, we can talk about it.
K
Okay, another example is actually an ongoing specification that's being developed right now, this thing called OPAQUE. Without describing what these terms are, OPAQUE is basically a compiler for taking a bunch of different cryptographic algorithms, gluing them together, and the result is a password-authenticated key exchange protocol. Conceptually, once you understand the object and understand the protocol, it's pretty easy to think about. Unfortunately, to actually understand the spec and implement it correctly...
K
You have to understand and be familiar with all these other concepts: what the authenticated key exchange thing is, what this VOPRF thing is, what hash-to-curve means, what a prime-order group means. And there are other relevant specifications, not listed here, that are likely also in this dependency tree and necessary to understand OPAQUE. And so, to answer the two questions, I would say that this specification is not easy to understand and use, though it will yield correct and consistent implementations.
K
There are test vectors, and people have done it, so maybe that check is a bit biased; or maybe there should not be a check there, because it's still an ongoing draft, so we don't know. But the takeaway is that this is an incredibly complicated document. It could potentially be much simpler if you tried to change what features are supported, and whatnot, but it's not.
K
Okay, so I think the core problem here is that the nature of the work that the CFRG does is highly technical, and it has to be consumable by different audiences, with different levels of expertise and different purposes for consuming that technical specification. An implementer just wants to know how to implement the thing; the protocol designer wants to know how to use the thing. And the problem is that this is a hard problem, so there's no way around it other than just acknowledging that this is a hard problem, like writing.
K
A
technical
specification
that
takes
all
of
these
boxes
in
such
a
way
that
it's
maximally,
useful,
easy
to
implement
and
correct
and
consistent
is,
is
not
an
easy
thing
to
do,
and
we've
been
doing
it
for
trying
to
do
it
for
many
many
years.
I
do
think
we're
improving,
but
I
think
there's
more
that
can
be
done
and
trying
to
shine
a
light
on.
You
know,
potentially
more
weight,
more
things
that
can
be
done.
K
So
I
wanna,
I
wanna,
call
out
one
particular
draft
that
is
going
through
the
process
right
now
to
kind
of
showcase.
How
we
chose
to
you
know,
structure
the
specification
in
such
a
way
that
it
does
try
to
tick
some
of
these
boxes
in
what
I
consider
to
be
a
useful
way
for
people
who
are
consuming
the
spec,
so
this
draft
called
hash
to
curve.
K
It
is
again
without
going
into
any
of
the
math
details.
It's
basically
just
like
a
hash
function
or
it's
a
specification
that
describes
a
hash
function
that
takes
as
input
arbitrary
data
a
string
and
produces
a
point
on
a
curve,
and
that's
basically
it
at
the
highest
of
levels.
It's
a
very
simple
interface.
K
It
can
be
summarized
in
this
one
picture
here:
it's
internal
steps
are
like
quite
simple
and
and
and
and
pretty
easy
to.
I
guess
like
if,
if
you
don't
want
to
know
more,
it's
it's
easy
to
figure
out.
I
guess
what
what
the
I
guess
high
level
functionality
is
basically
from
this.
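The high-level hash-to-curve flow can be sketched in a few lines of Python. The step names below follow the published draft's structure (hash_to_field, map_to_curve, clear_cofactor), but everything else, the modulus, the maps, the "points", is a toy stand-in invented here for illustration, not the draft's actual math:

```python
import hashlib

# Toy sketch of the hash_to_curve pipeline. The step names follow the
# hash-to-curve draft, but the field, the map, and the "points" below are
# invented stand-ins (integers mod a prime), not the draft's real math.
P = 2**61 - 1  # prime modulus for the toy field/group

def hash_to_field(msg: bytes, count: int) -> list:
    """Derive `count` field elements from msg (toy: SHA-256 with a counter)."""
    return [
        int.from_bytes(hashlib.sha256(msg + bytes([i])).digest(), "big") % P
        for i in range(count)
    ]

def map_to_curve(u: int) -> int:
    """Map a field element to a 'point' (stand-in for a map like Elligator 2)."""
    return (u * u) % P

def clear_cofactor(r: int) -> int:
    """Toy cofactor clearing: multiply by a small constant cofactor."""
    return (8 * r) % P

def hash_to_curve(msg: bytes) -> int:
    u = hash_to_field(msg, 2)   # 1. derive two field elements
    q0 = map_to_curve(u[0])     # 2. map the first to a point
    q1 = map_to_curve(u[1])     # 3. map the second to a point
    r = (q0 + q1) % P           # 4. "add" the two points
    return clear_cofactor(r)    # 5. clear the cofactor

# Deterministic: the same input always lands on the same "point".
assert hash_to_curve(b"hello") == hash_to_curve(b"hello")
```

The point here is only the shape of the pipeline: derive field elements, map each to a point, combine them, clear the cofactor.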
K
So
as
I
go
through
next
couple,
slides
just
I
guess,
keep
in
mind
and
ask
yourself,
you
know
is
like
based
on
the
limited
set
of
information
I'm
going
to
present
like
is
the
specification,
easy
to
use
and
understand,
and
will
it
you'll
consistent
in
correct
implementations?
K
Okay,
so,
let's
consider
the
the
functional
description
of
this
particular
specification.
First,
that
is
what
the
part
of
the
specification
that's
written
for
the
people
who
just
want
to
understand
what
hashing
to
curve
means.
K
So
I'm,
if
I'm
reading
this
specification
for
the
first
time
I
might
go
to
the
part
of
the
spec
that
describes
like
what
this
functionality
is-
and
I
come
across
this.
This
blurb
here
that
describes
hash
to
curve
with
some
input,
produces
an
output
and
there's
some
there's
just
little.
There's
these
few
steps
here.
K
Okay,
but
maybe
I
want
to
know
like
what
these
steps
are,
helpfully
there's
like.
Similarly
simple
descriptions
in
the
draft
that
describe
each
of
these
different
steps
without
any
of
the
implementation
details
like
you,
don't
need
to
understand
them
to
figure
out
like
what
this
particular
function
is
doing
like
this.
This
is
sufficient.
I
would
claim
for
understanding
what,
like
the
function
of
hash
to
curve
is
doing,
and
I
think
that's
like
a
good
balance
for
this
particular
audience.
K
K
Now say I'm the implementer. I see there's this thing called map_to_curve as one of the steps that I have to invoke in an implementation of hash_to_curve. What is this map_to_curve thing supposed to do?
K
There's
specific
sessions
that
describe
what
the
high
level
syntax
and
api
for
hash
to
curve
would
be
for
different,
like
curves
that
you
might
target.
I
just
talked
about
one,
this
alligator
two
method
for
one
map
to
curve
thing.
It
describes
basically
all
of
the
parameters
that
you
might
need
to
configure
for
this
particular
function.
What
are
the
constants
that
are
relevant
for
this,
like
particular
function?
K
What
are
the
exceptional
cases
that
might
you
might
have
to
specify
when
thinking
about
an
api
and
like
this
there's
this
other
thing
called
sign
of
the
output,
but
don't
worry
about
that
and
I've
truncated,
some
of
the
like
text
that
follows
because,
from
a
like
an
api
perspective,
this
is
kind
of.
I
would
say
mostly
what
you
need.
Okay,
now
you're
now,
let's
say
you're
like
you're
the
implementer,
and
you
want
to
actually
like
try
to
implement
this
thing,
to
make
sure
that
you
know
it.
K
I
I
can
use
it
for
some
protocol.
It
interoperates
with
some
other
implementation
again
start
on
the
left.
Go
to
this
map
to
curve
thing,
and
then
I
follow
the
pointers
down
to
like
the
actual
implementation
description
of
alligator
2
in
this
particular
case,
and
it
goes
into
excruciating
detail
about
every
single
step.
K
That
needs
to
be
done
with
like
comments
as
necessary
for
like
the
different
parts
of
the
pseudo
code,
noting
like,
for
example,
on
line
three
like
what
in
exceptional
cases
that
was
like
described
earlier
in
the
syntax
and
the
api,
and
this
seems-
and
moreover,
it's
not
it's
not
actually
included
in
this
back.
K
But
this
is
almost
there's
almost
a
one-to-one
mapping
between
this
implementation
description
as
written
here
and
the
reference
implementation
that
we
provided
in
sage,
not
the
sage
that
was
talked
about
in
the
previous
presentation
and
so
with
the
ultimate
purpose.
To
be
like
each
step
along
the
way,
minimizing
potential
source
of
errors
and
interrupt
bugs
or
whatever.
K
Coming back to the original questions: if you were to approach this document and ask yourself, is it easy to use and understand, and will it yield consistent and correct implementations? I'm not going to put checks or X's or question marks next to them; I'll leave it to you to decide as an interested reader. But I offer it to you as a potential way to think about how we might structure these particular documents. I think this one went through.
K
So I may have just implicitly answered these things, but anyway. Okay, drawing on hash-to-curve, I think there are a number of things we can do going forward to improve the overall output of the things that we're doing in the CFRG. I'm going to enumerate just a couple of high-level things, without drilling into any specific details, because a lot of it will vary based on the thing that's being specified, how involved it is, whether it's a protocol or not, and so on.
K
So maybe that means, or rather, that does mean, being consistent where consistency matters. If you're using pseudocode to describe different implementations of different things in the specification, use the same format of pseudocode at every single step of the way; don't introduce one format for section A and then use something entirely different in the appendix. Consistency everywhere will help and will go a long way. And, as hash-to-curve does, I claim that any time you are using pseudocode, you should try to make it as close as possible to the actual reference implementation that you are providing with your specification; and we should require reference implementations for these things. But there's been a lot of interesting discussion in various venues about what this reference implementation format should be. Should it be Sage? Should it be Rust? Should it be something else?
K
And on the heels of that, I would say that consistency beyond format, consistency in concepts, is incredibly important, primarily because we want to reduce the cognitive load of the people who are reading these specifications. So not only reusing formats, but reusing concepts, within a draft itself but also across drafts.
K
So
if
we're
talking
about
say
elliptic
curves
in
one
draft
and
groups
or
prime
order
groups
in
another
draft,
that
could
be
confusing
to
people
because
sometimes
they're
used
interchangeably,
even
though
they
shouldn't,
but
sometimes
they
are
so
being
consistency
where
you
being
consistent
where
you
can
is
incredibly
important
using
a
shared
vocabulary
for
the
things
that
we're
talking
about
is
incredibly
important
and
I
think,
for
the
most
part,
the
the
some
of
the
documents
that
the
cfrg
is
working
on
right
now
are
like
certainly
they're
internally
consistent.
K
I think the review process is such that a document gets a tremendous amount of review in the group before it goes out; then it goes through the formal review panel step, and then there are a number of other steps, but I don't remember what they are. I think we have reached a level of internal consistency, but external consistency across drafts could be improved, and that's a hard problem, because things and concepts change over time. Maybe we used to talk about elliptic curves as the best thing since sliced bread, but now we mostly talk about prime-order groups instead of elliptic curves; we want to shy away from elliptic curves in general. So maintaining consistency over time is difficult, but I think, for batches of drafts that are being developed at around the same time, it's not an unreasonable request and something we could do.
K
There is a tremendous amount of work on reference, informally verified, and formally verified implementations, especially of cryptographic algorithms, like hacspec and HACL* and whatnot. Whether or not they're suitable for the permanent nature of an RFC is not immediately clear to me. But all of this work basically lets someone take one single implementation and produce test vectors from it, produce other implementations in different languages that they can test against, and sometimes produce optimized implementations as well that are suitable to drop into production systems.
K
Those
those
are
those
are
good
things
and
it
removes
any
like
choice.
Chance
of
potential
error
from
someone
implementing
these
things
themselves,
trying
to
you
know,
interpret
an
implementation,
description
or
syntax
description
or
whatever
and
potentially
getting
it
wrong.
I
I
was
alluding
to
earlier.
It's
like
still
not
clear
what
it's
like
the
right
format
for
this
this
these,
like
reference
implementations,
be
it
like
something
based
in
python.
K
You
know
that
a
dsa
rc8032
uses
python
to
describe
some
of
the
relevant
steps.
More
recently,
things
have
been
using
a
lot
of
sage.
K
I
guess
you
can
call
it
a
library,
a
library
of
like
the
the
specs
that
are
being
developed.
The
building
blocks
and
and
drafts
are
like
building
on
these
building
blocks
and
adding
more
building
blocks
and
and
and
whatnot.
There's
no
reason
you
couldn't
do
this
in
rust
or
hacks
back
or
whatever.
I
think
it
it'd
be
interesting
to
experiment
with
different
reference
implementations
and
see
what
is
most
approachable
from
an
implemented
implementer's
and
what
is
what
is
good
for
really
the
you
know:
inclusion
in
an
rc.
K
Okay,
so
I
want
to
wrap
up
and
lose
some
time
for
questions,
so
I
my
core
point
here
is
that,
like
the
work
that
the
cfrg
does
again
is
incredibly
important,
we
don't
call
them
standards,
they're,
not
technically
standards,
but
I
think
if
it,
you
know
walks
like
a
duck
and
quacks
like
a
duck.
It
is
a
duck.
These
things
should
be
considered
standards.
K
In
my
opinion,
they
are
incredibly
valuable
and
important
to
what
the
ietf
and
irtf
is
doing,
and
I
think
we
should
treat
them
as
such
there's
definitely
a
lot
more
work
that
we
can
be
doing
both
within
the
group,
even
in
the
rtf,
for
you
know
producing
tooling,
like
sage
or
the
name
that
was
in
the
first
one.
I
apologize
to
improve
the
the
quality
of
these
specifications.
K
There's
like
ways
that
document
editors
can.
You
know
right
now
start
improving
the
quality
of
their
specifications
through,
like
better
pros,
better,
better
use
of
terminology,
better,
more
consistency.
What
have
you-
and
I
think
it's
about
time-
that
we
really
took
the
question
of
or
the
the
task
of
exploring
formal
method
approaches
for
cfrg
specifications
seriously?
And
I
don't
know
I
don't
know
what
that
looks
like
in
practice
if
it's
like
a
task
force
that
go
off
and
like
try
to
you
know,
work
through
these
different
options.
K
If
it's
a
group
in
the
rtf
to
sort
of
explore
this
area,
I
think
you
know
anything
is
a
reasonable
start
here,
but
I
think
it's
given
all
the
work
that's
been
done.
I
think
it's
it's
now.
It's
now
time
to
start
actually
pulling
it
into
the
cfrg
with
that,
I
guess
I'll
I'll
close
for
questions.
If
you
have
any.
A
Thank you very much, Chris. Jonathan?
I
So
I
was
just
thinking
if
we
are
going
to
have
reference,
implementations
and
pseudo
code,
which
has
precedence
right
if
the
reference
implementation
doesn't
agree
with,
what's
written
in
the
spec,
which
one
do
we
consider
correct,
do
we
have
to
decide
on
a
case-by-case
basis,.
K
My
personal
opinion
is
that
we
should
have
reference
implementations
in
no
pseudo
code,
but,
like
the
that's
just
not
the
way,
specs
are
published
right
now
we
have
pseudocode
to
describe
the
things
and
reference
implementations
sort
of
hang
off
the
side
of
them.
A
All
right,
thank
you,
thomas.
L
K
I don't think so; for that, I... I don't think so. The topic of whether the CFRG should be in the IRTF versus the IETF has come up before. My general stance has been: if it's not broke, don't fix it, from an IRTF-versus-IETF perspective. But I can't see any harm from rechartering to potentially, you know, explicitly include non-informational documents.
A
I
mean
there
are
procedural
issues
there,
but
in
practice
we
cfog
should
be
producing
as
good
equality
documents
as
it
can
and
yeah.
We
can
have
the
arguments
about
whether
it
should
be
an
atf
or
an
ircf
group.
Later.
A
M
I guess I'll go. Shivan Sahib, from Brave. This goes to a discussion that we were having on Slack, Chris, but security proofs: do you think there's a conflict between the proof and implementation complexity or simplicity? And then what should you do? Maybe you can say in the draft, go and look at this other place where we did something slightly different and then proved it. Yeah, just wondering what you think.
K
I
think
anecdotally,
like
the
security
proof,
should
trump
everything
if
the
implementation
doesn't
match.
There
should
be
like
very
clear
reason
as
to
why
it
doesn't
match,
perhaps
with
like
justification
as
to
why
it
would
deviate
from
what
has
been
analyzed
in
practice,
but
I
think
we
should
strive
very
hard
to
keep
like
all
these
things
consistent,
and
I
would
include
like
the
the
the
security
model
and
the
analysis.
That's
been
done
as
part
of
like
the
description
of
the
specification,
so
yeah
keep
everything
consistent.
G
I have mixed feelings about reference implementations, because I fear that for some cryptographic algorithms, having a reference implementation would lead to more errors than a carefully written one, meaning timing errors or stuff like that, possible timing attacks. I mean, somebody might just grab the reference implementation, transform it into another language more or less mechanically, not use their brain, and go on.
G
I
would
love
to
have,
however,
some
test
cases
like
if
these
numbers
are
there.
You
must
reach
this
point,
because
I
fear
that
in
some
cases
this
could
be
more
useful
to
debug
or
test
case.
Your
particular
implementation.
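A minimal sketch of what such known-answer test cases might look like in practice; the algorithm under test and its vectors here are just SHA-256 and its well-known digests, chosen only to show the shape of the check, not anything from a CFRG draft:

```python
import hashlib

def my_hash(data: bytes) -> str:
    """The implementation under test (here simply SHA-256)."""
    return hashlib.sha256(data).hexdigest()

# (input, expected output) pairs, as a spec appendix might list them.
TEST_VECTORS = [
    (b"", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
    (b"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
]

for msg, expected in TEST_VECTORS:
    got = my_hash(msg)
    assert got == expected, f"mismatch on input {msg!r}: got {got}"
print("all test vectors pass")
```

Pinning intermediate values as well as final outputs makes a failing vector point at the specific step that went wrong.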
K
I
mean
yeah.
I
certainly
agree
that
the
test
cases
and
test
factors
are
invaluable
and
I
think
most
of
the
the
specifications
that
I'm.
K
Yeah
yeah
in
terms
of
like
whether
or
not
reference
implementation
should
exist
that
it
was
actually
an
explicit
choice
to
use
something
like
sage
to
describe
these
like
hashtag
based
drafts,
because
you
can't
just
like
easily
take
sage
code
and
plop
it
in
production.
We
want
to
make
sure
that
was
like
something
you
did
not
ever
do
and
we.
Furthermore,
we
took
steps
in
the
reference
implementation
to
try
to
describe
you
know
or
avoid
sort
of.
I
guess
implementation
pitfall.
K
So,
for
example,
if
there
you
need
to
do
like
a
comparison
of
two
like
byte
strings
and
constant
time
we
didn't
just
like
in
python
say:
like
does
a
equal
b,
we
would
say
like
if
a
is
equal
to
b
constant
time,
I
forget,
like
the
extra
actual
string,
but
like
you
make
it
very
clear,
like
you
know
this,
this
is
how
you
would
actually
implement
in
practice,
but
I
mean
it's
a
good
point.
Like
should.
Should
reference
implementation
exist,
I
I
I.
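The pattern being described, naming the constant-time intent instead of using the language's bare equality, might look like this in Python; `hmac.compare_digest` is the standard library's constant-time comparison, while a naive `==` can short-circuit on the first differing byte and leak timing information:

```python
import hmac

def tags_equal(a: bytes, b: bytes) -> bool:
    # Deliberately NOT `a == b`: the helper's name and the explicit call make
    # the constant-time requirement visible to anyone porting this code.
    return hmac.compare_digest(a, b)

assert tags_equal(b"\x01\x02", b"\x01\x02")
assert not tags_equal(b"\x01\x02", b"\x01\x03")
```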
G
Tend
to
think
they're
useful,
but
write
a
reference
implementation
that
cannot
be
used
as
it
is
by
let's
say,
an
undergraduate
student
doesn't
actually
have
time
to
actually
do
something
right.
That
would
be
great
and
I'm
all
up
for
it.
K
Yeah,
I
mean,
I
guess
the
other
other
side
of
that
coin.
Is
there
are
projects
like
I
mean
like
the
the
recent
nist
pqc
competition
that
all
the
reference
implementations?
There
were
actually
very
high
quality
implementations
and
they
perform
very
well.
You
could
reasonably
take
some
of
those
and
plot
them
in
production.
If
you
were,
you
know,
if,
if
you
wanted
to
do
so-
and
I
guess
it
depends
on
you
know
what
sort
of
bar
you
hold
for
you
know
the
reference
implementation.
G
D
Hey, this is Max. Great talk. I just wanted to say I've found at least one protocol where not the pseudocode but the English-language text did not match the reference implementation, not an IETF one, but one from outside. So that can happen too, and I do think it's maybe a good idea, or worth considering, to say which one should be believed, as somebody else mentioned.
A
All right, thank you very much. Just to follow on from that last point, I think it's very clear that, while Chris had some great examples, this is very much not a CFRG-specific set of problems. There are ambiguities and issues with all of our specifications, and certainly looking at the dependency graph Cullen Jennings has for the WebRTC specifications, and the sprawling mass of documents there, I think that's clear.
A
So
what
what
I've
been
trying
to
do
with
these
talks
is
is
to
sort
of
try
and
stimulate
some
discussion,
try
and
see
if
there's
interest
in
your
research
on
protocol
specification
thinking
about
how
we
specify
protocols
in
in
the
in
in
the
itf
and
the
irtf
communities,
it's
time
to
think
about
what
are
the
research
challenges
in
ineffective
protocol
specification
and
how
we
can
improve
the
way
we
specify
protocols.
A
There's
certainly
been
a
lot
of
discussion
in
the
chat
and
a
lot
of
really
good
questions
to
the
talks,
and
I
think
there
are
three
fantastic
talks
there.
So
thank
you,
everybody
for,
for
both
giving
the
talks
and
for
the
discussion
in
the
chat.
A
I
I
am
certainly
looking
to
to
gauge
interest
to
see
if
people
are
interested
in
potential
irtf
work
in
this
area.
If
you
are
interested
in
this
pl,
please
do
talk
to
the
speakers.
A
Please
do
talk
to
me
and
if,
if
you
would
be
interested
in
following
up
on
this,
put
your
names
into
the
that
the
documents
that's
linked
on
the
slide-
and
I
I
just
put
in
the
chat
and
we're
you
know.