From YouTube: ANRW-NetworkFunctionsAndMiddleboxes
Description: Network Functions and Middleboxes session at ANRW
A: ...middleboxes, the bane of the IETF's existence. So this session has three talks, an interesting mix of things. The common theme, of course, is what happens in between: the people who are trying to get useful work done, and the things in the middle that are trying to help that happen. Our first talk is "Limitless HTTP in an HTTPS World", where the authors try to infer the semantics of HTTP operations by observing traffic without having to decrypt it.
B: We want to be able to make even more detailed inferences about the HTTP headers that are inside that frame, and in some cases we want to be able to say what the actual values of those HTTP headers were. And there's a higher-level goal, which I'm not going to spend too much time on: given those inferences, given the fact that we can annotate a TLS session with this information about where the headers are, and what the specific HTTP fields and values for those headers actually are...
B: From the point of view of the attacker, it's mostly centered around website fingerprinting; from the point of view of the defender, it's mostly centered around identifying malicious communication and malicious websites, and doing things like identifying data exfiltration by, you know, matching the sizes of the data objects downloaded or uploaded against something that's in a contextual database. And so the motivating factor for all of this is that this is what many enterprises look like today. It's a very generic, cartoonish picture, but we have some set of clients, and they're internal.
B: They want to talk to the internet, so there's a man-in-the-middle proxy in between those two that will decrypt the traffic, make sure that it's good, and then send it on its way. And basically this work is asking: if we eliminate the man-in-the-middle box, how many of those features can we keep by passively observing the TLS traffic?
B: We're comparing this view that we can passively observe on the wire with a view that has this detailed data: the TLS application data records annotated with the information that we want to be able to infer. So, you know, 99% of what we did was building out those training data sets, and then the last 1% was some light machine learning. And so we started by... obviously we needed the TLS key material, and, you know, interestingly, Andrew...
B: He did most of this work, and I asked him, you know: given a memory dump, go find all of the TLS master secrets, do all these entropy-type things. I thought it was going to be a summer-long project. He came back that afternoon like, "oh, here are three or four regular expressions that will get you all of this data," and it worked out really well.
B: We just use the SSLKEYLOGFILE environment variable; obviously that's a lot easier. And then, as a more general approach, given a memory dump of either a process or a virtual machine, we use all of the regular expressions to extract those master secrets. Scott Dunlop took some of Andrew's work and tuned it, and was able to get it to run in about 400 milliseconds for a 1-gigabyte memory dump.
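The specific regular expressions aren't shown in the talk, but the general idea can be sketched. The pattern below is a hypothetical stand-in, not the authors' code: it scans a raw memory image for NSS key-log-format lines ("CLIENT_RANDOM <client_random> <master_secret>"), which sit verbatim in process memory when SSLKEYLOGFILE logging is enabled.

```python
import re

# Hypothetical sketch of the memory-scanning approach: one regex for
# key-log-format lines. The real work used a handful of regexes tuned
# to TLS library data structures, which are not reproduced here.
KEYLOG_RE = re.compile(rb"CLIENT_RANDOM ([0-9a-f]{64}) ([0-9a-f]{96})")

def extract_master_secrets(memory_dump: bytes) -> dict:
    """Map client_random -> master_secret for every match in the dump."""
    return {cr: ms for cr, ms in KEYLOG_RE.findall(memory_dump)}

# Fake "memory dump": one secret surrounded by unrelated bytes.
dump = (
    b"\x00garbage\xffCLIENT_RANDOM " + b"ab" * 32 + b" " + b"cd" * 48 +
    b"\x00more garbage"
)
secrets = extract_master_secrets(dump)
```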
B
Is
what
we
did
for
the
the
standard
setup
for
the
malware
analysis
sandbox,
which
I'll
get
to
in
a
minute
and
from
there
you
know
we
had
a
Python
program
that
we
wrote
that
would
take
the
master
secrets
and
touré
s
keys
that
goes
through
and
decrypts
all
of
the
traffic
so
that
we
get
a
nice
JSON
file
that
says
application.
Data
record
contains
you
know
these
HTTP
headers
and
then
for
tour.
It
was
an
interesting.
B
The
code
is
actually
really
ugly.
So
you
know
the
tour.
The
outside
layer
is
normal,
TLS
application
data
records
when
you
decrypt
that
you
start
to
see
all
of
the
the
tour
cells.
In
this
case
it's
the
relay
data
cell,
and
so
here
you
need
to
maintain
a
list
of
all
of
the
AES
keys
and
you
need
to
know
the
proper
order
which
we
just
force
and
then
finally,
we'll
get
the
TLS
protocol,
which
again
we
need
to
decrypt
decrypt
that
with
the
TLS
master
secret
to
get.
B: It turned out that wasn't the case, and it's right around 80 percent of the TLS sessions. And that's, you know, 80 percent of the samples and 80 percent of the TLS sessions; those samples and sessions are relatively close to each other. But we could decrypt the majority of the data, so we were able to actually get the key material for most of the sessions. And then we ended up with four main data sets: Firefox and Chrome, Tor, and malware.
B
So
the
the
Firefox,
Chrome
and
tour
data
sets
are
slightly
different
from
the
malware
data
set
in
in
the
biggest
sense
is
that
they're
all
relatively
homogeneous?
So
all
Firefox
connections
are
obviously
going
from
the
same
version
of
Firefox.
The
the
malware
data
set,
on
the
other
hand,
has
a
wide
variety
of
different
of
different
TLS
libraries
and
applications.
B: Right, so to recap: we have our data sets, and they're well labeled. We're able to associate the encrypted data features that we would see on the wire with the unencrypted data features that we want to infer. But we still need that extra step: which actual data features are we going to feed into our machine learning algorithm? And, you know, after some trial and error, and not too much trial there...
B: We take features of the application data record itself, and other things like TCP PUSH flags and packet sizes and byte counts, but we would also take the set of data features from the previous five application data records (and this could include things that are in the actual handshake), and then the TLS records from the following five, which could be zero if we're at the end of the session. And that locality really does help. The other thing that really helped was the iterative classification.
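As a rough illustration of that locality idea (the per-record features here are hypothetical stand-ins, not the paper's exact feature set), a sketch of windowed feature vectors:

```python
def windowed_features(records, k=5):
    """For each TLS record, concatenate its own features with those of
    the k previous and k following records, zero-padding past the ends
    of the session (the "locality" described in the talk)."""
    dim = len(records[0])
    pad = [0] * dim
    out = []
    for i in range(len(records)):
        row = []
        for j in range(i - k, i + k + 1):
            row.extend(records[j] if 0 <= j < len(records) else pad)
        out.append(row)
    return out

# Each record: [record_length, tcp_push_flag], stand-ins for the
# per-record features mentioned in the talk.
session = [[517, 1], [1460, 0], [320, 1]]
X = windowed_features(session, k=5)
```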
B: So the assumption here is that all of these headers are kind of interdependent. If we know something about the HTTP request, that gives us a lot of information about the HTTP response. So what we do is make a single pass where we identify the HTTP frame types and make an initial guess at all of the header fields and values, and then we refine all of those guesses by including in the feature set the guesses from the previous round.
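A minimal sketch of that iterative refinement, with a toy rule-based predictor standing in for the real model (the loop structure, not the classifier, is the point):

```python
def iterative_classify(base_features, predict, rounds=3):
    """Iterative classification sketch: round 0 predicts from the base
    features alone; each later round re-predicts with the previous
    round's guesses for all records available as context, so that
    interdependent headers (e.g. a request's method and its paired
    response's code) can inform each other."""
    guesses = [None] * len(base_features)
    for _ in range(rounds):
        context = list(guesses)  # previous round's guesses
        guesses = [predict(f, context) for f in base_features]
    return guesses

# Hypothetical toy model: a request is always guessed "GET"; a
# response is guessed "200" only once its paired request is "GET".
def toy_predict(feat, context):
    kind, pair = feat
    if kind == "request":
        return "GET"
    return "200" if context[pair] == "GET" else "unknown"

feats = [("request", 1), ("response", 0)]
labels = iterative_classify(feats, toy_predict)
```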
B: And the other thing is that we're not claiming to do something like identifying the exact value of a cookie field, because that's probably not possible, at least with these methods. So we set up our classification into two broad sets. We have a set of multi-class classification problems, the ones I've listed here: things like the method, the response code, the content type, and the actual server field were multi-class classification problems, where we're trying to identify a specific value.
B
So
a
specific
you
know
JavaScript
versus
image.
In
addition
to
these,
we
had
a
set
of
binary
classification
problems,
and
here
the
the
intuition
is
that
you
know
the
fact
that
there's
a
referrer
field
present
in
an
HTTP
request
gives
us
gives
us
some
amount
of
information.
The
fact
that
there
was
a
you
know
like
a
cookie
field
yeah,
so
a
cookie
field
also
gives
you
some
amount
of
information,
and
you
can
actually
get
the
or
you
it's
difficult
to
get
the
actual
values
for
these,
but
determining
their
presence
was
significantly
easier
for
these
techniques.
B: So there was one experiment where we divided the training and the testing set by weeks: one week of data to train, and then we took the second week of data to test. And the other set of experiments used an SNI-based split, where no SNIs were in both the training and the test set, and they were relatively even. There are many interesting observations in this.
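One way to realize an SNI-disjoint split like that (a sketch, not the authors' code; bucketing by a hash of the SNI is just one convenient way to keep all sessions with the same SNI on one side):

```python
import hashlib

def sni_split(sessions, test_fraction=0.5):
    """Split sessions so that no SNI appears in both train and test:
    hash each session's SNI and send the whole group to one side
    based on the hash value."""
    train, test = [], []
    for s in sessions:
        h = int(hashlib.sha256(s["sni"].encode()).hexdigest(), 16)
        (test if (h % 100) < test_fraction * 100 else train).append(s)
    return train, test

sessions = [{"sni": "example.com", "features": [1]},
            {"sni": "example.com", "features": [2]},
            {"sni": "example.org", "features": [3]}]
train, test = sni_split(sessions)
# Sessions sharing an SNI always land in the same partition.
overlap = {s["sni"] for s in train} & {s["sni"] for s in test}
```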
B: So these are all confusion matrices; a perfect algorithm will get everything on the diagonal, so that's what we want. For the cases of Chrome and Firefox, we were able to do a pretty good job of identifying the content types, and this is a heavily imbalanced data set. So, you know, if you've ever looked at a lot of this traffic, which I'm sure all of you have, images are definitely, you know, an order of magnitude more common than something like JSON.
B
Guess,
there's
two
things
so
toward
that's
fixed-length
messages
and
it
does
a
lot
of
multiplexing
the
fix,
linked
messages
and
some
of
the
the
site
experiments
that
we
did
had
relatively
little
impact
on
on
these
techniques.
The
multiplexing,
on
the
other
hand,
had
a
huge
impact,
so
multiplexing
mini
sessions
over
a
single
section.
B
Yeah
so
quick
conclusions,
the
you
know
by
far
the
most
important
part
of
all
of
these
experiments
was
building
the
the
ground
truth.
Data
sets.
So,
if
you're
willing
to
invest
a
lot
of
time
into
coming
up
with
very
varied
representative
data
set
that
can
actually
make
that
can
actually
link
the
things
that
you
care
about,
so
that
the
application
data
records
and
the
unencrypted
HTTP
headers,
actually
defining
the
machine
learning
algorithms
take
advantage
of
that
is
relatively
easy.
Like
I
just
said,
you
know
fixed
length
records
yeah.
D: All right. Chris Wood. Thank you for bringing this research to our venue today; I think it's really great. A question about the mitigations that you talked about at the end, in particular the fixed-length record possibility: did you do any experiments with (and this all depends on a lot of conditions, a lot of variables) what a good size is for that fixed length, and whether or not it needs to be applied in both directions?
B: We created new data sets where each Tor TLS session only included one TLS connection, so more or less like what you would see from TLS on the wire, and in those cases Tor did not give you anywhere near the same protections as the multiplexing case. So that's where that intuition comes from. I haven't looked very much at varying the size of fixed-length records.
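The fixed-length-record mitigation itself is simple to state: pad every plaintext record up to a fixed bucket size so that record lengths leak less. A sketch (the 1024-byte bucket is an arbitrary illustrative choice, not a recommendation from the talk; larger buckets leak less but waste more bytes):

```python
def pad_to_bucket(length, bucket=1024):
    """Round a plaintext record length up to the next multiple of
    `bucket`: the fixed-length-record mitigation discussed in the Q&A."""
    return -(-length // bucket) * bucket  # ceiling division

sizes = [100, 1024, 1500]
padded = [pad_to_bucket(n) for n in sizes]
overhead = sum(padded) - sum(sizes)  # bytes wasted by padding
```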
E: I'm curious, just from the last slide where you talked about the level of fingerprinting being used to classify the data: is that the trainer, and then where you can, you know, apply the results? And given the emphasis on the client side, with servers doing the bulk of the sending, why don't you think that has a bigger sort of influence on how that happens?
B: The interesting thing is, I don't know exactly how to say this, but we could throw enough... I hate the explanation of "throwing data at it," but the malware case is a really good example, where it encompasses a large number of clients, and if they're relatively well represented, then it can do a pretty good job of identifying what's happening on the server end. So the fact that we have a relatively, you know, distributed set of servers in the training database...
B
The
the
server
specification
actually
doesn't
influence
the
model
as
much.
On
the
other
hand,
if
we
have
like
Chrome
and
Firefox,
we
trained
just
on
Chrome
it'll,
do
a
bad
job
at
classifying
on
Firefox.
If
we
do
something
like
the
malware
data
set,
where
we
get
really,
you
know
a
large
number
of
examples
from
every
possible
client
and
then
train
the
machine
learning
algorithm.
It
would
probably
have
a
form
it's
similar
to
that
of
the
malware
data
set
which,
in
our
data,
sets
a
little
worse
than
the
the
Chrome
and
Firefox
specific
data
sets.
B
But
it
does
all
right
and
you
know
I
think
too,
you
know
we
wouldn't
want.
It
would
be
difficult
to
create
a
system
that
had
a
reasonable
number
of
false
positives
if
we
both
ignore
any
kind
of
domain,
knowledge
on
the
client
side
and
the
server
side.
So
you
know
definitely
in
these
experiments.
We've
picked
to
ignore
things
about
the
server
and
just
let
the
data
take
care
of
that
I
think
doing
both
of
them.
Would
it
leads
to
slightly
reduced
results?
F: [inaudible] So, somebody on the internet and in the IETF said something wrong, so I naturally wanted to correct them, and I've been thinking about collecting a bunch of data like this, actually; I started doing this last week, only to find out that you've pretty much done all the work for me. So I really appreciate that; fantastic work. Did you mention (I don't think you mentioned) whether you're releasing any elements of it, either data or models or tools or code or anything?
B
It
would
be
great
to
send
me
an
email
and
remind
me
there
there's
very
little-
that's
actually
sensitive
in
this
data
set
since
I
collected
it
all
on
a
virtual
machine
and
most
of
the
tools
that
we
we
rode
I
think
could
definitely
be
open
source
I.
Don't
think,
there's
that
much
that's
sensitive
with
them
I
would
need
to
talk
definitely
about
the
tools.
The
data
itself
would
probably
be
easier
to
open
source.
I.
Definitely
want
would
want
to
do
that.
Email
me
remind
me
right.
A: Go ahead.

G: Okay, thanks for the introduction. So our paper is entitled "mmb: flexible high-speed userspace middleboxes". I'll start with a bit of context. At first there was the end-to-end Internet, where Alice and Bob exchanged packets while being sure that they would remain untouched in transit. Then the middleboxes happened, and now packets exchanged by Alice and Bob cross various types of middleboxes, from network address translators to various kinds of tunnels, firewalls, TCP accelerators, and so on and so forth.
G: With the kernel stack, you have to go through a system call, which involves a context switch, which involves extra overhead that we just can't afford. Plus, it relies on the sk_buff structure to store packets, which is very complex, and it's not really in line with packet batching or batch processing. But fortunately, there is DPDK.
G
She
can
be
understood
as
a
user
per
user
space
friendly
driver
for
the
network
controller,
which
will
write
packets
on
the
memory
regions
shared
between
the
kernel
and
the
user
space
and
allow
user
space
to
access
to
packet
directly
plus
it
relies
on
spatial
data
structures
that
are
specially
crafted
200
packet
batching.
So
this
opens
the
way
for
user
space.
Middle
boxes
and
for
adding
more
optimization
and
flexibility
to
it,
this
is
a
short
state-of-the-art
of
existing
kernel
bypass
framework.
G
Then
there
is
PF
ring
which
I'm
not
improving
the
performance
of
packet
capture,
but
she's
too
narrowed
for
us.
Then
there
is
rod
breaks,
which
is
a
first
step
in
extending
click
to
optimize
it
to
introduce
parallelism
support
to
it.
Here's
a
pocket
cheddar,
but
it
relies
on
a
GPU
which
is
out
of
scope
for
us.
Then
there
is
double
click,
fast
click
and
middle
click
adjust
which
are
successive
exchange,
extensions
of
click
which
each
introduces
new
optimization
techniques.
G
So
it
stands
for
vector
packet
processing.
It
relies
on
GP
DK,
but
it's
not
a
requirement
are
alternatives.
It
has
support
for
access,
0
copy,
forwarding
and
many
more.
It
relies
on
a
ket
vectors
paradigms.
So
it
has.
It
has
native
support
for
packet
batching,
and
it
has
a
node
based
approach
similar
to
click
and
additionally,
it
has.
It
gives
special
attention
to
low
level
optimizations
such
as
catching
and
pipelining.
G
So
this
is
an
example
of
out
of
the
book
example
of
a
VPP
guru,
so
it
processes
packet,
0,
&
1
in
one
iteration
of
the
loop,
but
it
start
by
prefetching
packet,
2
entry.
So
not
only
we
subtract
the
processing
time
of
packet,
0
n
1,
the
memory
access
time
of
packet
to
entry,
but
given
the
packet
vector
paradigm,
the
node
based
architecture
and
the
nature
of
the
CPU
cache,
it
will
in
fact
amortize
the
cost
of
memory
access
to
the
entire
packet
vector
and
now
in
that
memory.
G
Access
is
a
major
bottleneck
in
software,
which
is
a
major
improvement.
Let's
process
it
packet
to
packets
at
a
time-
and
this
is
done
to
increase
to
leverage
more
hardware
pipelining,
which
is
basically
anticipating
the
nest
in
the
next
instructions
and
unpredictable
Jones,
makes
pipelining
lose
a
few
clock
cycles.
So
by
explicitly
unrolling
the
loop
you
can
avoid
losing
truck
cycles
needlessly
and
the
last
loop
is
simply
processing
the
remain
packets.
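The shape of that loop can be sketched structurally (the prefetching and branch-prediction effects only materialize in the real C code; this only shows the unrolled main loop followed by the remainder loop):

```python
def process_vector(packets, handle):
    """Structural sketch of a VPP-style unrolled node loop: handle
    packets two at a time (the real code also prefetches packet i+2's
    metadata at this point, which Python cannot express), then a
    scalar loop mops up the remainder."""
    out, i = [], 0
    while i + 1 < len(packets):
        # in C, roughly: prefetch(packets[i + 2]); prefetch(packets[i + 3]);
        out.append(handle(packets[i]))
        out.append(handle(packets[i + 1]))
        i += 2
    while i < len(packets):  # remainder loop for odd-sized vectors
        out.append(handle(packets[i]))
        i += 1
    return out

result = process_vector(list(range(5)), lambda p: p * 2)
```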
G: It should be fast even with thousands, tens of thousands, a hundred thousand rules, and it should have a simple, intuitive interface. This is an overview of the grammar for rule addition: you can add stateless and stateful rules, you can match on any combination of fields, and you can apply any combination of actions, from modification to addition to stripping.
G
So
this
is
an
overview
of
the
processing
path
of
the
plugin,
so
we
fetches
a
packet
vector
test
each
packets
against
each
table.
This
one
table
mask
which
contains
all
keys
for
the
rules
of
using
this
mask.
Then
we
test
them
against
the
more
complex
matching
rules,
for
example,
ECP
options
which
cannot
be
done
using
masks.
And
finally,
we
test
the
packets
against
the
connection
table
and
if
needed,
we
have
the
states
of
the
flow
and
when
needed,
like
the
packet
vector,
is
forwarded
to
write
to
the
right
node.
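A toy sketch of the mask-table step (the field layout, mask values, and action names here are invented for illustration; the real plugin operates on VPP packet vectors):

```python
def match_packet(pkt_bytes, tables):
    """Sketch of mask-based tables: each table holds one mask shared by
    all of its rules; AND the packet's header bytes with the mask and
    look the result up among that table's keys. Returns the first
    matching rule's action, or a default."""
    for mask, rules in tables:
        key = bytes(b & m for b, m in zip(pkt_bytes, mask))
        if key in rules:
            return rules[key]
    return "pass"  # default action when no rule matches

# Hypothetical 4-byte "header": [proto, flags, dport_hi, dport_lo].
# One table masking proto + dport, mapping TCP/80 to a strip action.
tables = [
    (b"\xff\x00\xff\xff", {b"\x06\x00\x00\x50": "strip-opt"}),
]
action = match_packet(b"\x06\x12\x00\x50", tables)
```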
G
So
we
compare
the
performance
of
nmv
to
three
state-of-the-art
solutions.
Fastclick
already
introduced
sending
extension
of
click
designed
to
be
fast.
Then
there
is
Express
data
pass
XDP,
which
is
an
internal
solution
that
relies
on
extended
BPF,
which
allows
to
do
packet
classification
in
the
kernel,
and
it
also
adds
support
for
flow
tracking
tables
hashmaps.
And
it's
supposed
to
be
fast
and
IP
table
for
completeness.
G: Then we evaluated the performance of 5-tuple firewall filtering by injecting stateless rules matching on randomly generated 5-tuples and measuring the throughput. We find that mmb and XDP sustain the direct-forwarding baseline. Surprisingly, FastClick doesn't, because of a performance issue in one of its components; and, surprisingly, iptables on kernel 4.15 manages to sustain the direct baseline, but only for up to 1k rules.
G
Then
we
evaluated
a
scenario
of
stateful
flow
matching
well
inject.
Similarly,
stateful
rules
matching
on
randomly
generated
factor
per,
but
here
we
make
sure
that
each
flow
matches
at
least
one
rule
so
that
the
states
of
each
ropes
maintain
the
results
are
pretty
similar.
We
find
that
mm
v
and
x
DP
sustains
that
our
baseline
fast
leak.
This
time
sustains
85%
of
the
dark,
baseline
and
IP
tables
behave
similarly
to
the
stateless
scenario.
G
It
is
not
applicable
to
IP
tables
nor
fast
click,
because
fastly
doesn't
support
it
and
we
couldn't
manage
to
push
it
into
x,
DP
in
a
modular
enough,
where
this
is
probably
possible,
but
not
comparable
to
MMB.
So
we
only
evaluated
MMB
here
and
we
found
that
it
is
stable
up
to
with
up
to
seventy
eight.
Seventy
eight
different
TCP
options,
which
drastically
is
very
enough.
G
So
the
confusion
we
evaluated
MMB
found
that
it
was
able
to
sustain
line
rate
for
different
use
cases
of
middle
box
policies
in
the
future,
who
would
like
to
add
payload
work
instruction
to
it
to
be
able
to
implement
policies
for
our
PSS,
idss
or
other
things.
If
you
are
interested
or
just
curious,
just
check
out
the
repository.
C
Come
Robert,
so
some
of
this
actually
ties
in
with
that
presentation
we're
doing
tomorrow
on
deep
dive
into
Nicks.
We're
definitely
going
to
have
smart
Nick's
coming
that
can
do
a
lot
of
this
processing
inside
the
neck,
so
I
think
a
direction
for
something
like
this
might
be
how
how
to
leverage
some
of
those
advanced
features,
because
we
know
once
you
start
pushing
stuff
down
to
the
neck
like
the
fire.
Rover
rules
they'll
perform
better,
so
something
to
consider
mm-hmm.
Thank
you.
A: So Zeeshan is a first-year PhD student in CMU's Institute for Software Research, working on programming languages and systems research. He also does some work for Comcast around edge computing and programmable networks, and claims he's not on the job market, so no poaching, please.
H: Can you hear me? All right, cool. So this is some interesting work that we've been looking at, that I've been looking at, actually from two kinds of angles, kind of inspired by work we're doing at Comcast around edge computing. And it's good the last talk was before me, so remember all those things about middleboxes and network functions, and about things like VPP (we use something different; I'll talk about that). We were doing all this work with IPv6 segment routing and variable-length data, and that kind of changed the way we thought about things.
H: Everyone is interested in higher-level languages; I love functional programming. These were not the typical things I saw when I started writing network functions, but there actually is a lot out there. There are things like Pyretic, which comes out of the Frenetic lineage, a Python-based library for writing modular network functions. There's stuff like Slick, which lets you do interesting kinds of pipelining for subsets of traffic. And then there are things like NetKAT, which is this really cool semantic model for thinking about networks.
H
It
has
a
proofs
for
soundness
and
completion.
It
seems
really
really
great
it
lets
you
do
things
like
reachability
analysis.
Writing
your
code
for
all
these
things
that
are
really
great
things
are
getting
more
and
more
complex,
and
even
though
you
have
these
great,
this
great
set
of
research,
that's
coming
out
showing
us
how
we
can
do
certain
things.
It
doesn't
really
cover
a
lot
of
ad
hoc,
and
you
know
kind
of
large
scale
ideas,
because
now
writing
network
functions
is,
is
dealing
with
complex
routing
and
load
balancing
policies.
H
So
my
work
at
Comcast
is
the
kind
of
inspiration
and
inputs.
It's
for
us
like
what
we
looked
at
from
their
traffic
monitoring
is
a
major
thing.
You
have
a
lot
around
experimental
new
specifications,
protocols
and
headers,
and
then
you
have
a
lot
of
work.
There's
recent
I
think
2018
or
17.
This
paper
called
in
network
computation,
is
a
dumb.
H
Time
has
come,
it's
a
great
title
talks
about
how
we're
doing
aggregation
and
doing
all
this
different
kinds
of
views
on
computation
in
the
network.
So
things
are
getting
really
complex.
Can
these
frameworks
that
we
have
to
write
high
high
level
code
and
network
functions
and
packet
processing
is
that
we
get
going
in
the
right
direction,
so
the
motivation
for
us
was.
It
sounds
a
little
cheeky
because
I
come
from
web
programming
and
application
programming
was
if
I
program
and
react.
Who
here
knows
react
the
JavaScript
framework,
yeah
I'm?
H
Definitely
at
a
networking
conference,
but
yeah
reactive,
this
major
kind
of
JavaScript
UI
framework.
That's
gotten
a
lot
of
popularity,
because
it's
a
really
great
set
of
engineering
work
does
really
smart
work
about
doing
differentiation
and
moving
toward
immutability
and
how
we
write
programs.
But
if
I'm
writing,
if
I'm
a
react
programmer,
how
do
I
write
a
network
function
if
I
don't
know
much
about
the
lower
layers
in
packet
level,
computation,
that's
available
and
so
say.
I
start
and
I
have
a
way
to
do
that
in
these
high-level
frameworks.
H
Maybe
in
like
a
language
like
Python,
you
saw
how
do
we
know
what
we're
doing
is
right,
I,
don't
know
much
about
this
is
not
my
background.
I,
don't
read
IETF
specs
all
the
time.
How
do
I
do
this
and
then
how
can
we
iterate
upon
this
and
debug
and
kind
of
learn
as
we're
writing,
and
so
this
might
be
scary
for
some,
but
maybe
it's
okay.
If
all
those
react
and
JavaScript
programmers
start
coming
over
to
write
Network
code,
maybe
the
problem
that
is
out
there
with
what's
available
in
these
higher
level
abstractions.
H: Based a little bit off the original work, they compile down to code that had to work on the OpenFlow protocol. Again, that's great, but there are a lot of other protocols, of course, and even the original impetus of OpenFlow was, like, on-campus research and experimentation. And yet, if a new protocol comes in, or an experimental protocol is available, OpenFlow might not support it. And a lot of the cases that I dealt with in coming up with this research...
H
This
research
was
dealing
with
arbitrary
I,
add
hot
logic
in
variable
link
data
extension,
headers
extension,
headers,
extension
headers.
So
we
do
a
lot
of
work
at
Comcast,
around
segment,
routing
and
other
than
there
was
some
library
that
had
implemented
versions
of
the
v6
segment
routing
header.
There
was
not
much.
We
had
to
write
a
lot
of
this
ourselves
and
then
just
imagine
when
you
have
multiple
extension
headers
on
top
of
each
other,
we're
not
just
hot.
H
You
know
it
can
get
pretty
complex
and
so
dealing
with
how,
if
I
update
segments
on
the
fly
in
the
middle
of
the
network
over
and
over
again,
how
do
I
change
the
packet
lengths
all
the
time.
How
do
I
deal
with
dynamic
information
around
failure
and
we've
configuration
so
the
you
know,
separately
from
the
research
kind
of
part
of
this.
My
work
at
Comcast
is
dealing
with.
How
do
we
make
things
like
failure
and
load
shedding
a
primitive
for
how
we
write
programs?
H
Anybody
can
write
programs
thinking
about
these
things
and
policies
at
application
level.
So
the
last
talk
mentioned
a
little
bit
about
click-click
is
kind
of
the
fact
they're,
like
really
awesome
in
99
yeah
way
to
write
code
for
doing
packet,
processing
pipelines-
and
you
know
it's
still
used
and
and
and
I.
You
know,
Eddie's
awesome
for
a
lot
of
things,
but
this
is.
This
is
a
piece
of
code.
That's
still
in
there
to
do.
Ipv6
validation,
with
this
kind
of
go
too
bad
sink
that
a
lot
of
other
little
pieces
of
code.
H
If
things
are
not
right-
and
you
see
that
we
have
this
kind
of
hard-coded
value
in
our
paper,
if
you
download
it
there's
even
code
a
more
complex
set
of
code,
maybe
from
Facebook
on
there
and
network
load,
balancer
called
cat
ran,
which
just
came
out.
You
know
open
source,
I,
guess
officially
not
too
long
ago,
and
they
had
a
lot
of
code.
It's
seemingly
really
good
results,
but
they,
you
know,
everything
is
based
on
constants
hard-coded
numbers
I
mean.
Maybe
this
is
pretty
common
in
networking
man.
H
It
is
not
common
when
you
think
of
higher
level
programming
abstractions.
The
two
examples
we
look
at
in
the
in
the
paper,
which
are
kind
of
these
variable
ad
hoc
ones,
the
MTU
sent
to
big
response.
So
you
have
a
client
that
has
like
a
TCP
packet.
It's
way
too
big.
We
have
to
do
these
sets
of
actions
to
then
return
it
back,
including
changing
the
protocol
to
an
IC
icmpv6
and
the
v6
variation,
calculating
checksum,
swapping
source
and
destination.
H
How
do
we
make
sure
that
happens,
especially
if
I'm
writing
this
network
function
from
the
beginning?
The
other
one
as
I
mentioned
is
ipv6
extension
headers,
which,
if
you
look
at
the
spec
right
there,
you
have
things
like.
Obviously
the
segment
lists
can
get
can
change
over
time,
and
you
have
this
TLV
type
value
objects
which
can
change.
You
know
which
can
change
and
again
it's
variable
information.
A
earlier
talk
is
about
NTP.
We
do
a
lot
of
ntp,
which
has
these
prefix
options.
You
can
sometimes
do
get
options
on
this
packet.
H
Sometimes
you
won't
we've
seen
this
in
practice
and-
and
that
makes
things
very
difficult,
so
I
want
to
wait.
We
want
to
wait
to
think
about
this
from
an
abstraction
level,
and
so,
as
you
see
there
v6,
and
so
we
thought
about
what
is
an
interesting
way
to
combine
a
hybrid
set
of
checks,
or
we
call
contracts
and
in
in
software
engineering
and
programming
language
land
and
that
the
impetus
kind
of
came
from
early
things.
H: ...like the Eiffel programming language, which, back in 1986, talked about this idea of design by contract, where you focus on how contracts can be turned on for monitoring and testing situations; quoting from, I think, the original paper or book, you can "just sit back and watch your contracts be violated". Again, this is during the design and testing phase of writing programs, and this is built into the language; contracts are a primitive in Eiffel, and now you're seeing this in a lot of other languages.
H
We
also
wanted
to
add
some
sort
of
static
checking
that
is
based
on
compile
time
assertion
so
in
languages
like
C,
C++
and
D
you'll
see
this
used
and
you
can
do
checks
on
constant
statics,
which
again
happy'
get
happen
and
Static
having
at
compile
time,
and
these
can
actually
remain
and
release
binaries
because
they
have
their
only
affect
is
its.
That
is
the
static
programming,
and
then
we
have
this
idea
of
static
order,
preserving
headers.
So
in
the
framework
we
use
I'm
an
exercise
on
the
implementation.
It's
that
one
called
net
bricks.
H
It
has
this
concept
of
the
previous
header
as
a
type,
so
we
use
this
kind
of
previous
set
of
headers
to
determine
order.
So
if
you're
traversing
a
packet
in
your
network
function
code-
and
you
use
something-
that's
you
know
TCP
coming
after
some
I,
not
coming
after
v6
or
v4
IP
part
of
your
packet.
That
will
be
a
compile
time
error.
So
we
use
that
to
our
advantage
in
doing
our
code.
So
again,
the
design
by
contract,
as
I
mentioned,
influenced
by
Tony
Hoare,
and
this
workaround
logic
around
pre
and
post
conditions.
H
That's
depends
on
how
many
bytes
are
actually
look
and
again
static
for
the
static
order,
preserving
headers
the
idea
of
Sachi
to
find
an
order
mechanism
I'll,
show
an
example
of
what
that
means.
So
our
implementation
is
done
as
a
gradual
extension
to
this
through
Netflix,
which
is
a
DP
DK
based.
You
know,
user
space
that
we
just
heard
about
take
on
this.
This
work
and
it's
written
in
rust,
which
is
a
really
cool
language
that
is
now
starting
to
get
you
know
even
Congress,
knows
about
rust.
Knight
leaves,
if
you
saw
a
recent.
H: And in particular, NetBricks focuses on this idea of zero-copy soft isolation: it's able to eliminate the need for copying packets by using static types to do these checks. And with this idea of order, we just leverage it and go a little bit further: we implement our work, our prototype, as a small Rust library which generates code for validations and assertions.
H
So
the
key
here
this
comes
more
from
some
previous
software
engineering,
research
and
and
and
programming
language
work
is
that
we
didn't
want
the
programmer
to
have
to
write
all
these
all
these
validations
and
worry
about
how
all
these
validations
work
together.
We
let
them
write
very
simple
checks
and
we
generate
the
code
to
do
the
work
using
macros.
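The prototype itself is a Rust library of macros, but the shape of what the generated code does can be mimicked with a Python decorator (a loose analogue for illustration only; the field names and checks below are invented, not the prototype's API):

```python
def contract(pre=None, post=None):
    """Loose Python analogue of the macro-generated contracts: wrap a
    network function with optional pre/post checks. Functions without
    the decorator stay unchecked: the gradual, opt-in idea."""
    def wrap(nf):
        def checked(pkt):
            if pre:
                assert pre(pkt), "precondition failed"
            before = dict(pkt)  # snapshot for the postcondition
            out = nf(pkt)
            if post:
                assert post(before, out), "postcondition failed"
            return out
        return checked
    return wrap

@contract(pre=lambda p: p["proto"] == "tcp",
          post=lambda before, after: after["src"] == before["dst"])
def reply(pkt):
    # Toy NF: answer a TCP packet with src/dst swapped.
    return {"proto": "tcp", "src": pkt["dst"], "dst": pkt["src"]}

out = reply({"proto": "tcp", "src": "a", "dst": "b"})
```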
H
Another
talk
for
another
time
about
how
great
macros
are
it's
my
favorite
thing
in
the
world
so
and
and
rust
has
hygienic
procedural
macros
which
are
really
cool
for
a
low-level,
know:
TC
language,
so
here's
a
net
Brix
a
typical
example:
I
she's
from
Newark
Cove
we've
been
writing
and
1/4
of
metrics,
which
is
open
source
under
and
github
and
I'll
talk
yeah.
You
can
find
that
in
the
paper.
H
This
is
a
simple
Mac
swap
where
we're
checking
just
doing
a
simple
parse
here,
and
you
see
that
we
have
these
types
like
Ethernet
that
we
can
determine,
and
the
code
looks
like
a
declarative
MapReduce
if
you're
familiar
with
spark
or
any
of
these
frameworks.
That's
the
thing
that
Netflix
has,
but
it's
very
ad
hoc
I
can
do
terrible
things.
I
can
try
to
log
a
blot.
Do
a
log!
That's
a
blocking
process,
as
my
packets
are
being
processed,
I
have
to
know
what
I'm
doing
and
think
about
it.
H
So
there's
a
price
you
pay,
so
our
work
in
particular
see
if
this
works
there
we
go
yeah
a
little
bit
delayed,
but
alright,
okay,
alright,
so
we
we
missed.
The
word
gradual,
so
we
have
these
network
functions.
They
look
like
this
I've
alighted
some
some
work
of
filters,
maps
and
group
bys
that
you
can
do
in
this.
But
what's
cool
about
our
work
is
that
you
can
add
this
check
attribute.
We
will
check
this.
We
will
be
able
to
check
that
network
function
if
you're
composing
these
functions.
H
For
example,
if
you
don't
put
the
check,
we
won't
check
it
again.
It's
gradual
it's
buy-in
and
we
can
take
old
metric
functions,
for
example,
and
just
add
our
code
on
to
it,
and
if
you
don't
want
it,
you
don't
get
it
and
so
in
our
in
our
piece.
We
have
our
preconditions
here
which
are
using
a
macro
called
ingress
check
where
we
can
determine.
H
We
actually
saying
here's
the
order
we
expect,
which
again
it's
going
to
be
static,
time,
runtime
checks
and
then
a
post
condition
which
checks
what
the
packet
was
before
to
what
it
is
at
the
end
and
so
again,
orders
checked
statically
via
tracer
package
contents.
We
have
pre
checks
that
validate
incoming
content
and
store
content
at
runtime,
and
we
have
post
checks
that
validate
the
transform
packets
correct
in
this
case,
when
I
have
an
MTU,
send
too
big
I
have
a
TCP
packet,
for
example.
H
That's
too
large
my
return
packet
now
should
be
an
ICMP
packet.
That
is,
that
is
the
right
amount
of
bytes
1280
and
in
most
cases
that
is
the
right
amount
of
bytes
and
I
have
swapped
destination
and
source
addresses.
I
have
swapped
Ethernet
addresses
and
basically,
if
this
works
again
at
design
time
in
a
run
time,
so
our
evaluation
was
pretty
much
looking
at
additional
syntax
compilation,
time
and
runtime
overhead
again.
The
focus
has
been
on
the
design
phase.
So
we
look
at
syntax
added.
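The "packet too big" postcondition described above might look roughly like this; the struct and field names are illustrative, not the system's real packet types:

```rust
// Illustrative postcondition for the MTU example: on "packet too big",
// the reply must be ICMP, at most 1280 bytes (the IPv6 minimum MTU),
// with the address pairs swapped relative to the offending packet.
#[derive(Clone)]
struct Pkt {
    eth_src: [u8; 6], eth_dst: [u8; 6],
    ip_src: u128, ip_dst: u128,
    is_icmp: bool, len: usize,
}

fn post_too_big(before: &Pkt, after: &Pkt) -> bool {
    after.is_icmp
        && after.len <= 1280                 // IPv6 minimum MTU
        && after.ip_src == before.ip_dst     // IP addresses swapped
        && after.ip_dst == before.ip_src
        && after.eth_src == before.eth_dst   // Ethernet addresses swapped
        && after.eth_dst == before.eth_src
}

fn main() {
    let before = Pkt { eth_src: [1; 6], eth_dst: [2; 6], ip_src: 10, ip_dst: 20,
                       is_icmp: false, len: 1600 };
    let after = Pkt { eth_src: [2; 6], eth_dst: [1; 6], ip_src: 20, ip_dst: 10,
                      is_icmp: true, len: 1280 };
    assert!(post_too_big(&before, &after));
    println!("postcondition holds");
}
```

The pre-check would store `before` as the packet arrives, so the post-check can compare against it on the way out.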
H
This
is
a
very
common
software
engineering
thing
you
see
in
papers
emitting
Li,
I
will
say
I
kind
of
hate
it,
but
I
did
it,
and
so
we
see
that
we
don't.
We
don't
really
add
many
lines
of
code,
most
of
it
really
around
configuration
for
adding
some
new
libraries,
but
we,
you
know
adding
these
tracks
is
basically
end
to
the
amount
of
checks
that
you
want
to
add
for
assertions
this
one,
we
feel
is
pretty
important,
which
is
that
we
have
this
tool.
H
We,
the
goal
is
designed
by
contract
combining
static
and
dynamic
checks,
but
compilation
time
should
not
be
affected.
The
same
Russ
program,
the
same
metrics
function.
You
should
be
writing
without
any
of
these
generated
code
should
the
same
as
you
know
with
or
without
so
we
show
that
actually,
in
some
cases,
no
significance
we're
actually
faster.
It
means
it
just
doesn't
matter
and
there
is
a
runtime
cost.
So
we're
doing
we're
doing
some
work
to
do
like
order,
preserving
headers
we're
actually
tracing
the
entire
path.
H
We're
tracing
the
entire
transformation
of
the
packet
header
by
header
to
make
sure
that
we
in
the
code
that
we
generated
to
make
sure
that
the
order
is
correct,
for
example,
and
we
do
a
lot
of
runtime
storing
for
the
dynamic
check.
So
as
a
packet
comes
in,
we
store
all
this
information,
so
we
can
use
it
for
validation
at
the
post
condition.
That
does
take
time.
So
we
have
these
runtime
checks,
also
we're
not
using
the
most
optimized
data
structure.
H
But
again
this
is
for
design
phase,
and
ideally
you
wouldn't
be
running
this
in
production.
You
would
only
have
the
static
assertions
in
production,
so
our
future
work
we've
already
started
on
some
of
this
is
that
is
not
just
to
run
this
kind
of
on
your
computer,
while
you're,
while
you're
designing
programs
and
network
functions.
You
can
do
this
as
part
of
a
deployment
model
and
CI,
or
already
we
have
this
working
running
up
in
mini
net
and
container
net,
which
we
use
heavily
to
run
this
up.
H
You
can
run
these
as
part
of
your
simulation.
We
hope
to
further
leverage
static
analysis
have
been
put
programs
and
we
really
want
to
bring
in
a
lot
of
the
work
that
we
see
in
compilation
now
and
even
from
UI,
tooling,
around
interactive
feedback
program,
slicing
and
refinement
using
program,
affine
and
constraint.
H
Solving
so
there's
a
lot
of
work
in
p4,
which
is
used
to
write
a
lot
of
network
functions
to
look
at
constraint,
solvers
for
work,
but
they
tend
not
to
go
for
these
very
ad
hoc
variable,
linked
data
examples
and
we
hope
to
actually
show
this
in
practice.
At
Comcast
we
use
this
net
Brix
library
and
we've
forked
it
and
created
a
kind
of
our
own
version.
We've
rewritten
a
lot
of
the
runtime.
H
We
do
use
some
of
the
checking
work
that
actually
I've
done
in
my
research,
but
we
also
had
to
come
up
with
some
new
ideas
that
that
kind
of
emphasized
what
does
it
mean
to
program
these
network
shion's
for
anybody,
and
we
have
to
create
limitations.
So
one
of
the
limitations.
We
have
to
do
this
idea
of
scope,
side
effects,
so
I
mentioned
earlier
in
the
MTU
example.
H
We
have
to
update
all
these
things
when
you,
when
you
change
from
your
change,
to
go
back
to
the
sender
with
the
specific
response
we
have
to
basically
have
this
cascade
function,
which
says
any
time
that
I
update
parts
of
the
packet
automatically
cascade
and
change
the
packet
length
and
recalculate
the
checksum
the
same
thing
for
segments
so
the
work
we
do
in
Comcast
we're
constantly
changing
semantics
on
the
segment's.
We
actually
use
the
bits
in
the
address
100.
H
The
128-bit
addresses
to
actually
encode
actions
takes
an
idea
from
old
light,
an
OL
idea
called
active
networks,
and
so
what
we
need
to
do
there
and
say
well,
when
I
set
the
segment's,
make
sure
I
change
the
packet
lengths
update
the
checksum,
etc,
etc.
Let's
not
let
the
programmer
have
to
actually
do
that.
Let's
force
them
to
do
that.
It's
a
limitation
upfront,
giving
away
some
like
some
interesting
flexibility,
but
an
interesting
thing.
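The cascade idea can be sketched as a setter that re-derives the dependent fields on every update. This is a toy model, not the Comcast fork's actual code, using an RFC 1071-style ones' complement checksum over the payload only for brevity:

```rust
// Toy cascade: changing the payload automatically fixes up the
// dependent fields (length, checksum) instead of trusting the
// programmer to remember to do it.
struct Pkt { payload: Vec<u8>, len: u16, checksum: u16 }

// RFC 1071 ones' complement sum, here over the payload only.
fn ones_complement_sum(data: &[u8]) -> u16 {
    let mut sum: u32 = 0;
    for chunk in data.chunks(2) {
        let word = ((chunk[0] as u32) << 8) | (*chunk.get(1).unwrap_or(&0) as u32);
        sum += word;
    }
    // Fold the carries back in, then take the complement.
    while sum >> 16 != 0 { sum = (sum & 0xffff) + (sum >> 16); }
    !(sum as u16)
}

impl Pkt {
    fn set_payload(&mut self, payload: Vec<u8>) {
        self.payload = payload;
        // The cascade: every dependent field is updated in one place.
        self.len = self.payload.len() as u16;
        self.checksum = ones_complement_sum(&self.payload);
    }
}

fn main() {
    let mut p = Pkt { payload: vec![], len: 0, checksum: 0 };
    p.set_payload(vec![0x45, 0x00, 0x00, 0x28]);
    assert_eq!(p.len, 4);
    assert_eq!(p.checksum, ones_complement_sum(&p.payload));
    println!("len={} checksum={:#06x}", p.len, p.checksum);
}
```

Because `set_payload` is the only way to mutate, the packet can never be observed with a stale length or checksum.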
H
We
also
have
this
concept
of
tight
packets,
and
not
only
do
we
have
the
previous
header
and
the
the
header
that
comes
after
us,
where
you
guys
say
what's
the
envelope,
so
we
can
bound
things
using
our
type
system
to
say
well,
a
tcp
or
UDP
UDP
part
of
a
packet.
You
know
can't
come
after
some
ICMP
one
and
we
can
now
do
that
using
bounding
and
types
and
again
that's
statically
checked
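The header-ordering bound can be encoded in Rust's type system along these lines (a minimal sketch; NetBricks' real packet types differ):

```rust
// Minimal sketch of type-bounded header ordering: parse_next is only
// offered for (outer, inner) pairs that are legal, so an illegal
// order is a compile error rather than a runtime check.
struct Ethernet;
struct Ipv6;
struct Tcp;
struct Icmpv6;

// "Inner may appear directly inside Outer."
trait Follows<Outer> {}
impl Follows<Ethernet> for Ipv6 {}
impl Follows<Ipv6> for Tcp {}
impl Follows<Ipv6> for Icmpv6 {}
// Note: no `impl Follows<Icmpv6> for Tcp`, so that order is unrepresentable.

struct Packet<H> { _hdr: std::marker::PhantomData<H> }

impl<H> Packet<H> {
    fn parse_next<N: Follows<H>>(self) -> Packet<N> {
        Packet { _hdr: std::marker::PhantomData }
    }
}

fn main() {
    let eth: Packet<Ethernet> = Packet { _hdr: std::marker::PhantomData };
    // Legal chain: Ethernet -> IPv6 -> TCP.
    let _tcp: Packet<Tcp> = eth.parse_next::<Ipv6>().parse_next::<Tcp>();
    // `....parse_next::<Icmpv6>().parse_next::<Tcp>()` would not
    // compile: Tcp does not implement Follows<Icmpv6>.
    println!("legal header chain type-checked");
}
```

The zero-sized `PhantomData` means the bound costs nothing at runtime; the check lives entirely in the types.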
So the takeaways are: we need better approaches to verify and interact with network functions and packet-processing program properties.
H
In
this
work
we
have
a
hybrid
approach
that
we
use
to
gradually
check
and
validate
arbitrary
large
logic
and
side-effects,
and
by
combining
these
different
kinds
of
contracts,
we
show
that
a
mixed
model
can
really
work,
and
we
did
this
all
about
penalizing
the
developer
and
programmers
at
the
design.
Time
of
writing.
These
network
functions
thanks.
Buddy.
I
[question inaudible]
H
Of it, yeah, yeah, for sure, yeah. So, I mean, I hope to see more of this. Most of my background came from looking at stuff like NetKAT, which you won't necessarily see here; you'll see it pop up. I was so surprised when I started working in networks and going, "not everybody's using NetKAT?", and everyone laughed at me. The work is so amazing, but the idea is, I think, that obviously for actual use you have to find some sort of balance, and that's what we're trying to convey. Thank you. Anyone?