From YouTube: IETF92-IRTFOPEN-20150324-1730
Description
IRTFOPEN meeting session at IETF92
2015/03/24 1730
C
If you're here for the IRTF open meeting, you're in the right room; if you're here for something else, or you want to read your email, you're welcome to do that. If you're here for something else, you're probably in the wrong room. The observant amongst you will notice that I am not Lars Eggert. I am Matt Ford, I'm with the Internet Society, and I'm mostly here to introduce our speaker for this session.
C
Our Applied Networking Research Prize winner at this IETF is Aaron Gember-Jacobson, who won the award for designing and evaluating an NF control plane. He's going to tell you a lot more about that, but maybe we could just have a round of applause to congratulate Aaron on winning his ANRP award.
C
I think, Aaron, you're going to present and then we'll take some time afterwards for Q&A. If you have clarifying questions you can dive in with those, but otherwise we'll save questions for after Aaron's talk and I'll moderate the discussion. Thanks.
B
If something is not clear, certainly feel free to step up to the mic and interrupt me. Thanks so much for that introduction; I hope you'll find what I'm talking about today interesting. What we've done is some research to take the principles that we have in software-defined networking and extend those principles to network functions, or middleboxes, that are running in our network, in order to allow network operators to better satisfy a number of different goals.
B
For those of you who aren't familiar with network functions or middleboxes, the basic idea behind them is that they perform some sort of sophisticated analysis of traffic or flows as it passes through the device in the network, and typically they take some stateful actions on that traffic. Good examples that commonly exist are things like WAN optimizers, caching proxies, and intrusion prevention systems. We're seeing two shifts in the way these network functions are being deployed today.
B
The first of these is network functions virtualization (NFV). The basic idea behind this is that we want to take the dedicated hardware appliances that are deployed today and replace them with virtual machines providing the same functionality, which allows us to run the network functions on top of generic compute resources, so we no longer need customized hardware. The benefit of this is that we can dynamically allocate instances of network functions as we need more capacity in our network, or as we need to introduce new functionality.
B
The other trend that's reshaping the way network functions are deployed is software-defined networking (SDN). SDN gives us the ability to flexibly reroute traffic between these network functions as we create them, or as the needs of our network evolve. Together, these two trends give us a way to dynamically reallocate where in our network we're processing certain traffic and what processing is happening to that traffic, and as a result they can enable a variety of interesting service abstractions and capabilities for our middleboxes.
B
One such example is that we could build a system that elastically scales network functions as the demand in our network changes over time. We start off here with a single instance of an intrusion detection system (IDS), and we want to make sure that this IDS is always satisfying some sort of performance SLA. Perhaps we have an SLA that says the packet loss we experience has to be less than some percentage. As the load in our network increases, we'll start to overload this initial instance.
B
That's going to start to create SLA problems, so we need to add another instance, which NFV makes easy to do, and with SDN we can then reroute some of the traffic from our original instance to this second instance. That gives us the ability to shed load from the original instance and satisfy our SLA. Okay, so now.
B
The problem here is that, while we're doing this scaling in and scaling out, it's important that we accurately monitor the traffic and have our IDS function as we expect it to, to actually detect malicious attacks on our network. It turns out that in order to achieve all three of these goals together, we actually need more than what we can get with just this concept of NFV and this concept of SDN, and so with only these two abstractions.
B
Today we can't quite realize scenarios like elastic NF scaling or some sort of high-availability situation. To understand a bit more exactly what we're missing and what else we need, let's take a look at this scenario in a bit more depth. Again, we're going to assume that we start off with a single instance of the IDS, and here I'm going to look at traffic at a little bit finer granularity: I'm going to assume that we know about specific flows.
B
These could be TCP flows, or a set of traffic from a group of hosts, but some notion of a flow through this network. As we see traffic from these flows, this intrusion detection system is going to establish some state related to them: things about connection endpoints, potentially information about what we've seen in the payloads so far, a variety of different pieces of information.
B
One option is that we could reroute only new flows that are coming into our network, such that if we have some green flow that comes in, we'll send it to this second IDS instance that we just created; it'll establish some state and properly analyze this traffic. This is great from a cost perspective, since we clearly needed this extra instance, but it isn't going to help us satisfy our SLA.
B
We still have all that extra traffic from the red and the blue flows going through our first instance; we're still starting to experience packet loss, so this isn't going to work. The other challenge we face is that there could be information at each of these IDSes that we need to collectively combine in some way. Maybe we're trying to do port-scan detection: all these flows are going to a particular host, and if we don't aggregate information about connection counts between both instances, it's going to take us longer to detect that scan.
B
So it's unclear how accuracy will be affected in this situation as well. Okay, so we need to get some traffic off of this original instance. We'll pick one of the flows, let's say the blue flow, and go ahead and reroute it. The problem is that, while we've rerouted this flow, we've run into a situation where we left its state behind, and so now the state that we need to continue to analyze this traffic and detect any attacks that might be in it.
B
It is now only available at our old instance and not available at the new place where this traffic is going, so we're not going to reach our accuracy goal. At some point, eventually, this blue flow will die out of the network, the load in our network will go back down, and so from a cost perspective we ideally want to be able to destroy the second instance. The problem is: when do we go about doing that? If we destroy it immediately, we run into the same problem where we get rid of state that we need.
B
We'll no longer be able to properly analyze the green flow. If, instead, we wait for this green flow to die off, we run into a situation where we need to wait a potentially unbounded amount of time before we can destroy this instance. In traffic traces we've looked at from our campus network, this may mean that for 25 minutes, maybe longer, we're going to continue to run this instance. That means we'll satisfy our SLAs and accuracy, but from a cost perspective we're spending a lot of extra money.
B
We don't need to. So what exactly do we need if we want to achieve these three goals; what's missing from just NFV and SDN? Well, one thing is that we need some way to manage the internal state that these network functions are maintaining: we need to be able to move it, copy it, and in some cases share it between different instances of a network function. Second of all, as we're transferring this state around, we want to make sure that we're not compromising the accuracy of our network function.
B
So there are certain guarantees we need on how the state transfer happens, such that we don't lose updates to this state, we don't potentially have packets that go unprocessed, and in some cases we need to make sure we process the packets in a particular order. These same requirements apply not only to the elastic scaling scenario that I talked about, but to other interesting scenarios like transparent failover, or potentially.
B
If we want to do something like in-place upgrades. So I hope I've convinced you that we need something new here. For the rest of the talk, I'll talk about the challenges in doing this and meeting those requirements I just talked about; then I'll talk about the architecture we've developed in order to meet those requirements and address those challenges; and, lastly, I'll close with some preliminary evaluation results.
B
The first of these challenges is that there are a lot of different network functions out there, everything from WAN optimizers to caching proxies to, when you start to talk about cellular networks, things in the evolved packet core. We want to make sure that we're minimizing the number of changes we need to make to these, and that we can accommodate a lot of different network function architectures within this broader system that we're proposing to develop. The second issue is that there are lots of things going on in the network while we're thinking about moving state.
B
There are updates happening to that state, there are packets still flowing through our network, and we want to be making forwarding updates. So how do we avoid problematic race conditions between all of these different things that are going on? Lastly, it's important that whatever we're doing to move state around doesn't have a lot of memory or CPU overhead and doesn't take a lot of time, especially if we're talking about moving state in scenarios where we're trying to do scaling and we're already in an overloaded situation.
B
We don't want to impose a lot more load onto what's already overloaded. Okay, so what could we use? Well, one thing we could say is: why not use virtual machine snapshots? We already have virtual machines that our network functions are running on, and we know really well how to snapshot virtual machines and clone them efficiently. We can use this to do scale-up: it will give us a copy of the state we need for both of these red and blue flows, and we can move the blue flow and we'll have its state. The problem is when we run into that scale-down scenario: we have no way to recombine two VM images into one, so that's not going to work out. Another solution that exists out there is a system that came out of IBM Research called Split/Merge. The basic idea of Split/Merge is that you use a shared library in order to access and create state internally; you basically replace all memory allocation calls with calls to their library functions.
B
The problem is that they're targeting a very specific scenario, which is elastic scaling, so it's not clear their solution will work in other scenarios. Also, their system doesn't provide any of these safety guarantees that ensure we don't lose important updates and that packets aren't reordered in cases where that can affect the accuracy of a network function. So this brings me to our solution, OpenNF. OpenNF's architecture is very similar to what you'll see in SDN.
B
We have a logically centralized OpenNF controller, and on top of this we run scenario-specific control applications. One control application may be implementing the elastic NF scaling example that I talked about, and it'll issue operations to move, copy, or share state as it needs to. Underneath this controller we have the network functions themselves, and they conform to a southbound API that we've developed, such that we can accurately export and import state from these different instances.
B
When a control application issues an operation, a module within the controller translates that into a series of southbound API calls to do the state transfer, and once state has been successfully transferred, we can then communicate with an existing forwarding module to tell it to update the forwarding state in our switch and reroute our traffic. I'm going to talk a little bit about the southbound part first, and then I'll go into how we implement these higher-level functions.
B
To give you an example, let's take a look at the state for an intrusion detection system, specifically the Bro intrusion detection system, an open-source IDS that's existed for many years. Here we have, for every single TCP connection, a couple of different objects: a connection object and protocol-specific analyzer objects, and we organize these in some sort of hash table.
B
Likewise, we have state that is maintained per host: for every host, we maintain a count of how many different connections have been established, or attempted to be established, with that host. We may also have some state that's updated for every single packet we process, and something like statistics applies to all the different flows that this network function is responsible for. We can use this taxonomy to develop a relatively simple API that allows us to get, put, and delete state from these network functions on a per-flow basis.
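The get/put/delete taxonomy above can be sketched as follows. This is an illustrative toy in Python, not the actual OpenNF southbound API: the scope names, method signatures, and filter representation are assumptions made for the example.

```python
class NFStateStore:
    """Toy NF state store with scoped get/put/delete, as described in
    the talk. Scope and method names are illustrative, not OpenNF's."""

    def __init__(self):
        # scope -> list of (flow_key, state) entries
        self.entries = {"per-flow": [], "per-host": [], "multi-flow": []}

    @staticmethod
    def _matches(flow_key, flt):
        # A filter names some header fields; unnamed fields are wildcards.
        return all(flow_key.get(k) == v for k, v in flt.items())

    def get_state(self, scope, flt):
        """Export every piece of state in `scope` matching the filter."""
        return [(k, v) for k, v in self.entries[scope] if self._matches(k, flt)]

    def put_state(self, scope, chunks):
        """Import state chunks (e.g. from the controller) into local structures."""
        self.entries[scope].extend(chunks)

    def delete_state(self, scope, flt):
        """Flush matching state; return how many entries were removed."""
        kept = [(k, v) for k, v in self.entries[scope]
                if not self._matches(k, flt)]
        removed = len(self.entries[scope]) - len(kept)
        self.entries[scope] = kept
        return removed

nf = NFStateStore()
nf.put_state("per-flow", [({"proto": "tcp", "dport": 80}, {"bytes": 10}),
                          ({"proto": "tcp", "dport": 22}, {"bytes": 5})])
http = nf.get_state("per-flow", {"dport": 80})  # only the port-80 entry
```

The point of the filter is exactly what the talk describes next: the network function applies it to its internal state and ships anything that matches, without exposing how that state is organized internally.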
B
These functions first of all accept the scope of state we're interested in, and a filter that defines a flowspace for what set of flows we're interested in. We then modify the network functions to accommodate this operation: a network function can take its internal state, apply this filter to it, and any state that matches will be sent to the controller. Likewise, if the controller wants to provide some state to be integrated into the middlebox, the middlebox can take this state and integrate it into its existing structures.
B
This relatively simple API means that we don't have to expose or change how the network function organizes its state internally, and it provides an intuitive way for us to reason about what state we're interested in. Now that we have these capabilities from network functions, we can go about using them to realize the operations that our control applications issue.
B
The first thing the controller is going to do is ask the middlebox for any state that it has related to HTTP flows, and that state is going to be provided to our controller. Next, we'll go ahead and flush this state from our first instance, because we don't need it there anymore, and we'll put this state to our second instance. Now that the state's been moved, we can finally go ahead and update our forwarding such that we can resume analyzing our HTTP traffic at the second instance. We have similar capabilities to be able to copy and share state.
B
I won't go into the details of that here, but I'm happy to answer questions about it later on. Okay, so we've addressed this first challenge. Now, how do we deal with all these race conditions and provide important safety guarantees? One problem that can occur in the move operation I just showed is that we can lose packets, or lose updates to state, as a result of packets arriving.
B
While we're trying to do this state transfer. I'm going to assume here that we're running the Bro intrusion detection system, and it's running a script that computes a hash of the payloads of all the packets for a given connection and compares that hash against a database of known malware. This is a standard script that comes with this IDS. Again we have two different flows, a red flow and a blue flow. When packets come in, the IDS is going to say: okay.
B
What's the hash of this packet? And it adds it to a rolling hash that it's computing. Now at some point I say: well, I want to move the red flow, so I'm going to go ahead and do my state transfer like I did before. But before I have a chance to update my forwarding state, another packet comes in for this red flow. Now this packet comes in, and the intrusion detection system says: I don't have any state for the red flow.
B
This must be a new flow, so it's going to go ahead and establish some new state. Now at some point our forwarding update takes effect, and our third packet comes in, and when we try to compute a hash over this third packet we only have the first and the third packets. So the hash that we compute is going to be incorrect, and we're not actually going to detect that there's some malware in this flow.
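The failure mode just described is easy to see with a toy rolling payload hash, loosely modeled on the Bro script in the example (the function below is an illustration, not Bro's actual implementation):

```python
import hashlib

def rolling_hash(payloads):
    """Fold every packet payload of a connection into one digest,
    the way a per-connection malware-hash script would."""
    h = hashlib.sha256()
    for p in payloads:
        h.update(p)  # each payload extends the rolling hash
    return h.hexdigest()

# All three packets seen by one instance: the "correct" stream hash.
full = rolling_hash([b"pkt1", b"pkt2", b"pkt3"])

# The second packet was processed by the wrong instance during the move,
# so its bytes never reached this state.
missing = rolling_hash([b"pkt1", b"pkt3"])
```

Because `full != missing`, a malware signature keyed on the complete stream's hash is never matched, which is the accuracy loss the talk is warning about.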
B
What we want is a guarantee that these state operations are loss-free: we want to make sure that we're not losing any packets, and that all packets that have passed through our network at this point in time are being processed. Split/Merge also provides a limited form of this loss-freeness, but it turns out there's a key thing they don't deal with.
B
That's the fact that packets may already be in transit to a network function at the time we start the state transfer. While they can buffer packets at the switch, they're ignoring the fact that packets may have already passed through this switch, so this doesn't quite give us the loss-freeness that we want. So how do we go about doing this?
B
Well, we're going to enhance the capabilities that the network functions provide just a little bit: we're going to add an event mechanism such that when some set of packets comes into this network function, we can ask: do any of these packets match a filter? If they do, the network function can send an event to the controller that says: hey, I was about to process this packet; it was going to update, or may have been about to update, some state that you're trying to move.
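The event hook just described amounts to one small check in the middlebox's packet-receive path. A hedged sketch (class and method names are invented for the illustration; the controller is modeled as a simple queue):

```python
class EventingNF:
    """NF receive path with the event mechanism described in the talk."""

    def __init__(self, controller_queue):
        self.event_filters = []              # filters the controller enabled
        self.controller_queue = controller_queue
        self.processed = []

    def enable_events(self, flt):
        """Controller asks: raise an event for packets matching `flt`."""
        self.event_filters.append(flt)

    def receive(self, pkt):
        # The only change to the NF's main receive function: check
        # whether this packet should raise an event before processing.
        for flt in self.event_filters:
            if flt(pkt):
                # Hand the packet to the controller, which decides to
                # process it, buffer it for later, or drop it.
                self.controller_queue.append(pkt)
                return
        self.process(pkt)                    # normal processing path

    def process(self, pkt):
        self.processed.append(pkt)

q = []
nf = EventingNF(q)
nf.enable_events(lambda p: p["flow"] == "red")
nf.receive({"flow": "red", "seq": 2})   # raised as an event, not processed
nf.receive({"flow": "blue", "seq": 1})  # processed normally
```

This matches the talk's claim that the change is fairly simple: only the receive function gains a filter check, and everything downstream is untouched.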
B
We
can
then
tell
the
network
function
to
either
go
ahead
and
process
packet
buffer
for
processing
later
on,
or
simply
throw
it
away
and
not
process
it
any
further
and
to
add
this
capability,
we
just
need
to
modify
the
main
packet
receive
function
within
the
middle
box
out
of
code
that
add
a
little
bit
of
code.
That
checks
should
I
be
raising
an
event
or
not
fairly
simple
change.
Okay.
So
how
do
we
use
this
now?
To
get
this
loss?
Free
property?
Well?
Well,
first
thing
we'll
do
before
we
start
transferring
any
state.
B
So now we make our forwarding update, and when the third packet comes in, it turns out that we've seen all packets for the flow; they're all reflected in the state, we can compute our correct hash, and now we can detect the malware. Now, there's another potential problem we run into, which is reordering, and in fact adding this loss-free mechanism can actually introduce reordering that may not be possible otherwise. This could be problematic in the case of a script that comes with Bro.
B
That script looks for weird activity, things like: did you get a SYN packet after you've already gotten a data packet? So let's go back to the fifth step from the last slide, where we were flushing the packets that were buffered at the controller. We'll flush these and then we'll go ahead and make our forwarding update. Now we make our forwarding update, but before that update takes effect, another packet comes in, and it goes to our first instance.
B
Our first instance says: I have events enabled, so I'm going to send this third packet to the controller. The controller will say: I've already flushed the buffer of events, so I'll just go ahead and pass this directly through the switch to my second instance. But before this packet reaches that second instance, our forwarding update has already taken effect, so it's possible another packet comes to the switch, gets forwarded to the second instance, and arrives before we've gone through this whole sequence of forwarding along this third packet.
B
So how do we go about realizing this? How am I doing on time? Okay, let me actually skip through this, because it's kind of complex, and we can come back to it later if people have questions. Okay, so the third challenge: the issue of overhead. How do we make sure that we're not introducing a lot of memory, CPU, and other overhead in actually providing these operations?
B
Well, the thing is that we're giving applications some choices. The first choice we're giving them is: what sort of state do you want to move? If you're only moving HTTP flows, you only need to move state relating to those HTTP flows. If you're trying to create a middlebox that's highly available, so you're snapshotting state, you may say: I only care that, if something fails, a certain set of flows continues to be processed correctly. So now you only need to grab that state.
B
The other option is that you can decide whether or not you need these guarantees. The example I was going through, this intrusion detection system, was off-path; that's what makes it an IDS versus an intrusion prevention system. Because this IDS is off-path, if packets get dropped on their way to the IDS, there's no way to get them retransmitted; the IDS is getting a copy of the traffic.
B
However, in the case of an IPS, if a packet gets dropped on its way to the IPS, that IPS is in the middle of a connection, which means normal TCP mechanisms will recover from that loss and the IPS will have another opportunity to see that packet. So in that case we don't need this loss-free property, and so by giving control applications the flexibility to choose what they want, they have some control over how much overhead they experience.
B
Okay, so going back to our three goals: we wanted to satisfy SLAs, we wanted to make sure that we could do it at low cost, and we wanted to make sure that our network functions are operating accurately and analyzing traffic. We've addressed the issue of diversity by making sure that the changes we make to import and export state are simple and we have a simple events mechanism. We deal with race conditions by adding this events mechanism and by having lockstep forwarding updates.
B
The controller itself is implemented as a module running atop the Floodlight SDN controller, and we've also implemented a communication library that can be linked into network functions in order to communicate between the controller and the network functions themselves. We've modified four different network functions so far to conform to our southbound API and provide events and export state: the Bro intrusion detection system, iptables, the Squid caching proxy, and also PRADS, which is an asset detection and monitoring system that's used in our university network.
B
So how well does OpenNF perform, and does it actually give us the benefits we wanted? We're going to take a situation here where we have a trace of traffic from our campus network that we're replaying at a rate of 10,000 packets per second, and we're going to start with one instance of the Bro intrusion detection system. 180 seconds into the experiment, we say: move all HTTP flows to be processed by a new instance. 180 seconds later, we move any HTTP flows active at that time back to the original instance.
B
Actually doing the transfer of state that we need takes 260 milliseconds, so that's quick; it doesn't take very long. We also looked at: is this accurate? Have we maintained the accuracy of the network function? So we compared what happened if we let all of the traffic be analyzed by one IDS and didn't do these moving-back-and-forth operations, versus what the output of the IDS is if we do this scale-out and scale-back-in. It turns out the log entries are equivalent.
B
If we had used the VM replication that I talked about earlier, there would be entries missing from our logs, because when we do this scale-in operation we have no way to combine two VM snapshots together. Lastly, there's this issue of cost: how quickly were we able to scale in? We were able to scale in as fast as it took us to move the state back, which again was about 260 milliseconds.
B
If we had instead waited for flows to die off, the flows in this particular trace lasted more than 25 minutes, and so we would have needed to unnecessarily continue to run the second instance of the IDS until those flows had finished; that would have been a lot of extra cost we would have been paying. So I said this move takes 260 milliseconds. How does what we're doing at the network functions contribute to that?
B
We can look at how long these get and put operations take on our network functions, and we did this for three of the network functions that we modified. It turns out that the cost to serialize and deserialize state is most of the time that we spend in these network functions. So there are definite improvement opportunities there: if we can do a better job with how we go about serializing and deserializing, we may be able to improve the efficiency there.
B
So we have these low-level operations, but how about the high-level operations, and how do the guarantees impact the time that it takes us to do these move operations? Here we're going to assume that we're running the PRADS asset detection system, we're again using the same trace of traffic at a slightly lower rate, 5,000 packets per second, and we're going to move the state for 500 flows that are active at a given point in time.
B
If we look at how long it takes for this move operation to complete, we can look first of all at what happens if we don't provide any guarantees. In that case, we're talking about 190 milliseconds to do this operation. We can do some parallelization of the gets and puts that we're issuing in order to speed things up a little bit, so now we can cut that down, not quite in half, to about 130 milliseconds. Great.
B
The problem here is that we're losing packets as a result of this: without any guarantees on loss-freeness or order preservation, even in the best case we're losing 462 packets. So we add in our loss-freeness guarantee. Now our move operation takes longer, about twice as long, but we're not going to lose any packets.
B
If we add in the order-preserving requirement, we again see another increase in the amount of time it takes, but we don't see a significant increase in the amount of overhead that we're imposing on packets, although there are more packets that we're imposing this overhead on. Here, with this order-preserving operation, we end up buffering 883 packets at the controller, and also (I didn't talk about this) there's another approximately a thousand packets that we buffer at our second network function before they're processed.
B
The overall takeaway here is that these operations are reasonably efficient, but the guarantees that we want to offer do in some cases come at a cost, and so it's important for control applications to have the flexibility to decide whether or not they need these guarantees. So where are we going from here; what are the next steps for OpenNF? Well, the first thing is that there's a lot of buffering that was happening in the loss-freeness case.
B
There's even more buffering happening in the order-preserving case, and so the question is: how can we reduce this amount of buffering, in an effort to reduce the number of packets that receive extra overhead and to reduce the memory usage of our system? One thing that we can do is, rather than pausing traffic and immediately saying, before the state transfer starts, "I want you to start raising events," we can allow the network function to continue to process packets, and then replay any packets that are processed during the transfer.
B
The second thing that we can do is improve the scalability of this system. Right now, all these packets and all the state go through the controller, which means there's a limit to how many operations we can handle simultaneously at the controller. But it turns out the controller doesn't have to be involved: we can actually use a peer-to-peer mechanism to transfer state directly between instances of a network function and still get all the same safety guarantees that we want.
B
Lastly, I said we need to modify the network functions, and obviously there are a lot of network functions out there. So how do we make this task easier to do? Well, we can use some techniques from program analysis in order to analyze the network function code and automatically figure out what state it is maintaining and what state we need to actually export from these network functions; we have some ongoing work in that area.
B
So, in conclusion, I hope I've convinced you that we need something more than just NFV and SDN in order to be able to realize rich scenarios where we want to dynamically reallocate packet processing. In particular, we need the ability to quickly move, copy, or share network function state, and to do it in a way that's also safe, and we've achieved this with OpenNF. If you want to learn more, or if you want to try out the code for OpenNF, I encourage you to visit our website, opennf.cs.wisc.edu. With that, I.
C
So, for example: is this a chunk of malware that I've seen before, as opposed to: does this packet have these bits set? I don't think you had the graph that actually shows the size on the x-axis and the impact on the y-axis; you had examples. So what does that look like? Do you have that?
B
So I guess I don't have an exact graph. The best I can put up here is sort of this, which gives an idea of how much state there is. In the case of iptables, the state for a single flow is less than a kilobyte; in the case of Bro, we're talking about one hundred or two hundred kilobytes of state per flow.
B
So it is reasonably small, that's true, and so one thing you can do is start to proactively copy some of the state, and our replay of events (that's future work) would enable that. The other thing I want to touch on that you mentioned is this idea that everything I was assuming here was per flow.
C
I guess there's another kind of question related to that, which is probably bigger than just your work. Suppose I have a cascade of three or four of these functions, and one of them modifies the packets in some way such that reclassification by the prior upstream function needs to be done, but now you've migrated one to some other place. What kind of situations could I get myself into with respect to that? Is there scheduling or something you do to deal with that?
B
So we've thought a little bit about the chaining scenario, where you have many of these network functions that you're passing through. We think that in many cases, if you have a chain, you can migrate one middlebox in the chain at a time, and you'd be doing some temporary redirection. In that case, you can certainly do better scheduling if you look at the entire chain at a time.
A
You have a problem if you have a subscriber which was registered in some network element: you can't just move him, because he must be aware, as you know, that he made a registration to a different element; that's the requirement. So in some cases only the application itself can move the state, with information coming from other elements, since the subscriber may move the state from that unit on his own. So for some applications you can change the state with the controller, but for some cases you can do it only at the application level.
B
True. So I agree that there's certainly some information you need to know about the network functions in order to know how you're going to go about writing these applications, and that's something we haven't yet done a good job of capturing. We're hoping that, ideally, some of our program analysis could give you a simplified model of how a network function works, or potentially give you recommendations on which operations your control application should perform: if you have it do this, you'll get this level of output equivalence.
B
It's really up to the control applications how they want to do it. So your control application in the scaling scenario could be: maybe it's monitoring CPU, and it says, I'm going to monitor CPU, and then I'm going to do some sort of measurement of what my elephant flows are, to figure out exactly what set of flows I want to move from one box to another. So that's completely flexible, and you could implement whatever you want there, right.
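The control application just described (monitor CPU, pick elephant flows, move only those) can be sketched as a simple control loop. The thresholds, the `move_flows()` hook, and the data shapes are all illustrative assumptions, not a real controller API:

```python
# Sketch of the scaling control application described above: watch
# per-instance CPU; when an instance is overloaded, pick its elephant
# flows and move just those to the least-loaded instance.

CPU_HIGH = 0.8           # overload threshold (assumed)
ELEPHANT_BYTES = 10**6   # flows above this byte count are "elephants" (assumed)

def pick_elephants(flow_stats):
    """flow_stats: {flow_id: bytes_seen}. Return elephants, biggest first."""
    return sorted((f for f, b in flow_stats.items() if b >= ELEPHANT_BYTES),
                  key=lambda f: -flow_stats[f])

def rebalance(cpu, flow_stats, move_flows):
    """cpu: {instance: utilization}. move_flows(flows, src, dst) performs
    the actual state-plus-traffic move (left abstract here)."""
    hot = [i for i, u in cpu.items() if u >= CPU_HIGH]
    for src in hot:
        dst = min(cpu, key=cpu.get)  # least-loaded instance
        if dst == src:
            continue
        flows = pick_elephants(flow_stats[src])
        if flows:
            move_flows(flows, src, dst)

moved = []
rebalance(
    cpu={"nf1": 0.95, "nf2": 0.30},
    flow_stats={"nf1": {"f1": 5_000_000, "f2": 900, "f3": 2_000_000}, "nf2": {}},
    move_flows=lambda flows, s, d: moved.append((tuple(flows), s, d)),
)
print(moved)  # elephants f1 and f3 moved from nf1 to nf2
```

As the answer says, this policy is entirely up to the application; the platform only has to provide the move primitive.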
B
There are some interesting questions there, and that's one of the reasons we also want to look at how we can reduce the amount of state we're transferring. Some of our program analysis is trying to understand: rather than exporting all of the state that the network function is maintaining, can we figure out what state was updated since the last time, maybe, that we created a snapshot, in a failover situation? Or can we figure out…
B
Maybe some state affects the packets that are output by our network function, and other state affects only the log. And maybe we say, you know, in something like a caching proxy we're not really concerned about the accuracy of the log, so we're not going to move that state. So you may be able to limit what state you move in exchange for a relaxed notion of the behavior of your network function, and how closely it compares to what you would have gotten if you hadn't moved at all.
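The two state-reduction ideas above (send only entries changed since the last snapshot, and skip state that affects only the log) can be sketched together. The tagging scheme and class shape are illustrative assumptions:

```python
# Sketch of incremental, relevance-filtered state transfer: track which
# entries changed since the last snapshot, tag each entry by whether it
# affects output packets or only the log, and optionally drop log-only
# state for a cheaper, relaxed-equivalence transfer.

class TrackedState:
    def __init__(self):
        self._data = {}      # key -> (value, affects_output)
        self._dirty = set()  # keys updated since the last snapshot

    def put(self, key, value, affects_output=True):
        self._data[key] = (value, affects_output)
        self._dirty.add(key)

    def delta(self, include_log_only=False):
        """Return state changed since the last snapshot; by default,
        log-only entries are dropped (the relaxed transfer)."""
        out = {k: v for k, (v, rel) in self._data.items()
               if k in self._dirty and (rel or include_log_only)}
        self._dirty.clear()  # next delta is relative to this snapshot
        return out

s = TrackedState()
s.put("conn:10.0.0.1", "ESTABLISHED", affects_output=True)
s.put("log:10.0.0.1", "GET /index.html", affects_output=False)
print(s.delta())   # only the connection entry is transferred
s.put("conn:10.0.0.1", "CLOSED")
print(s.delta())   # only the entry updated since the last snapshot
```

In a real system the `affects_output` tag would come from the kind of program analysis the answer describes, rather than from the network function author.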
D
B
So, it's an excellent question. I haven't really thought about it in terms of control-plane devices; I've only really thought about it in terms of data-plane devices. I think there's probably a different problem there, and potentially a simpler solution. When you start to talk about things in the control plane, the thing that comes most to mind is work that's being done on distributed SDN controllers, where your SDN controller is your control plane, and there you're concerned about moving state.
B
I think you could. I think one challenge that you certainly face is: where is this going to go? Which is sort of the standard NFV challenge. You know, to migrate across the entire continental United States versus to migrate between two points in a metro area is going to be a really different situation, and one is probably feasible; the other is…
D
No, this is some kind of state you have to preserve; that's one thing, right. Second, if I've understood well, I'd like to share my view and see whether you share it as well: I see a similarity between this and something from some time ago. I remember when, in object-oriented programming, there were object persistence frameworks. I think there is a clear connection there, right? So this is very much connected with that. Yes.
B
We haven't necessarily looked specifically at that body of research, although we have started to look at it, actually, as we're doing some of this program analysis, because there are all sorts of things to figure out: what objects exist beyond the processing of a single packet, and what objects are only used during the processing of that one packet at this middlebox. So I think there is definitely a broader body of work there that's worth considering, because…
D
There are some researchers who have been looking at that. They are starting to think precisely about a network programming paradigm that is object-oriented, and persistence is precisely one of the properties they were thinking about: persistence and this kind of movability. This is something I was taking note of, because it will probably help. And finally, about what you were mentioning here, a control application in the control plane: is this something you are starting to think about? Well, did you take into account the SDN architecture? You have there the NFV orchestration and, well…
B
But it's unclear how tightly you can integrate those, because each of them is solving a slightly different problem, and so I think there are just going to need to be some interfaces there. For the same reason, when you're talking about NFV orchestration, you may have an interface into your system that's going to worry about launching the VMs themselves and figuring out where they're going to go, and then a system that's going to worry about, okay…
B
…now which NF am I actually putting on this? So even there, that could potentially be split into multiple controllers. So at what point do we end up with too many controllers running around the network? I expect we are rapidly approaching that, and it's a real problem, but…
C
Are there any other questions for Aaron? I did have one. I'm wondering, is it trivial to bound the amount of buffer space you need in the controller, or is that hard? So do you kind of bound the number of flows you can migrate, to stop that? Yes?
B
There are a couple of different things you can do. In theory it's reasonably predictable: you know how big the state is on average, and we can predict how long it's going to take to transfer it. But you're right, there's this trade-off: the more state you're transferring, the longer it takes, and the more buffering you need to do.
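The trade-off just described, and the earlier answer about bounding the number of flows migrated at once, can be put into back-of-envelope form. Note the quadratic effect: every migrating flow buffers for the whole transfer, and the transfer length itself grows with the number of flows. All input numbers here are illustrative assumptions:

```python
# Back-of-envelope buffering bound for flow migration. If n flows move
# together, the transfer takes n * per_flow_state time, and every one of
# the n flows buffers its traffic for that whole interval, so the buffer
# requirement grows with n squared.

import math

def buffer_bytes_needed(n_flows, per_flow_state, arrival_bps, link_bps):
    """Bytes buffered while n_flows migrate together."""
    transfer_s = n_flows * per_flow_state * 8 / link_bps
    return n_flows * (arrival_bps / 8) * transfer_s

def max_flows_for_buffer(buffer_limit, per_flow_state, arrival_bps, link_bps):
    """Largest batch of flows whose buffering fits in buffer_limit bytes
    (the 'bound the number of flows you can migrate' answer above)."""
    return int(math.sqrt(buffer_limit * link_bps /
                         (arrival_bps * per_flow_state)))

# 100 KB of state per flow, 10 Mb/s of traffic per flow, 1 Gb/s link,
# 64 MiB of controller buffer:
print(max_flows_for_buffer(64 * 1024**2, 100 * 1024, 10e6, 1e9))
```

This is why bounding the migration batch size, as mentioned in the question, directly caps the controller's buffer requirement.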