Description
Held on 1230-1345 UTC on 23 July 2019, this Technology Deep Dive session at IETF 105 started with a description of how a basic network interface card (NIC) operates and led into NIC feature evolution.
A
Sorry for the slight delay, but now we are also online, that's great. Good morning. This is the deep dive session; it's the second such session we are having at an IETF meeting. We had a deep dive session on router architectures, which people gave us a lot of positive feedback about, so we thought we'd do it again. This time we will look into NICs, and we have some people here who usually, or some of whom usually, come to the IETF; they're from the Linux networking community.
B
Okay, so when we were composing this talk, an hour and a half didn't seem to do justice to the content, so we had to limit the scope; we could have a half-day discussion on just the high-level topics, never mind a tutorial. What is in scope: we will talk about basic NIC support and how a basic NIC works, proceed to medium-range offloads from the host stack to the hardware, and then to slightly more advanced features. We're going to use the Linux kernel as a reference point, though it's not necessarily the only way, nor the only operating system, that does this. What's out of scope: we're not going to talk about kernel bypass, so no DPDK discussions. We're not going to talk about small CPE devices, which use the same APIs on Linux at least, or very large multi-terabit switch ASICs, which may use the same APIs in some vendor ASICs. We're not going to talk about virtualization offload technologies; SR-IOV, VMDq and any newer schemes are out of scope, and storage is also out of scope, so that could be another session.
B
Should this session prove exciting to the attendees, we could have another session in the future. As for the relationship to the IETF: if you're implementing protocols, this is very relevant to you, whether you're running on a host or on middleboxes, which end up using NICs, or on nodes that perform both host-level features and forwarding functions. NICs can accelerate a lot of protocol processing and have a lot of helpers in the hardware for TCP, UDP, QUIC, TLS and IPsec; a lot of the NVO3 encapsulations are mostly commodity offloads at this point in time. You can accelerate layer 2 to layer n forwarding and filtering, and there's a lot of QoS offloading. It's a very condensed session, so what we'll ask is that we only allow clarification questions during the talks; any other questions you may have will come at the end.
B
I'm going to introduce the presenters. You have a very competent set of folks here: on the left is Tom Herbert from Intel, then Andy Gospodarek from Broadcom and Simon Horman from Netronome, and I'd like to acknowledge Boris from Mellanox. These are very competent folks; they know how the implementations work and they understand the hardware very well, so you're in good hands.
C
How's that? Yeah, much better. Okay, so I'm going to present the fundamentals and basic offloads of NICs. A few definitions might be useful. A NIC is a network interface card, sometimes network interface controller; this is the host's physical interface to a physical network. The host stack is the software that processes packets and does protocol processing in the host.
C
Typically this is layer 2, layer 3 and layer 4 processing. A kernel stack is simply a host stack that runs inside a kernel, and as was mentioned, for the most part we'll be referencing Linux for that. Offload is when we do something inside the NIC on behalf of the host; this is work, work that involves networking, that we move essentially from the host to the NIC for some purpose. Acceleration is offload that is done mostly for performance gains. So what is a network interface card?
C
This shows a picture of a card on the left, and most of you should be familiar with these; whoever has had a PC, for instance, knows how to plug these in, so they go into the system bus. I would point out this particular card is very ancient: it has a BNC connector, so this is coax Ethernet and ISA connectivity, but nevertheless it's a NIC, and modern-day NICs obviously look a little bit different but basically perform the same function. So a NIC is the receiver and transmitter of packets to the physical network.
C
It's the device that does that. On the right we have a stack, and you can see that in the protocol stack the NIC is kind of at the bottom. On one side, to the outside world, it connects to the physical media, which could be fiber, Cat5 or radio, and we use some sort of encoding or framing over that media: Ethernet, Wi-Fi, Fibre Channel. On the other side, the NIC connects into the system via the system bus; typically today this is PCIe or USB.
C
In the olden days, like this card, it was ISA. The way this works is that NICs have queues; typically they have a transmit queue and a receive queue, and these queues store, or indicate, the packets for transmit and receive. The queues are composed of a set of descriptors, and the descriptors describe the packet for the NIC. Some of the important things in these descriptors are where the packet is located in host memory, what the length of the packet is, and then some ancillary information that may be involved, for instance whether this was received as a broadcast Ethernet frame, and other information like that. So in order to transmit, the host stack fills out a transmit descriptor and, most importantly, writes into it the information for the packet: where the packet is located in its memory and what the length of the packet is.
C
The NIC processes the transmit queue: it looks at each of the transmit descriptors, figures out where the packet is in host memory, performs a DMA operation (direct memory access) to pull the packet into its local memory, and then the NIC may perform some offload processing, which we'll talk about in a bit, but eventually the packet has to be sent on the network. So there is a PHY and a serializer (SerDes) inside the device that takes the packet from its memory, serializes the data and sends it out to the actual network. Receive is somewhat similar. In the receive path the host sets up a number of packet buffers where packets will be stored in its memory, and it puts these into the receive queue in the receive descriptors. So again, in each descriptor there is a memory location and, in this case, a maximum length for the packet.
C
When a packet arrives, the NIC reads the next receive descriptor to get the host memory location, DMAs the packet into that host memory, sets the length in the receive descriptor, increments the producer pointer, or rather the consumer pointer, in the receive queue, and then it interrupts the host, which is typically an actual system interrupt. The host wakes up and knows there are packets to process in the receive queue, so it reads the queue, gets the packets that have been received, and processes them in the stack.
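As a rough illustration of the descriptor-ring model just described (a simplified sketch, not any particular vendor's descriptor layout; the field names, tx_queue_xmit and nic_write_doorbell are invented for this example), the transmit side might look something like this in C:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical transmit descriptor: the driver records where the packet
 * lives in host memory and how long it is; the NIC DMAs it from there. */
struct tx_desc {
    uint64_t buf_addr;   /* DMA address of the packet in host memory */
    uint16_t length;     /* length of the packet in bytes */
    uint16_t flags;      /* e.g. "checksum offload requested", "end of packet" */
};

struct tx_queue {
    struct tx_desc *ring;    /* array of descriptors shared with the NIC */
    uint32_t size;           /* number of descriptors in the ring */
    uint32_t producer;       /* next slot the host will fill */
};

/* Stand-in for the MMIO doorbell write a real driver would perform. */
static void nic_write_doorbell(uint32_t producer_index)
{
    (void)producer_index;
}

/* Host side of transmit: fill out the next descriptor and notify the NIC. */
static int tx_queue_xmit(struct tx_queue *q, uint64_t dma_addr, uint16_t len)
{
    struct tx_desc *d = &q->ring[q->producer % q->size];

    d->buf_addr = dma_addr;
    d->length   = len;
    d->flags    = 0;

    q->producer++;                      /* advance the producer pointer */
    nic_write_doorbell(q->producer);    /* tell the NIC there is work   */
    return 0;
}
```

On receive the flow is mirrored: the host posts empty buffers into a receive ring, and the NIC fills them in, advances its pointer and raises an interrupt, as described above.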
C
What I just described is kind of fundamental; that's the basics of the NIC, and it started in approximately the early 90s. Not long after, some of the basic offloads that I'll talk about in a minute came into being and were developed, and we can track the evolution of NICs since then. In the mid-2000s we got data plane accelerations, which are more advanced features inside the NICs, IPsec offload for instance, and QoS offloads, and more recently there's a general movement to make these devices programmable.
C
We'll talk a lot about offloads today, so I want to give a little bit of motivation. One way you can think of offloads is that these are just advanced features, having to do with packet processing or protocol processing, that happen to be done in the NIC. There are a few rationales for this. One is that we want to free up host CPU cycles for application work; this makes sense if the NIC can do the networking functions in a more efficient way.
C
Since it's specialized hardware, that is often the case; for instance, it can compute a checksum more efficiently than doing it on the host CPU. More generally, one of the motivations is to save host resources, so offloads may save not just CPU but memory, DMA operations, memory movement and the number of interrupts. Scaling performance is very important and offloads help a lot there, particularly for low latency and high throughput.
C
There are also some interesting use cases, particularly in mobile, where we might offload certain operations having to do with protocol processing to a device for the purpose of saving CPU cycles and saving power, in particular on the core CPU. In short, offloads make sense as a cost-benefit trade-off: if the benefits of moving work into the NIC (you can think of it as a co-processor) exceed the cost, then it makes sense. In practice this can be an interesting analysis, and we know that CPUs, for instance, are always increasing their capabilities.
C
On the other hand, the network and the things you want to do are always getting more complex, so there's always a bit of a trade-off between whether to offload or to run on the host CPU, but in general we've found offloads to be pretty useful, and that trend will probably continue in terms of developing offloads and NIC development in general. In the Linux community at least, we've kind of enshrined some of the principles in something called "less is more", and I want to give three components of this.
C
First of all, protocol-agnostic mechanisms are better than protocol-specific ones. This is somewhat of a formalism of trying to prevent protocol ossification, but the idea is that if we can develop an offload that supports, say, all transport protocols equally, versus one that only works with TCP or plain TCP/IP packets, then generally the more general offload is going to be more applicable and better for the user. In a similar vein, common APIs are better than proprietary APIs.
C
We have a lot of OSes and a lot of NICs; the more common the API is across those, the easier it is for users to choose different pieces of hardware. This is particularly important in that we want to avoid vendor lock-in, which is where a vendor, whether purposely or inadvertently, kind of controls the API such that it's really difficult for the user to change the vendors they're using.
C
The third point is that programmability is (generally, in parentheses) good. One of the aspects of programmability is that if we make things completely openly programmable, especially user programmable, and allow users to do whatever they want, users will do whatever they want, and that, as we know, leads to some interesting fracturing of the market and can be precarious. So we always want to make sure that if we're going to create an open programmable environment, we think about how to develop the ecosystem properly and maintain some semblance of sanity and portability across these.
C
So we can turn and look at some of the basic offloads; I'm going to skip that slide. We'll talk about three basic offloads, and these are kind of the oldest ones, very common amongst NICs; most have been around since the 90s at least: checksum offload, segmentation offload and multi-queue. Checksum offload is the offload of the venerable TCP/UDP transport checksum. The idea is that we want to offload the computation of the checksum, since the ones' complement summation in particular is CPU intensive.
C
If we offload that to the NIC, we get a nice performance gain. As I mentioned, checksum offload is particularly ubiquitous; it would probably be pretty hard to find a NIC on the market today that does not support some form of this. An interesting twist that's a little more recent is encapsulation.
C
What we've found is that, say, IP-in-IP encapsulation, or in particular UDP-based encapsulations, can actually carry multiple transport protocols per packet, each containing its own checksum. So conceptually it's possible to have two, three, four, five or six checksums in a single packet: a TCP checksum, a UDP checksum, the GRE checksum, it's all possible. We want to offload all of those checksums, and we found some techniques that can leverage rudimentary checksum offload of one checksum to actually support multiple checksums, even in the same packet.
C
A little bit of detail: transmit checksum offload has two forms, one protocol specific and one protocol agnostic. In the protocol-specific one, the host sends a packet into the device, and the device actually parses the packet, determines if there's a transport header with a checksum, and if there is, it does all the operations needed to set the checksum: it performs the ones' complement checksum over the data, computes the pseudo-header checksum if there is one, and sets the checksum in the appropriate field of the transport layer.
C
The more generic method is for the host to indicate, as instructions, exactly how to do the checksum. It provides two pieces of information to the device: one is where the checksum calculation starts, a starting offset in the packet, and the other is the offset at which to write the checksum, which would typically be the checksum field of TCP, for instance; the start would then be the offset of the TCP header.
C
The device gets this and performs the ones' complement sum from the starting point to the end of the packet, and whatever sum it gets, it basically folds into the existing value in the checksum field and then sets the field. As long as the host set this up and initialized the checksum field correctly (typically with the pseudo-header checksum), the device will set the correct checksum. It has no idea what kind of checksum it is; it doesn't know if it's UDP or TCP, and it doesn't care.
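To make this generic, protocol-agnostic transmit checksum offload concrete, here is a minimal sketch of the calculation the device performs (my own illustration, not a specific NIC's or driver's API; csum_start and csum_offset are used in the same spirit as the parameters described above, with csum_offset taken relative to csum_start):

```c
#include <stdint.h>
#include <stddef.h>

/* Fold a 32-bit accumulator down to a 16-bit ones' complement sum. */
uint16_t csum_fold(uint32_t sum)
{
    while (sum >> 16)
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)sum;
}

/* Ones' complement sum over a byte range (16-bit words, network order). */
uint32_t csum_partial(const uint8_t *data, size_t len, uint32_t sum)
{
    size_t i;
    for (i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)((data[i] << 8) | data[i + 1]);
    if (len & 1)                          /* odd trailing byte */
        sum += (uint32_t)(data[len - 1] << 8);
    return sum;
}

/*
 * Generic transmit checksum offload: the host tells the device where to
 * start summing (csum_start) and where the checksum field lives
 * (csum_offset from csum_start).  The host has already put the
 * pseudo-header checksum into that field, so summing the whole range and
 * complementing the result yields the final transport checksum, whatever
 * the protocol is.
 */
void nic_tx_csum_offload(uint8_t *pkt, size_t len,
                         size_t csum_start, size_t csum_offset)
{
    uint32_t sum = csum_partial(pkt + csum_start, len - csum_start, 0);
    uint16_t folded = (uint16_t)~csum_fold(sum);

    pkt[csum_start + csum_offset]     = folded >> 8;
    pkt[csum_start + csum_offset + 1] = folded & 0xff;
}
```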
C
It just knows it's the standard Internet checksum. For receive we have an analogous situation: there is a protocol-generic and a protocol-specific method. The protocol-specific method is called "checksum unnecessary": as packets are received, the NIC parses the packet, determines if there is a transport protocol that contains a checksum, and performs the work to actually verify the checksum.
C
It does the ones' complement checksum, computes the pseudo-header checksum, adds them, and checks whether the result is zero; if it is, the checksum has been verified, and it sets a bit in the receive descriptor to inform the host that it has verified that checksum. Again, this is protocol specific: it only really works with TCP and UDP packets that the device explicitly parses. The more generic method is "checksum complete".
C
In this case, the device performs a ones' complement sum over the whole packet, starting from the IP header through the end of the packet, and simply returns that sum in the receive descriptor to the host. The host can take that and, through simple manipulations of the checksum, use it to verify any number of checksums in the packet. So this is really efficient, very generic, and able, as I said, to verify many checksums in a packet.
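To show what those "simple manipulations" look like, here is a rough sketch of how a host stack could verify one inner transport checksum given only the device's whole-packet sum (my own illustration, reusing the csum_partial and csum_fold helpers from the transmit sketch above; the function name and parameters are invented, and ones' complement corner cases are glossed over):

```c
#include <stdint.h>
#include <stddef.h>

/* csum_partial() and csum_fold() as defined in the transmit sketch above. */
uint32_t csum_partial(const uint8_t *data, size_t len, uint32_t sum);
uint16_t csum_fold(uint32_t sum);

/*
 * device_csum is the folded ones' complement sum the NIC computed over the
 * packet starting at the IP header ("checksum complete").  trans_off is the
 * offset of the transport header from the IP header, and pseudo_sum is the
 * unfolded ones' complement sum of the pseudo-header.  Returns nonzero if
 * the transport checksum verifies.
 */
int rx_verify_inner_csum(const uint8_t *ip_hdr, size_t trans_off,
                         uint16_t device_csum, uint32_t pseudo_sum)
{
    /* Sum of the bytes in front of the transport header ... */
    uint16_t prefix = csum_fold(csum_partial(ip_hdr, trans_off, 0));

    /* ... subtracted (ones' complement arithmetic) from the device's sum
     * leaves the sum over the transport header and payload alone. */
    uint16_t transport = csum_fold((uint32_t)device_csum +
                                   (uint16_t)~prefix);

    /* A valid transport checksum makes pseudo-header + segment sum to 0xffff. */
    return csum_fold((uint32_t)transport + pseudo_sum) == 0xffff;
}
```

The same whole-packet sum can be adjusted this way repeatedly, which is why one returned value is enough to verify several checksums in an encapsulated packet.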
C
Looking at segmentation offload: one of the observations we've made is that networking stacks are more efficient when they process large packets as opposed to small packets; in particular, the per-packet processing overhead in the stack is significant, usually more than the cost of processing the data bytes. So we want to see if we can arrange the system so that we can process large packets instead of small packets.
C
There are two forms of this, one on transmit and one on receive. With transmit segmentation offload, the idea is that the host produces a large packet, say a 64 KB TCP segment, and we want to break this packet up into smaller chunks for sending out into the network, which may have, say, a 1,500-byte MTU. We want to do this as low in the stack as possible. So the idea is that the stack processes the big packet, processes one IP header and one TCP header, and at the lowest point possible, either in software or even in the network device, there's a kind of segmentation or fragmentation: we slice up the data, give each packet its own IP header and its own TCP header, and send each one. There is a software variant and a hardware variant of this: the software variant is called GSO, generic segmentation offload; the hardware variant is LSO, large send offload. You might also see it called TSO, TCP segmentation offload, when it is specific to TCP. Receive segmentation offload is the opposite: when small packets are received, we try to coalesce them into larger segments, larger packets. Again this is a per-flow, similar operation, and there are two variants of it, one software and one hardware: the software one is GRO, generic receive offload; the hardware one is LRO, large receive offload. Of all the basic offloads, this particular one is probably the hardest.
C
It does require the network device to be able to parse the packet and understand a lot of details of the protocol; for instance, the implementations that do this really only understand TCP, and usually some of the encapsulations, but until we have, say, a fully programmable environment, it is hard to generalize this one. One thing I'd like to mention about segmentation offload: it really only works in conjunction with checksum offload, since each resulting segment needs its own correct checksum.
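As a rough software sketch of the transmit-side segmentation just described (a simplified illustration, not the kernel's actual GSO code; the per-segment header fix-ups, which are exactly where checksum offload comes in, are only hinted at in comments, and the struct and function names are invented):

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>
#include <stdlib.h>

/* One segment produced by software segmentation. */
struct segment {
    uint8_t *data;   /* headers + payload chunk */
    size_t   len;
};

/*
 * Split a large "super packet" (hdr_len bytes of L3/L4 headers followed by
 * payload) into MSS-sized segments.  Each segment gets a copy of the header
 * template; a real implementation would then fix up per-segment fields such
 * as the IP total length and ID, the TCP sequence number, and the checksums.
 */
size_t gso_segment(const uint8_t *pkt, size_t pkt_len, size_t hdr_len,
                   size_t mss, struct segment *out, size_t max_out)
{
    size_t payload_len = pkt_len - hdr_len;
    size_t off = 0, n = 0;

    while (off < payload_len && n < max_out) {
        size_t chunk = payload_len - off < mss ? payload_len - off : mss;
        uint8_t *seg = malloc(hdr_len + chunk);
        if (!seg)
            break;

        memcpy(seg, pkt, hdr_len);                         /* header template */
        memcpy(seg + hdr_len, pkt + hdr_len + off, chunk); /* payload slice   */
        /* ...fix up IP length/ID, TCP sequence number, checksums here...     */

        out[n].data = seg;
        out[n].len  = hdr_len + chunk;
        n++;
        off += chunk;
    }
    return n;   /* number of segments produced */
}
```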
C
Turning to the third basic offload, multi-queue: one of the interesting properties is that once we have multiple queues, we can assign properties to them. Particularly on transmit, each queue can have its own attributes, so, for instance, we can have high-priority queues and low-priority queues. One of the important aspects when we deal with multi-queue is that we do want to try to keep packets in order; for instance, we don't want to distribute packets of the same flow across different queues, either on transmit or on receive.
C
So there are some techniques in the queueing model to try to enable in-order delivery as much as possible. On transmit, there are essentially two methods to do this. One is the easy method, which is fundamentally that each CPU is assigned to a queue; when an application is sending a packet, the queue chosen is the one associated with the CPU the application is running on. The advantage of this is that we get a sort of siloing and locality.
C
For instance, when a packet is sent on a queue, we have to lock the queue in order to manipulate the queue pointer; if we do this with one CPU per queue, then there's no contention for the lock and no contention for the queue structures. The second method is when the driver selects the queue. As I mentioned, queues can have rich semantics, such as priority, and what we've done there, instead of trying to expose all possible combinations of this, is to let the driver map packet metadata to an appropriate queue.
C
For instance, if we're sending a high-priority packet, where the metadata associated with the packet says it is high priority, when this goes into the driver it looks up the queue that's appropriate for that. So there may be CPU-to-queue affinity, priority, and also other attributes you could apply, like rate limiting. On the receive side, this is normally called packet steering: the idea is that when packets come into the NIC, they need to be distributed amongst the queues.
C
It turns out this is a lot like ECMP, and some of the techniques are very similar to where we're trying to distribute traffic with ECMP across multiple interfaces. On the stateless side, there are two forms of this: one is called receive packet steering (RPS), which is the software variant; RSS, receive side scaling, is the hardware variant. They both work essentially the same way: when packets come in, a hash is performed over the five-tuple of the packet if the transport layer is available, or a three-tuple using the flow label, but the effect is to identify the flow by a hash. We take that hash and map it into one of the queues, and that way we're also consistent: a particular flow always has the same hash, therefore we can always map it to the same queue, in order to facilitate in-order delivery.
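A minimal sketch of that hash-to-queue mapping (my own illustration; real NICs typically use a keyed Toeplitz hash and a configurable indirection table, whereas this just uses a simple placeholder hash):

```c
#include <stdint.h>

/* Five-tuple used to identify a flow. */
struct flow_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
};

/* Placeholder flow hash (FNV-1a style); hardware commonly uses Toeplitz. */
static uint32_t flow_hash(const struct flow_key *k)
{
    uint32_t words[3] = { k->src_ip, k->dst_ip,
                          ((uint32_t)k->src_port << 16) | k->dst_port };
    uint32_t h = 2166136261u;
    int i;

    for (i = 0; i < 3; i++) {
        h ^= words[i];
        h *= 16777619u;
    }
    h ^= k->protocol;
    h *= 16777619u;
    return h;
}

/* Map a flow onto a receive queue via an indirection table, so the same
 * flow always lands on the same queue and stays in order. */
uint16_t rss_select_queue(const struct flow_key *k,
                          const uint16_t *indirection_table,
                          unsigned table_size)
{
    return indirection_table[flow_hash(k) % table_size];
}
```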
C
An extension of this is something called receive flow steering. In this case the host itself can actually program, for each flow, which queue to use; this is a very powerful mechanism: on a per-flow basis the host can indicate, okay, for this flow use this queue. There are also two variants of this, a software variant and a hardware variant. The advantage of this is that you get really good isolation.
C
Some people use this where they pin an application to a CPU, so that the application only runs on that CPU, and they associate a network queue with that application, and receive flow steering completes the arrangement so that only packets for that application's flows go to that queue. It's very siloed; the application acts like it's the only application on the system, and we get a lot of performance gains that way. With that, I will turn it over to Simon, who will talk about some of the more advanced offloads.
D
Thanks Tom. So far, Tom has taken us through some basic offloads and the basic functionality of the NIC itself. Well, as the use cases and the demands of users evolve, and the hardware evolves at the same time, it only makes sense that more and more processing could be pushed down to the hardware, and so in this section we'll look at examples of that in terms of offloading more of the data plane, the packet processing. But before I get into some examples in that area,
D
I'd just like to quickly cover some of the hardware solutions that might be used in this kind of area. It's important to note that these solutions are a little bit mix-and-match: it depends very much on the use case which choice is appropriate, and some hardware choices match some use cases more naturally than others, but at the same time they're not necessarily mutually exclusive.
D
So far, the NICs we've talked about fall into the first category, where you have a fixed data plane, so this will typically be an ASIC that implements a pipeline in hardware. We can also use more programmable technologies, and these fall into roughly three sub-categories. We have semi-specialized processors, called network flow processors or NPUs (network processing units), which are a little bit similar to a general-purpose processor like a CPU in a server.
D
They differ from a general-purpose CPU in that they are a little bit more specialized, so they might have instructions to do network-related functionality, or they might have much higher thread density, things along those lines, to make them more suited to network processing. And then you have FPGAs, which are probably the most programmable solution possible; here we have gate-level programming.
D
Here we have a diagram that represents roughly how this works: we have applications, then in the kernel we have an implementation of a data path, and then down in the offload NIC we have a data plane which implements some, or maybe all, of the functionality of the data path in the kernel, and so it is able to, for example, forward packets and so on.
D
The first step is that we do some kind of header extraction, so we pull out some fields, for example the five-tuple, but we also have metadata, for example the port that the packet arrived on; other things can also be available. Then, using this data, we typically compute a hash, and the hash is looked up in a hash table.
D
We try to find a match, and if we do find a match, then the match will supply some kind of action that should be executed, or even a list of actions. This could be to forward to a different port, it could be to drop, it could be to move on to another table if you have multiple tables present, or it could be to do some kind of modification of the packet.
D
You can also extract the source and destination IP addresses from L3, and you can also select, for example, the ports at layer 4. So you can create a specific rule, for example to do some kind of special treatment of port 80 traffic, possibly sending it to a separate host; it's fairly flexible in this regard.
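To illustrate the match-action model just described (a deliberately simplified sketch of the idea, not any NIC's or the kernel's actual flow-table API; the types and the key_hash helper are invented for this example), a lookup might look like:

```c
#include <stdint.h>
#include <stddef.h>

/* Extracted headers plus metadata, as produced by the parse step. */
struct match_key {
    uint32_t src_ip, dst_ip;
    uint16_t src_port, dst_port;
    uint8_t  protocol;
    uint8_t  in_port;      /* metadata: port the packet arrived on */
};

enum action_type { ACT_FORWARD, ACT_DROP, ACT_GOTO_TABLE, ACT_SET_FIELD };

struct action {
    enum action_type type;
    uint32_t arg;          /* output port, next table id, new field value, ... */
};

struct flow_entry {
    struct match_key key;
    struct action    actions[4];
    unsigned         n_actions;
    int              in_use;
};

struct flow_table {
    struct flow_entry *entries;
    unsigned           n_entries;
};

/* Placeholder hash over the key; hardware would use its own function. */
static unsigned key_hash(const struct match_key *k, unsigned buckets)
{
    return (k->src_ip ^ k->dst_ip ^ k->src_port ^ ((unsigned)k->dst_port << 1) ^
            k->protocol ^ k->in_port) % buckets;
}

static int key_equal(const struct match_key *a, const struct match_key *b)
{
    return a->src_ip == b->src_ip && a->dst_ip == b->dst_ip &&
           a->src_port == b->src_port && a->dst_port == b->dst_port &&
           a->protocol == b->protocol && a->in_port == b->in_port;
}

/* Exact-match lookup: returns the matching entry, whose actions are then
 * executed, or NULL, in which case a default action (such as sending the
 * packet to the host) would apply.  Linear probing keeps the sketch short. */
struct flow_entry *flow_lookup(struct flow_table *t, const struct match_key *k)
{
    unsigned start = key_hash(k, t->n_entries);
    unsigned i;

    for (i = 0; i < t->n_entries; i++) {
        struct flow_entry *e = &t->entries[(start + i) % t->n_entries];
        if (e->in_use && key_equal(&e->key, k))
            return e;
    }
    return NULL;
}
```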
D
The interesting thing about this use case, on ingress, is that there's no queue available, so the actions that can be applied are fairly limited: we can police the packet, perhaps by dropping it or marking it, we can filter it, and so on. Egress is a little bit more interesting, or a little bit more complex perhaps is a better way to put it, because we have a queue, so we have the option of doing a much larger number of different things with the packets.
D
In order to, for example, enforce a desired packet rate, we can delay packets, we can of course drop them, and so on. This is an area which has received significant research over the years, and most of that research is applicable here. Of course, a challenge in implementing individual algorithms on an offload NIC, as opposed to on a host, is the usually more limited execution environment, but nonetheless the same principles generally apply.
D
Now, in this diagram we have packets coming into the machine, into the NIC, and also exiting the NIC, so they're being forwarded from one port of the NIC to another; that could be a virtual port or a physical port, and the NIC is applying some kind of QoS as they traverse the NIC. In the next slide we have a slightly different setup.
D
Here we have packets; of course it's bidirectional, but I will only talk about one direction, which is packets originating from an application running on the host and heading out towards the wire, out of a physical port of the NIC, and the NIC is applying some kind of QoS policy to those packets as they traverse the NIC in this particular case.
D
Moving on to the last part of my section, I'll talk about crypto offload a little bit. This is a little bit different to what I've talked about so far with data plane packet processing, in the sense that what we're really focusing on now is offloading from the host a very computationally expensive part of packet forwarding, if you are applying crypto, and crypto itself tends to be quite complex.
D
Essentially, for TLS offload, what the host will do is format a TLS record, but it does not perform the cryptographic operations; so there is space for the authentication hash, but it's not filled in, and the record payload is in plain text. The offload NIC will then receive this record and perform the cryptographic operations: it turns the plain text into cipher text and fills in the hash. On RX, things are reversed.
D
IPsec acceleration follows a similar principle to TLS, in the sense that some parts are offloaded and some parts are not, and at this time we have two models for this. One is the crypto offload, which is very similar to what I described for TLS, in the sense that it is the host's responsibility to add the IPsec headers to the packet, but it does not perform the cryptographic operations, which are left to the card.
D
It's worth noticing at this point that, on the one hand, this indeed combines a number of different offloads which we've already discussed: LSO, the segmentation offload, and the checksum offload. So if one is offloading the crypto, one also needs to offload those operations there. Conversely, with IPsec traffic one cannot use segmentation offload or checksum offload if one does not also offload the cryptographic operations. So there are significant benefits to being able to build this stack, but in a sense it's an evolution;
D
one could not build this particular piece of technology without other pieces that came earlier, that is, the ones that Tom spoke about. The other model we have is a full offload. What we mean here by full offload is that the card is responsible for adding the IPsec headers on transmit and, of course, removing them on receive.
E
Alright, so you've already heard a pretty long discussion about how these things all work. That's good; I appreciate everyone who's still awake and has finished checking all their email. Now I will talk a bit about programmability. What Simon and Tom talked about were really all offload features that were enabled exclusively by hardware providers, hardware vendors who felt this is something useful, probably from feedback from users.
E
Or maybe not, it sort of depends, but we're going to build on that and talk about the next evolution on this path, which is fully programmable NICs. As Tom talked about, those offloads can be good, probably are good, but there are a couple of key features I want to highlight and think about, and why programmability of a NIC would matter. So, right out of the gate,
E
I think one of the really important things is that it facilitates really rapid protocol development. We're kind of in a phase right now where fixed-function offload is so powerful and so useful that if you want to deploy a new protocol, or you think you want to help develop a new protocol and you want to rapidly iterate on it, one of the problems you find yourself getting into is: is our current infrastructure really going to burn more cores processing packets just to support this new protocol?
E
There's this notion, right, that if you run a large-scale or small-scale data center, there is going to be some magic packet that's going to melt your network, and this would give you the opportunity to snuff that out in hardware before it gets too far. So today, in the programmable NIC world, there are really two main types. One is special-purpose hardware, the FPGAs and NPUs that Simon referenced before;
E
this is very specific hardware that we're going to program and write code for. The other one is really a new class of NICs that has appeared in the last couple of years that really just contains a general-purpose processor. This might be an Arm, an x86, MIPS, maybe in the future something like RISC-V, but really just something general-purpose that can run any code. And I think, really,
E
while this might seem today like something that isn't exactly what you might want, if you look at some of the forwarding plane realities slides from the last IETF, I think there's a really interesting quote at the conclusion, at the end of that: what's niche today can be broad tomorrow. I think that's, generally speaking, what we've seen across the board in networking and in NICs: someone will roll out a new feature, someone will think, I don't know, and before long everybody's got it and everybody wants it.
E
There will be cases where a software data path does not exist in the kernel for whatever feature you're adding. Now, that's a little bit different from what we do in the Linux community, where, if there are hardware offload capabilities in your hardware, there's sort of an insistence that a software fallback data path exists; within Linux that's been extremely helpful and I think we're going to continue to push that. But this is a case where that might not hold.
E
You may just have a data path that's completely done in the NIC, with no software fallback, at your own risk I guess. And in fact, that data plane could be expressed in a variety of languages: maybe P4 or eBPF, or maybe just the native instruction set for that NPU; as Simon talked about, many NPUs have special instructions for performing operations. And the key thing we talked about too is that this is dynamically programmed.
E
So for this data path that could exist, you can roll out new code quickly. Or, if you're rapidly developing a new protocol, and you start to say, you know what, maybe I don't need 350 bytes of header to describe this new protocol, maybe we'll make it a little shorter, like 324 or something, who knows, you can do that.
E
The other piece is really the general-purpose one, and this is a little bit of a unique situation, a little different than we've had in the past, but it's becoming pretty popular: this is a case where we're moving the entire host networking stack down onto the NIC. And yes, I said that right. What that actually means is your NIC could actually run another copy of an operating system.
E
Some people shudder at this thought, because maybe it sounds a little more complex, but the fact is, if you have this software already implemented on your server, you could actually move it down to your NIC and free up the server cores from doing that work. In this case the data plane offload is down on this general-purpose processor, as I mentioned, and also the control plane. So now, what if your routing daemon was running on the NIC?
E
Or what if whatever was receiving OpenFlow messages from a controller was running completely on the NIC? Now you find yourself consuming zero host resources: server host, not NIC host; they are actually different CPU complexes. So now you're not consuming any of the resources of your server, and you can free them up for doing useful things, whatever those may be. This control plane offload is also really nice.
E
If you have what some are now calling a bare-metal deployment, where you're setting up servers and you don't know exactly what they're going to be used for, but you're responsible for networking, you can feel pretty confident that there's a good chance your server administrators are not going to ruin whatever network setup you want them to have. Also, in multi-tenant deployments this would be really good.
E
You can make sure that no one person has a chance to destroy too much, and it really brings a lot of the server networking administration back into the purview of the network admin. I think there's sort of a constant struggle between those two groups, somewhat understandably, so this gives networking tentacles to reach a little bit further into the server, if you will. Kind of in the same vein, here's that picture again.
E
This would mean that, obviously, if you have applications running on your server, they're still going to get the data that they need, but you're not spending your time just needlessly moving packets between different applications, whatever those look like. And the reality here, and it doesn't get any more recursive than this, I promise, is that the programmable NICs also have offload-capable devices; these things are all being put on the same die, so you have a control plane, a data plane and a fixed-function device.
E
That's all embedded down there, but, like I said, I promise that the offloaded data path on the fixed-function device doesn't also contain another general-purpose processor, and another one on down; it's just that simple, I appreciate that. The simple fact is, we're building these chips that are pretty large and have both the general-purpose cores, maybe Arm or MIPS cores, on the side along with a fixed-function ASIC, but there are also people building NICs that, in addition to that, have FPGAs or NPUs as well.
E
So I think this is kind of a new world in a lot of ways. There's not a large number of users doing this yet, but I think there is a strong case, especially in a place like this where we're seeing rapid protocol development, that the programmable NIC is an extremely powerful option and extremely interesting going forward. So I think, really, the way to summarize this is to think about the networking trends going forward.
E
That was a joke, but it's not, and I wish it was, but anyway. I think we're seeing more and more that there's an interest in deploying new protocols; we regularly hear requests for things where we wonder how we can make fixed-function hardware support them and how long it will take to maybe support that. So this gives a new option for people who want to do those things quickly, and I think that the NICs are going to work together with host operating systems
E
to make these things happen. We don't see offloads going away; we see offloads becoming more powerful and more flexible, and continuing to be important. I also think that the programmability and this flexibility will really spur innovation that we haven't thought of before; I think that's the magical part about some of these devices that are completely, or not completely but fairly, flexible.
C
To answer the question: I would point out that some of the earlier work actually came out of Windows. For instance, RSS was literally invented there, I think it was NDIS that described it, and I believe they had the early checksum offloads. I think what happened is that as Linux became more popular in open source, we had a lot of developers working on that, and at some point, as the volumes go up, the NIC vendors start to pay attention. That being said, we do know that FreeBSD may use some of these; I know that some of the work we did on packet steering was being applied there, and that's a good thing. Like I said in my talk, we do want common APIs across OSes, but most importantly, I don't think there's anything we're doing in the NIC that would be specific to Linux or any particular OS. In fact, I think some of these techniques could even be applied in something like DPDK or kernel bypass, so again we're just using Linux as a reference.
I
Okay, but let's keep this OS-independent in this sense: you've talked a lot about different features on different cards and all the rest of it. When you're writing the code, you've got to know what the card on the machine your code happens to be running on can do. I don't think we need to explain all about the APIs now, but the question is more about what, from what I've seen, is going on. Essentially, someone writes a page to say, you know, what's the consensus on what people do for offload. Does that need standardization? Is it working at the moment just having it done ad hoc? Would it screw it up if it was standardized? I'm just thinking it all seems to be very ad hoc at the moment, the description of what the hardware is capable of, so that you can write code to know what to use.
E
Well, it does feel a little bit ad hoc, especially I think from the outside; maybe "outside" is the wrong word, say as an observer. It probably might feel ad hoc, because you just see patches show up and support exists, and usually what happens is one vendor will come up with something, another one will say, oh yeah, me too, and then they'll do it and maybe enhance it a little bit more. But I think that's the goal.
C
If you think about, let's look at, large send offload: in the NIC, this is splitting a packet up into individual TCP segments. Each TCP header has its own checksum, so after I do the segmentation I need to actually set the checksum; it has to be done per packet, and this is actually one of the trickier things with something like segmentation offload. The fewer things I have to do per packet, the better. If it were the case that I could just copy all of the headers to each segment, that would be a lot easier, but each time we have to consider things like the IP ID, which is another good example in the IP header; packet lengths are always interesting, and checksums are the hardest one. Any time I have to set something that is unique for that packet, I have to do that in the NIC, and the checksum is definitely one of those.
J
And for receive, you have to do it because you have to check the individual checksums; otherwise you might end up returning a corrupt, bigger packet to the stack. In terms of capabilities, for receive you also need checksum offload to do receive offload, yeah. Okay, the other question I had was: is there somewhere, I mean, the earlier questions essentially said there's a cabal of, you know, ten people in the world who actually know how to do this.
C
So the question was about path MTU and, I suppose, segmentation offload. It does matter, and in fact, when we're doing something like LSO or TSO, we aren't just chunking up packets per the link MTU; we want to abide by the path MTU. The way it works is that the host stack actually tells the NIC what size the outgoing packets should be, so we can abide by path MTU.
C
One of the interesting things we try to do when we're sending with LSO is to keep the packets the same size, except for the last one. That simplifies the problem we just talked about with Lorenzo, where we have to set the length for each packet; the easiest way to do that is to be able to infer what the lengths are. So we tell the NIC:
C
this is the maximum length, make all the packets the same size except for the last one, which could be short, and that way we can accommodate path MTU. In terms of larger MTUs, in the data center we're seeing 9K MTUs with jumbo frames; that's actually a little less pertinent to LRO and LSO, since in that case we're just using the native MTU to accomplish the larger packet size.
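As a tiny worked illustration of that sizing rule (my own example, not from the talk; the numbers are just typical values), segmenting a 64 KB TCP payload with a 1448-byte MSS gives equal-sized segments plus one short tail:

```c
#include <stdio.h>

int main(void)
{
    unsigned payload = 65536;   /* bytes handed to the NIC as one big segment */
    unsigned mss     = 1448;    /* per-segment payload chosen from path MTU   */

    unsigned full = payload / mss;   /* 45 full-sized segments   */
    unsigned tail = payload % mss;   /* 376-byte final segment   */

    printf("%u segments of %u bytes + 1 of %u bytes\n", full, mss, tail);
    return 0;
}
```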
K
I was wondering about the crypto offloading that was covered in the middle of the presentation. That sounds very interesting, but what I'm wondering about is to what extent does that repeat the risks of all of these vulnerabilities, such as padding oracles and all of that, in the NIC implementations? Is there any information about that, are there experiences with that, for all of the stuff that got solved in crypto stacks that run on the normal CPU
K
and now in the NIC? The question is, I mean, there are all these vulnerabilities if you do a crypto implementation, like timing attacks, things like padding oracles specific to symmetric implementations. To what extent, or what is the risk, of these getting repeated in the NIC implementations, and if they are, how do you fix that?
D
Yes. As I understand it, the question is: if we look at crypto, there's a wide variety of attack vectors of varying complexity, and any individual implementation might be suffering from any number of these; so if we push a crypto implementation down to the hardware, what kind of problems might we see there? Yeah, I think that's a good point, and certainly we can't pretend that there are not going to be any problems. I think that as the complexity of what's being offloaded increases,
D
for example if we move from a crypto-only offload towards a full offload, then the surface for these kinds of problems must surely exist. In my mind, I'm not really sure what the best way to move forward on this is; certainly the vendors, or the suppliers of the code, ideally open code, would need to move rapidly, but perhaps we also need to have some kind of mitigations in the system.
D
I didn't quite catch that, but I guess the question is: what would be the mechanism to fix it? I think it would depend on the implementation. I mean, if it's a kind of fixed device and you're receiving firmware from the vendor, then I suppose the main avenue, other than mitigations, would be to get updated firmware.
L
Wait a minute, it's a simple question, okay, quick. What we see, exactly what we see, is that there is a trend towards moving protocol implementations into application space for various reasons, and we see that with QUIC in particular. What I've seen in your presentation is that the interfaces that are shown are to the kernel.
E
I think within the Linux kernel there is a little bit of that. There was actually a presentation done last year in Prague, at the netdev conference, about offloading QUIC and what could be done, what kind of kernel interfaces are needed in order to make that possible. So I think the move to protocol implementations like that in user space may be a result of hardware inflexibility.