From YouTube: IETF105-TDD-20190723-0830
Description
Technology Deep Dive session at IETF105
2019/07/23 0830
https://datatracker.ietf.org/meeting/105/proceedings/
Good morning, everybody. Sorry for the slight delay, but now we're also online — that's great. Good morning. This is the deep dive session. This is the second deep-dive-type session we are having at IETF. Last meeting we had a deep dive session on router architectures, which people gave us a lot of positive feedback about, so we thought we'd do another.
Okay, so when we were scoping this talk, an hour and a half didn't seem to do justice to the content, so we had to limit the scope. We could have a half-day discussion on this — not a tutorial, but just high-level topics. What is in scope: we will talk about basic NICs, about how a basic NIC works; we'll proceed to medium-range offloads from the host stack to the hardware, and to slightly more advanced features. We're going to use the Linux kernel as a reference point — not necessarily the only way.
Nor is Linux the only operating system that does this. What's out of scope: we're not going to talk about kernel bypass, so no DPDK discussions. We're not going to talk about small CPE devices, which use the same API (on Linux at least), or very large ASICs — multi-terabit ASICs — which may use the same APIs in some vendors' ASICs. We're not going to talk about virtualization offload technologies: SR-IOV, VMDq and any newer schemes are out of topic, and storage is also out of topic. So those could be other sessions in the future.
Features, or forwarding functions: NICs can process — accelerate as well — a lot, and have a lot of helpers in the hardware for TCP, UDP, QUIC, TLS, IPsec. A lot of the NVO3 is mostly commodity offloading at this point in time. You can accelerate any of the layer 2 to layer n forwarding and filtering, and a lot of QoS offloading. It's a very condensed session, so what we'll ask is: we will only allow clarification questions, and any other questions that you may have will come at the end.
I'm gonna introduce the presenters. You have a very competent set of folks here: on the left is Tom Herbert from Intel, then Andy Gospodarek from Broadcom, and Simon Horman from Netronome, and I'd like to acknowledge Boris — where's Boris? — from Mellanox. These are very competent folks. They know how the implementations work, they understand the hardware very well, so you're in good hands.
Having said that, these slides took a lot of community effort, from the netdev community in general, and this is a list of people who, in one way or another, contributed, shaped, opined on what should be cut out, what should be kept, how the slides should be structured, etc. And with that I'm gonna hand it over to the first speaker, Tom.
How's that? Yeah, no, much better. Okay, so I'm going to present the fundamentals and basic offloads of NICs. A few definitions might be useful. A NIC is a network interface card, sometimes network interface controller. This is the host's interface — the physical interface — to the physical network. The host stack is the software that processes packets and does protocol processing in the host.
Typically, this is layer 2, layer 3, layer 4 processing. A kernel stack is simply a host stack that runs inside a kernel, and as was mentioned, for the most part we'll be referencing Linux for that. Offload is when we do something inside the NIC on behalf of the host — work that we move, essentially, from the host to the NIC for some purpose, work that involves networking. An acceleration is an offload that is done mostly for performance gains. So, what is a network interface card?
This shows a picture on the left of a card, and most of you should be familiar with these — whoever's had a PC, for instance. You plug these in, so they go into the system bus. I would point out this particular card is very ancient, actually: it has a BNC connector, so this is thin Ethernet, with ISA connectivity. Nevertheless it's a NIC, and modern-day NICs obviously look a little bit different, but basically perform the same function. So a NIC is the receiver and transmitter of packets to the network — to the physical network.
It's the device that does that. On the right we have a stack, and you can see that in the protocol stack the NIC is kind of at the bottom. On one side, to the outside world, it connects to the physical media — that could be fiber, cat5, radio — and we use some sort of encoding or framing over that media: Ethernet, Wi-Fi, Fibre Channel. On the other side, the NIC connects into the system via the system bus. Typically today this is PCIe or USB; in the olden days, like this card, it was ISA.
So the way this works is that NICs have queues. Typically they have a transmit queue and a receive queue, and these queues store the packets — or indicate the packets — for transmit and receive. The queues are composed of a set of descriptors, and the descriptors describe the packet for the NIC. Some of the important things in the descriptors are where the packet is located in host memory, what the length of the packet is, and then some ancillary information that may be involved — for instance, whether it was received as broadcast Ethernet, and other information like that.
So, in order to transmit, the host stack fills out a transmit descriptor, and most importantly it writes the information for the packet: where the packet is located in its memory and what the length of the packet is. It puts the transmit descriptor onto a queue — and I should mention it's a producer-consumer type of queue. So it puts the transmit descriptor on the queue, bumps the producer pointer, and then it sends an indication to the NIC, usually through a PCI register write, that there's work to be done, so the NIC wakes up.
It processes the transmit queue: it looks at each of the transmit descriptors, figures out where the packet is in host memory, and performs a DMA operation — direct memory access — to pull the packet into its local memory. Then the NIC may perform some offload processing, which we'll talk about in a bit, but eventually the packet has to be sent on the network. So there is a serializer/deserializer inside the device that takes the packet from its memory.
It serializes the data and sends it out to the actual network. Receive is somewhat similar: in the receive path, the host sets up a number of packet buffers in its memory where packets will be stored, and it puts these into the receive queue, in the receive descriptors. So again, in each descriptor there's a memory location and, in this case, a maximum length for the packet.
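As an illustration of the producer-consumer queue just described, here is a minimal sketch — the names (`TxRing`, `post`, `nic_poll`) are invented for illustration; a real ring lives in DMA-able memory and the host signals the NIC with a doorbell register write rather than a function call:

```python
class TxRing:
    """Toy model of a NIC transmit descriptor ring (producer/consumer)."""

    def __init__(self, size):
        self.descs = [None] * size
        self.producer = 0   # next slot the host fills
        self.consumer = 0   # next slot the NIC processes
        self.size = size

    def post(self, addr, length):
        """Host side: fill a descriptor and bump the producer pointer."""
        nxt = (self.producer + 1) % self.size
        if nxt == self.consumer:
            raise BufferError("ring full")
        self.descs[self.producer] = (addr, length)
        self.producer = nxt
        # real hardware: a PCI "doorbell" register write would go here

    def nic_poll(self):
        """NIC side: consume descriptors, returning (addr, length) pairs
        that the device would DMA from host memory."""
        done = []
        while self.consumer != self.producer:
            done.append(self.descs[self.consumer])
            self.consumer = (self.consumer + 1) % self.size
        return done
```

One slot is kept empty so that producer == consumer unambiguously means "ring empty", which is a common convention for rings of this kind.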
So what I just described is kind of fundamental — that's the basics of the NIC — and it started in approximately the early 90s. Not long after, some of the basic offloads that I'll talk about in a minute came into being and were developed, and we can track the evolution of NICs since then. In the mid-2000s we have data plane accelerations — more advanced features inside the NICs: IPsec offload, for instance, QoS offloads — and more recently there's a general movement to make these devices programmable.
So we'll talk a lot about offloads today; I want to give a little bit of motivation. One way you can think of offloads is that these are just advanced features having to do with packet processing or protocol processing that happen to be done in the NIC. There are a few rationales for this. One is that we want to free up host CPU cycles for application work. This makes sense if the NIC can do the functions of networking in a more efficient way.
Since it's specialized hardware, that is often the case — for instance, it can compute a checksum more efficiently than doing it on the host CPU. More generally, one of the motivations is to save host resources, so offloads may save not just CPU but memory, DMA operations, memory movement, the number of interrupts. Scaling performance is very important, and offloads help a lot there, particularly for low latency and high throughput.
There are also some interesting use cases, particularly in mobile, where we might offload certain operations having to do with protocol processing to a device for the purposes of saving CPU cycles and saving power, in particular on the core CPU. So, in short, offloads make sense as a cost-benefit trade-off: if the benefits of moving work into the NIC — you can think of it as a coprocessor — exceed the cost, then it makes sense. In practice this can be an interesting analysis, and we know that CPUs, for instance, are always increasing their capabilities.
On the other hand, the network and the things we want to do are always getting more complex, so there's always a bit of a trade-off between whether to offload or run on the host CPU, but in general we've found offloads to be pretty useful, and probably will continue that trend. In terms of developing offloads — and NIC development in general — in the Linux community at least, we kind of enshrined some of the principles in something called "less is more", and I want to give three components of this.
So, first of all: protocol-agnostic mechanisms are better than protocol-specific. This is somewhat of a formalism of trying to prevent protocol ossification, but the idea is: if we can develop an offload that supports, say, all transport protocols equally, versus one that only works with TCP or plain TCP/IP packets, generally the offload that is more general is going to be more applicable and better for the user. In a similar vein, common APIs are better than proprietary APIs.
We have a lot of OSes and a lot of NICs; the more common the API is across those, the easier it is for users to choose different pieces of hardware. This is particularly important in that we want to avoid the concept of vendor lock-in, which is where a vendor — whether purposely or inadvertently — kind of controls the API such that it's really difficult for the user to change the vendors that they're using.
The third point is that programmability is good — so I put "generally" in parentheses. One of the aspects of programmability is: if we make it completely openly programmable, especially user-programmable, and allow users to do whatever they want, users will do whatever they want. That, as we know, leads to some interesting fracturing of the market and can be precarious. So we always want to make sure, if we're going to create an open programming environment, that we develop the ecosystem properly and maintain some semblance of sanity across these, and portability.
So we can turn and look at some of the basic offloads — I'm gonna skip that slide. We'll talk about three basic offloads, and these are kind of the oldest ones; they're very common amongst NICs, and most of these have been around since the 90s at least: checksum offload, segmentation offload, and multi-queue. Checksum offload is the offload of the venerable TCP/UDP transport checksum. The idea is that we want to offload the computation of the checksum — the ones' complement summation in particular is CPU-intensive.
If we offload that to the NIC, we get a nice performance gain. As I mentioned, checksum offload is particularly ubiquitous: it would probably be pretty hard to find a NIC on the market today that does not support some form of this. An interesting twist, a little bit more recent, is encapsulation.
What we found is that, say, IP tunnel encapsulations — particularly UDP-based encapsulations — actually can have multiple transport protocols per packet that contain their own checksum. Conceptually it's possible to have two, three, four, five or six checksums in a single packet — a TCP checksum and a UDP checksum and a GRE checksum, it's all possible. So we want to offload all of those checksums, and we found some techniques that can leverage rudimentary checksum offload of one checksum to actually support multiple checksums, even in the same packet.
So, a little bit of detail. Transmit checksum offload has two forms: one is protocol-specific, one is protocol-agnostic. In the protocol-specific one, the host sends a packet into the device, and the device actually parses the packet, determines if there's a transport header with a checksum, and if there is, it does all the operations to set the checksum: it performs the ones' complement checksum over the data, it computes the pseudo-header checksum if there's one there, and it sets the checksum in the appropriate field of the transport layer.
The more generic method is for the host to indicate, in instructions, exactly how to do the checksum. It provides two pieces of information to the device: one is where the checksum computation starts — the starting offset in the packet — and the other is the offset at which to write the checksum, which would typically be the checksum field of TCP, for instance; the start would then be the offset of the TCP header.
The device gets this, and it will perform the ones' complement sum from the starting point to the end of the packet, and whatever sum it gets, it basically adds in to the existing value in the checksum field and then sets the field. As long as the host set this up and initialized the checksum field correctly, the device will set the correct checksum. The device has no idea what kind of checksum it is — it doesn't know if it's UDP or TCP, it doesn't care. It just knows it's the standard Internet packet checksum.
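The generic start/offset scheme can be sketched as follows — a toy model, not any real driver API. `csum16` follows the RFC 1071 ones' complement sum, and in this simplified example the host seeds the checksum field with zero (standing in for the pseudo-header sum it would normally write there):

```python
def csum16(data: bytes, init: int = 0) -> int:
    """RFC 1071 ones' complement sum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"                    # pad odd-length data
    s = init
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
        s = (s & 0xFFFF) + (s >> 16)       # fold the carry back in
    return s

def tx_csum_offload(pkt: bytearray, start: int, offset: int) -> None:
    """Generic transmit checksum offload: sum from `start` to the end of
    the packet (the seeded checksum field is included in the sum), then
    write the ones' complement of that sum into the field at `offset`."""
    s = csum16(bytes(pkt[start:]))
    pkt[offset:offset + 2] = ((~s) & 0xFFFF).to_bytes(2, "big")
```

Because the seed value at `offset` participates in the sum, whatever the host put there (zero here, or a pseudo-header sum in real TCP/UDP) ends up folded into the final checksum — which is exactly why the device doesn't need to know which protocol it is checksumming.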
For receive, we have an analogous situation: there is a protocol-generic and a protocol-specific method. The protocol-specific method is called "checksum unnecessary". As packets are received, the NIC parses the packet, determines if there is a transport protocol that contains a checksum, and performs the work to actually verify the checksum: it does the ones' complement checksum, computes the pseudo-header, adds them, and checks if the resulting checksum is zero. If it is, the checksum has been verified, and it sets a bit in the receive descriptor to inform the host that it has verified the checksum.
So again, that is protocol-specific: it only really works with TCP and UDP packets that the device explicitly parses. The more generic method is "checksum complete". In this case the device performs a ones' complement sum of the whole packet, starting from the IP header through the end of the packet, and it simply returns that sum in the receive descriptor to the host. The host can take that and, through simple manipulations of the checksum, use it to verify any number of checksums in the packet.
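Those "simple manipulations" might be sketched like this — a toy model with invented names; for simplicity it assumes the caller passes the pseudo-header sum explicitly (zero in the test, i.e. no pseudo-header), whereas a real stack computes it from the IP addresses, protocol and length:

```python
def csum16(data: bytes) -> int:
    """RFC 1071 ones' complement sum over 16-bit big-endian words."""
    if len(data) % 2:
        data += b"\x00"
    s = 0
    for i in range(0, len(data), 2):
        s += (data[i] << 8) | data[i + 1]
        s = (s & 0xFFFF) + (s >> 16)
    return s

def verify_with_csum_complete(pkt: bytes, csum_complete: int,
                              thdr_off: int, pseudo_sum: int) -> bool:
    """Host-side use of a 'checksum complete' value: the NIC summed the
    whole packet; subtract (in ones' complement arithmetic) the bytes
    preceding the transport header, add the pseudo-header sum, and a
    valid transport checksum folds to 0xFFFF."""
    before = csum16(pkt[:thdr_off])
    s = csum_complete + (0xFFFF - before)   # ones' complement subtract
    s = (s & 0xFFFF) + (s >> 16)
    s += pseudo_sum
    s = (s & 0xFFFF) + (s >> 16)
    return s == 0xFFFF
```

Repeating the subtraction with different offsets is how one NIC-provided sum can verify several nested checksums in the same packet.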
Looking at segmentation offload: one of the observations we've made is that networking stacks are more efficient when they process large packets, as opposed to small packets. In particular, per-packet processing — per-packet overhead in the stack — is usually significantly more than processing the data bytes. So we want to see if we can arrange the system so that we process large packets instead of small packets.
There are two forms of this: one on transmit and one on receive. On transmit — segmentation offload — the idea is the host produces a large packet, say a 64K TCP segment, and we want to break this packet up into smaller chunks for sending out into the network, which may have, say, a 1500-byte MTU. We want to do this as low in the stack as possible. So the stack processes the big packet — one IP header, one TCP header — and at the lowest point possible, either in software or even in the network device, there's a type of segmentation or fragmentation: we slice up the data, give each packet its own IP header and TCP header, and send each one. There is a software variant and a hardware variant of this. The software variant is called GSO, generic segmentation offload; the hardware variant is LSO, large segmentation offload.
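The slicing step can be sketched as follows — a toy GSO with invented names that ignores checksums and length fields, and just shows the header replication and the TCP-like sequence-number advance per slice:

```python
def gso_segment(headers: bytes, payload: bytes, mss: int, seq0: int = 0):
    """Toy GSO: slice one large 'super-packet' payload into MSS-sized
    segments, replicating the headers for each slice. Returns a list of
    (headers, seq, chunk) tuples; real GSO would also fix up lengths,
    checksums, and IP IDs for every slice."""
    segs = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        segs.append((headers, seq0 + off, chunk))
    return segs
```

A 3000-byte payload with a 1500-byte MSS yields two segments whose sequence numbers differ by exactly one MSS, which is the in-order property the receiver depends on.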
You might also see it called TSO — TCP segmentation offload — when it is specific to TCP. Receive segmentation offload is the opposite: when small packets are received, we try to coalesce these into larger segments, larger packets. Again, this is a similar, per-flow operation, and there are two variants of it: one software, one hardware. The software one is GRO, generic receive offload; the hardware one is LRO, large receive offload. This particular offload, of all the basic offloads, is probably the hardest one.
It does require the network device to be able to parse the packet and understand a lot of details of the protocol. So, for instance, the implementations that do this really only understand TCP, usually, and some of the encapsulations; until we have, say, a fully programmable environment, it is hard to generalize this one. One thing I'd like to mention about segmentation offload: it really only works in conjunction with checksum offload.
One of the interesting properties is that once we have queues, we can assign properties to them. Particularly on transmit, each queue can have its own attributes — so, for instance, we can have high-priority queues and low-priority queues. One of the important aspects when we deal with multi-queue is that we do want to try to keep packets in order. So, for instance, we don't want to be distributing packets of the same flow across different queues, on either transmit or receive.
So there are some techniques in the model of queuing to try to enable that in-order delivery as much as possible. On transmit, there are essentially two methods to do this. One is the easy method, which is fundamentally that each CPU is assigned a queue. When an application is sending a packet, for instance, the queue chosen is the one associated with the CPU the application is running on, and the advantage of this is that we get this sort of siloing — locality.
For instance, when a packet is sent on a queue, we have to lock the queue in order to manipulate the queue pointer. If we do this one queue per CPU, then there's no contention for the lock and no contention for the structures of the queue. The second method is when the driver selects the queue. As I mentioned, queues can have some rich semantics, such as priority.
What we've done there, instead of trying to expose all possible combinations of this, is allow the driver to basically understand the queue layout — the topology, what the different queues are — and when the host stack wants to send, it basically asks the driver, which has intimate detail of the device, what's the best queue to send this on, and the driver can do that. So, for instance, if we're sending a high-priority packet, the metadata associated with the packet says this is high priority.
When this goes into the driver, it looks up the queue that's appropriate for that. So there may be CPU-to-queue affinity, priority; there are also other attributes you could apply, like rate limiting. On the receive side, this is normally called packet steering. The idea is that when packets come into the NIC, they need to be distributed amongst the queues. It turns out this is a lot like ECMP, and some of the techniques are very similar to how we're trying to distribute in ECMP to multiple interfaces, in a stateless way.
There are two forms of this: one is called Receive Packet Steering — that's the software variant — and RSS, receive side scaling, is the hardware variant. They both essentially work the same: when packets come in, a hash is performed over the five-tuple of the packet if the transport layer is available, or the three-tuple if we're using the flow label, but the effect is to identify the flow by a hash.
Take that hash and map it into one of the queues, and that way we're also consistent: a particular flow always has the same hash, therefore we can always map it to the same queue, in order to facilitate in-order delivery. An extension of this is something called receive flow steering. In this case the host itself can actually program, for each flow, which queue to use. This is a very powerful mechanism: on a per-flow basis, the host can indicate — okay, for this flow, use this queue.
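The hash-to-queue mapping might be sketched like this — a toy: real RSS uses a Toeplitz hash keyed with a secret, CRC32 merely stands in here, and `rss_queue` is an invented name; the indirection table is the part the host can reprogram to rebalance queues:

```python
import zlib

def rss_queue(src, dst, sport, dport, proto, nqueues):
    """Toy RSS: hash the 5-tuple and map the hash into a receive queue
    via an indirection table. The same flow always hashes the same, so
    it always lands on the same queue (preserving in-order delivery)."""
    key = f"{src}|{dst}|{sport}|{dport}|{proto}".encode()
    h = zlib.crc32(key)
    # indirection table: hash buckets -> queue, reprogrammable by the host
    indirection = [i % nqueues for i in range(128)]
    return indirection[h % 128]
```

Receive flow steering is, in effect, the host overriding individual entries of such a table per flow rather than relying on the uniform hash spread.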
There are two variants of this also: a software variant and a hardware variant. The advantage of this is really good isolation. Some people use this where they pin an application to a CPU — that application only runs on that CPU — and they associate a network queue with that application, and receive flow steering can actually arrange it so that packets for that application's flows go only to that queue. So it's very siloed: the application acts like it's the only application on the system.
Thanks, Tom. So far Tom has taken us through some basic offloads and the basic functionality of the NIC itself. Well, as the use cases and the demands of the users evolve, and the hardware evolves at the same time, it only makes sense that more and more processing could be pushed down to the hardware, and so in this section we'll look at examples of that, in terms of offloading more of the data plane, or of the packet processing. But before I get into some examples in that area,
I'd just like to quickly cover some of the hardware solutions that might be used in this kind of area. It's important to note that these solutions are a little bit of a mix and match: it depends very much on the use case which choice is appropriate, and some hardware choices match some use cases more readily than others, but at the same time they're not necessarily mutually exclusive. So far, the NICs we've talked about fall in the first category, where you have a fixed data plane.
So this would be a kind of ASIC that implements a pipeline in hardware. We can also use more programmable technologies, and these kind of fall into three subcategories. We have semi-specialized processors, called network flow processors (NFPs) or network processing units (NPUs), and this is a little bit similar to a general-purpose processor, like a CPU in a server.
Then you have the FPGA, which is probably the most programmable solution possible: here we have gate-level programming, so essentially you can program the hardware itself — you can describe at the gate level what the pipeline should be. And then we have general-purpose processors, so this would be putting, say, an ARM processor onto the NIC to execute the pipeline — and we will get back to the programmability aspects a little later in the presentation. So, back to data plane acceleration.
Here we have a diagram that represents roughly how this works. We have applications, and then in the kernel we have an implementation of a data path, and then down in the offload NIC we have a data plane which implements some — or maybe all — of the functionality of the data path in the kernel, and so this is able to, for example, forward packets around and so on.
The first step is that we do some kind of header extraction, so we pull out some fields — for example the five-tuple — but we also have metadata, for example the port that the packet arrived on; other things can also be available. Then, using this data, we typically compute a hash, and that hash is looked up in a hash table. We try to find a match, and if we do find a match, the match will supply some kind of action that should be executed, or a list of actions.
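The match-action lookup described above can be sketched as a table keyed by the extracted fields — invented names throughout; a real data plane would use hardware hash tables or TCAMs and much richer actions:

```python
FLOW_TABLE = {}

def add_flow(five_tuple, actions):
    """Control plane: program a match -> action(s) entry."""
    FLOW_TABLE[five_tuple] = actions

def process(pkt_five_tuple):
    """Toy match-action data plane: exact-match lookup on the extracted
    5-tuple. A table miss falls back to the host slow path, which is
    also how unknown protocols or exceeded table capacity are handled."""
    actions = FLOW_TABLE.get(pkt_five_tuple)
    if actions is None:
        return ["send_to_host"]   # miss: punt to the host stack
    return actions
```

The miss path is the key design point: the offload never has to handle every packet, because anything it cannot match is pushed back to the host, as described below.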
You can also extract the source and destination IP addresses from L3, and then you can also select, for example, the ports at layer 4. So you can create a specific rule to, for example, give some kind of special treatment to port 80 traffic — possibly send it to a separate host. It's fairly flexible in this regard.
Oftentimes this is set up in such a way that if the offload data plane can't process a particular packet for some reason — perhaps it's for a protocol that it can't understand, perhaps its table capacity has been exceeded, any one of a variety of reasons — you may have a mechanism in place that allows the processing of the packet to be pushed back to the host, and the host may act on that in various ways.
The interesting thing about this use case is that there's no queue available, so the actions that can be applied are fairly limited: we can police the packet, perhaps by dropping it or marking it, we can filter it. Shaping is a little bit more interesting — or a little bit more complex, perhaps, is a better way to put it — because we have a queue, so we have the option of doing a much larger number of different things with the packets.
In order to, for example, enforce a desired packet rate, we can delay packets, we can of course drop them, and so on. This is an area which has received significant research over the years, and that research is applicable here. There are, of course, challenges in implementing individual algorithms on an offload NIC as opposed to a host — it's usually a more limited execution environment — but nonetheless the same principles generally apply. Now, in this diagram we have packets
coming into the machine, into the NIC, and also exiting the NIC, so they're being forwarded from one port of the NIC to another — that could be a virtual port or a physical port — and the NIC is applying some kind of QoS as they traverse the NIC. In the next slide we have a slightly different setup: here we have packets — of course it's bidirectional, but I will only talk about one direction.
In this particular case, we have different applications, and by some kind of selection mechanism they are allocated to different queues, and each queue has a different RED instance running on it, and this could mark the packets or drop the packets if they're exceeding a certain rate, and so on. This, of course, is not limited to RED; I just use this particular example.
Essentially, what the host will do is format a TLS record, but it does not perform the cryptographic operations — so for the authentication hash there's space, but it's not filled in, and the record payload is in plain text. Then the offload NIC will receive this record and perform the cryptographic operations: it turns the plaintext into ciphertext and it fills in the hash. On RX, things are reversed.
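The split of work between host and NIC for this crypto offload might be sketched like this — a toy model with invented names and stand-in cryptography (an XOR "cipher" and HMAC, not real TLS ciphers or record framing):

```python
import hmac, hashlib

KEY = b"session-key"   # hypothetical per-connection key, programmed into the NIC

def host_format_record(payload: bytes) -> dict:
    """Host side: build the record framing, leaving the payload in
    plaintext and the authentication tag zeroed (space reserved)."""
    return {"type": 23, "payload": payload, "tag": b"\x00" * 32}

def nic_crypto(record: dict) -> dict:
    """'NIC' side: perform the cryptographic work on the pre-formatted
    record — encrypt the payload and fill in the tag."""
    ct = bytes(b ^ 0x5A for b in record["payload"])          # toy cipher
    tag = hmac.new(KEY, ct, hashlib.sha256).digest()
    return {"type": record["type"], "payload": ct, "tag": tag}
```

The point of the split is that all TLS state-machine and record-layer logic stays in the host stack; only the computation-heavy cipher and MAC work crosses to the device.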
IPsec acceleration follows a similar principle to TLS, in the sense that some parts are offloaded and some parts are not, and at this time we have two models for this. One is the crypto offload, which is very similar to what I described for TLS, in the sense that it is the host's responsibility to add the IPsec headers to the packet, but it does not perform the cryptographic operations, which are left to the card.
It's worth noticing at this point that this combines a number of different offloads which we've already discussed — LSO, the segmentation offload, and the checksum offload. So if one is offloading the crypto, one also needs to offload those operations; conversely, with IPsec traffic, one cannot use segmentation offload or checksum offload if one does not also offload the cryptography. So there are significant benefits to being able to build this stack, but in a sense it's an evolution.
There is a full offload of this, but the way that these things tend to evolve is that we start with something simple that has very large benefits — crypto offload — and then we went to a fuller offload, and potentially IKE could also be offloaded at some point in the future. So with that, I'd like to hand over to Andy, who will now address further evolutions in NIC technology. Thank you.
Alright, so you've already heard a pretty long discussion about how these things all work. That's good — I appreciate everyone who's still awake and has finished checking all their email. Now we'll talk about programmability. So, what Simon and Tom talked about — really all these offload features — were enabled exclusively by hardware providers, or hardware vendors, who felt like this is something useful, probably from feedback from users. Maybe not — it sort of depends.
We're gonna build on that and talk about what is sort of the next evolution in this path: fully programmable NICs. So, as Tom talked about, those offloads can be good — probably are good — but really there's a couple of key features I want to highlight and think about, and why programmability of a NIC would matter. So, right out of the gate:
I think one of the really important things is that it facilitates really rapid protocol development. We're kind of in a phase right now where fixed-function offload is so powerful and so useful that if you want to deploy a new protocol — or you think you want to help develop a new protocol, and you want to rapidly iterate on it — one of the problems you find yourself getting into is: well, is our current infrastructure really gonna burn more cores processing packets, just to support this new protocol?
That would be something that, if you had a programmable NIC and you knew what the problem was — because probably you wrote it — you could go fix. Additionally, rolling out new security fixes: always a great idea. There's this notion that, if you run a large- or small-scale data center, there is going to be some magic packet that's going to melt your network, and this would give you the opportunity to snuff that out in hardware before it gets too far.
So today, in the programmable NIC world, there are really two main types. One is special-purpose hardware — the FPGAs and NPUs that Simon referenced before — so this is very specific hardware that we're going to program, that we're going to write code for. And then the other one is really a new class of NICs that have appeared in the last couple of years that really just contain a general-purpose processor. This might be an ARM, an x86, a MIPS, maybe in the future something like RISC-V, but really just something general-purpose.
They can run any code. And I think — well, this might seem today like something that isn't exactly what you might want. But if you look at the "forwarding plane realities" slides from the last IETF, I think there's a really interesting quote at the conclusion, at the end of that: what's niche today can be broad tomorrow.
I think that's, generally speaking, what we've seen across the board in networking, and in NICs: someone will roll out a new feature, and someone will think "I don't know..." — and before long everybody's got it and everybody wants it. So I think programmability is going to be that next piece. So, to build on the common language that we had for our pictures earlier — hopefully this language resonates with people; otherwise that's a bummer, because we use it in the whole deck.
So, if you're running a routing daemon or something else that's setting up flows, all that still runs there on the host. But now we're in a case where this offload data plane is going to run down in the FPGA or the NPU, and in fact one of the unique pieces with this is there will be cases where a software data path does not exist in the kernel for whatever feature you're adding.
Now, that's a little bit different from what we do in the Linux community, where, for the hardware offload capabilities that are there in your hardware, there's sort of an insistence that a software fallback data path exists. Within Linux that's been extremely helpful, and we're going to continue, I think, to push that. But this is a case where that might not hold.
You may just have a data path that's completely done on the NIC, with no software fallback in the kernel — at your own risk, I guess. And in fact, that data plane could be expressed in a variety of languages: maybe P4, eBPF, NPL, or maybe just the native instruction set for that NPU — as Simon talked about, many NPUs have special instructions for performing operations. And the key that we talked about, too, is that this is dynamically programmable.
K
So you can roll out new code quickly, or if you're rapidly developing a new protocol and you start to say, you know what, maybe I don't need 350 bytes of header to describe this new protocol, maybe I'll make it a little shorter, like 324 or something, who knows.
K
The other piece is really a general-purpose processor, and so this is a bit of a unique situation, a little different than we've had in the past, but it's becoming pretty popular. This is a case where we're moving the entire host networking stack down onto the NIC. And yes, I said that right: what that actually means is your NIC could run another copy of an operating system.
K
Some people shudder at this thought, because maybe it sounds a little more complex, but the fact is, if you have this already implemented in software on your server, you could move it down to your NIC and free up those server cores from doing that work. So in this case the data plane offload is down on this general-purpose processor, as I mentioned, and also the control plane. So now, what if your routing daemon was running on the NIC, or what if whatever was receiving,
K
you know, OpenFlow messages from a controller was running completely on the NIC? Now you no longer find yourself consuming host resources (server host, not NIC host; they really are different CPU complexes), so you're not consuming any of the resources of your server, and you can free them up for doing useful things, whatever those may be. So this control plane offload is also really nice.
K
If you have what some are now calling a bare-metal deployment, where you're setting up servers and you don't know exactly what they're going to be used for, but you're responsible for networking, you can feel pretty confident that your server administrators are not going to ruin whatever network setup you want them to have. Also, in multi-tenant deployments this would be really good.
K
You can make sure that no one person has a chance to destroy too much, and it really brings a lot of the server networking administration back into the purview of the network admin. I think that's sort of a constant struggle between those two groups, somewhat understandably. So this gives networking tentacles to get a little bit further into the server, if you will. So, kind of in the same vein, here's that picture again.
K
So this would mean that, obviously, if you have applications running on your server, they're still going to get the data they need, but you're not spending your time just needlessly moving packets between different applications, whatever those look like. And the reality here too (and it doesn't get any more recursive than this, I promise) is that the programmable NICs also have offload-capable devices. These things are all being put on the same die, so you have a control plane, a data plane, and a fixed-function device.
K
That's all embedded down there. But like I said, I promise that the offloaded data path on the fixed-function device doesn't also contain another general-purpose processor, and another one on down; it stops there, I promise. The simple fact is we're building these chips that are pretty large and have both general-purpose cores (maybe ARM or MIPS cores) on the side, along with a fixed-function ASIC. There are also people building NICs that, in addition to that, have FPGAs or NPUs as well.
K
So I think this is kind of a new world in a lot of ways. I think there's not a large number of users doing this yet, but I think there's a strong case, especially in a place like this where we're seeing rapid protocol development, where the programmable NIC is an extremely powerful option and extremely interesting going forward. So I think really the way we want to summarize this is: when we think about the networking trends going forward, there's an insatiable need for more bandwidth and lower latency.
K
I think the devices that we carry around in our pockets every day help us consume more and more data, and not only over the air, but where the actual wires are, in the data centers; people just want more all the time. I'm amazed how many people are walking around doing video calls, or driving while doing video calls. I wish that was a joke, but it's not, and I wish it was the passengers, but anyway. And I think, adding to that, there's an interest in deploying new protocols.
K
We regularly hear requests for things where we wonder how we can make fixed-function hardware support them, or how long it would take to support them. So this gives a new option for people who want to do these things quickly, and I think the NICs are going to work together with host operating systems to make these things happen. We don't see offloads going away; we see offloads becoming more powerful and more flexible, and continuing to be important.
K
And I also think that this programmability and flexibility will really spur innovation that we haven't thought of before. I think that's the magical part about some of these devices that are completely, or not completely, flexible: you get the chance to do something that you would never have thought possible a few years ago, and who knows exactly what will come next. So I think that's, to me, really exciting. I think that's it.
D
G
F
D
K
F
F
H
So, to answer the question, I would point out that some of the earlier work actually came out of Windows. For instance, RSS was literally defined in NDIS (I think it was NDIS that described that), and I believe they had the early checksum offload. And I think what happened is, as Linux became more popular and open source, we had a lot of developers working on that, and at some point, as the volumes went up, the NIC vendors started to pay attention. That being said, we do know that FreeBSD may use
H
some of these. I know that some of the work we did on packet steering was being applied there, and that's a good thing. So, like I said in my talk, we do want common APIs across OSes, but most importantly, I don't think there's anything we're doing in the NIC that would be specific to Linux or any particular OS. In fact, I think some of these techniques could even be applied in something like DPDK, or kernel bypass. So again, we're just using Linux as the reference, you know.
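For reference, the RSS hash that NDIS specified is a Toeplitz hash over the flow tuple; the NIC uses the result to index an indirection table and pick a receive queue. A rough software sketch follows; the key bytes, addresses, ports, and queue count below are made-up illustration values, not anything from the session:

```python
import socket
import struct

def toeplitz_hash(key: bytes, data: bytes) -> int:
    """Toeplitz hash as used by RSS: for every set bit in the input,
    XOR in the 32-bit window of the key starting at that bit position."""
    key_int = int.from_bytes(key, "big")
    key_bits = len(key) * 8
    result = 0
    for i, byte in enumerate(data):
        for b in range(8):
            if byte & (0x80 >> b):
                shift = key_bits - 32 - (i * 8 + b)
                result ^= (key_int >> shift) & 0xFFFFFFFF
    return result

# Example: hash an IPv4 TCP 4-tuple (src, dst, sport, dport), as RSS does.
key = bytes(range(40))  # placeholder; real NICs use a random 40-byte key
tup = (socket.inet_aton("192.0.2.1") + socket.inet_aton("198.51.100.2")
       + struct.pack(">HH", 12345, 80))
queue = toeplitz_hash(key, tup) % 8  # pick one of 8 receive queues
```

Because the hash is deterministic per flow, all packets of one connection land on the same queue (and so the same CPU), which is the whole point of RSS.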
H
L
M
You've talked a lot about different features on different cards, and the lists of them. When you're writing the code, you've got to know what the card can do, the one that's on the machine your code happens to be running on. I don't want you to explain all about that now; the question is really more about what, from what I've seen, is going on: essentially, someone writes a patch to say, you know. What's the consensus on what people do, say, for offload, and does that need standardization?
M
K
That's been negotiated, so to speak, fairly well. It does feel a little bit ad hoc, though, especially, I think, from the outside. "Outside" is probably the wrong word; as an observer, it might feel ad hoc, because you just see patches show up and support exists. Usually what happens is one vendor will come up with something, another will say "yeah, me too", and then they'll do it and maybe enhance it.
K
M
M
A
J
H
So let's look at large segmentation offload. In the NIC, this is splitting a packet up into individual TCP segments. Each TCP header has its own checksum, so after I do the segmentation I actually need to set the checksum, and it has to be perfect. This is actually one of the trickier things with something like segmentation offload: the fewer things I have to do per packet, the better. If it were the case that I could just copy all of the headers to each segment,
H
that would be a lot easier, but each time we have to consider things like IP IDs (another good example, in the IP header), and packet lengths are always interesting, and checksums are the hardest one. So any time I have to set something that is unique for that packet, I have to do that in the NIC, and checksum offload is definitely one of those.
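The checksum the hardware has to produce per segment is the standard Internet checksum (RFC 1071): the one's complement of the one's-complement sum of the 16-bit words. A minimal software sketch of what the NIC computes for each emitted segment:

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 Internet checksum: one's-complement sum of 16-bit words,
    carries folded back in, then complemented."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

# A receiver verifies by summing the data *including* the checksum field:
# the result is 0 when the packet is intact.
payload = b"\x45\x00\x00\x28"
csum = internet_checksum(payload)
assert internet_checksum(payload + csum.to_bytes(2, "big")) == 0
```

(Real TCP/UDP checksums also cover a pseudo-header of addresses, protocol, and length; the arithmetic is the same.)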
J
per packet. And for receive, you have to do it because you have to check the individual checksums; otherwise you might end up returning a corrupt, bigger packet to the stack. And in terms of capabilities, for receive you also need checksum offload in order to do receive offload. Yeah, okay. The other question I had was: is there somewhere... I mean, the earlier questions have essentially said there's a cabal of, you know, ten people in the world who actually know how to do this.
J
H
G
H
So the question was about path MTU and, I suppose, segmentation offload. It doesn't matter; in fact, when we're doing something like LSO/TSO, we aren't just chunking up packets per the MTU, we want to abide by the path MTU. The way it works is that the host stack actually tells the NIC what the size is of the packets to go out, so we can abide by the path MTU.
H
One of the interesting things that we try to do when we're sending LSO is to keep the packets the same size, except for the last one. That simplifies the problem we just talked about with Lorenzo, where we have to set the length for each packet; the easiest way to do that is to kind of infer what the lengths are. So we tell the NIC:
H
this is the maximum length; make all the packets the same size, except for the last one, which could be short. That way we can accommodate path MTU. In terms of larger MTUs in the data center, we're seeing things like 9K MTUs with jumbo frames; that's actually a little less pertinent to LRO and LSO. In that case we're just using the native MTU to accomplish the larger packet size, so in some circumstances that's a little less important.
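What LSO/TSO does can be sketched in software: split the payload at the MSS the host stack supplies (derived from path MTU), so every segment is full-sized except possibly the last, and patch the per-segment fields. This is an illustrative sketch, not any driver's actual code, and the field handling (sequence number, IP ID, length) is simplified:

```python
def tso_segment(payload: bytes, mss: int, seq: int, ip_id: int):
    """Emulate TSO: equal-size segments except the last, with the
    per-segment fields (seq, IP ID, length) patched for each one."""
    segments = []
    for off in range(0, len(payload), mss):
        chunk = payload[off:off + mss]
        segments.append({
            "seq": seq + off,   # TCP sequence advances by payload already sent
            "ip_id": ip_id,     # IP ID typically increments per segment
            "len": len(chunk),
            "data": chunk,
        })
        ip_id = (ip_id + 1) & 0xFFFF
    return segments

segs = tso_segment(b"x" * 10000, mss=1448, seq=1000, ip_id=7)
# All segments are 1448 bytes except the last (10000 = 6 * 1448 + 1312).
```

Each segment would then get headers copied from the original super-packet, with flags like FIN/PSH kept only on the last segment and the checksum recomputed per segment, as discussed above.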
N
[name and affiliation unclear in the recording] I was wondering about the crypto offloading that was sort of in the middle of the presentation. That sounds very interesting, but what I'm wondering about is: to what extent does that repeat the risks of all of these vulnerabilities, such as the padding oracles and all of that, in the NIC implementations? Is there any information about that, or experiences with it, for all of the stuff that got solved in crypto stacks that run on a normal CPU?
N
The question is: there are all these vulnerabilities if you do a crypto implementation, like timing attacks, things like padding oracles specific to symmetric implementations. To what extent, what is the risk of these getting repeated in the NIC implementations, and if they are in your NIC, how do you fix that?
I
Yeah. So, as I understand it, the question is that if we look at crypto, there's a wide variety of attack vectors of varying complexity, and any individual implementation might suffer from any number of these. So if we push the crypto implementation down to the hardware, what kinds of problems might we see there in this area? Yeah, so I think that's a good point, and certainly we can't pretend that there are not going to be any problems. I think that the surface grows as the complexity of what's being offered increases; so, for example, if we move from a crypto-only offload towards a full offload, then the surface for these kinds of problems must surely grow, in my mind. I'm not really sure what the best way to move forward on this is; certainly the vendors, or the suppliers of the code, or ideally open code, would need to move rapidly. But perhaps we also need to have some kind of communication in the system.
I
I
I
G
G
L
What we see, exactly, what we see is that there is a trend towards moving protocol implementations to the application space, for various reasons, and we see that with QUIC in particular. What I have seen in your presentation is that the interfaces that are shown are for the kernel, at the system level. Is there still work to enable APIs that can be used directly from the applications to use the offload functions?
K
I think within the Linux kernel there is a little bit of that. There was actually a presentation done last year in Prague, at the Netdev conference, about offloading QUIC and what could be done, what kind of kernel interfaces are needed in order to make that possible. So I think the move to protocol implementations like that in user space is maybe a result of hardware inflexibility.