From YouTube: Sudarshan Ramachandran -- When 10GbE is Not Enough
So the unique thing about Mellanox is that we are completely end to end: everything from the network adapters to the switches to the cables, and the silicon in all three, at speeds of 10G, 40G, and now the new 25G, 50G, and 100G, and we're shipping all of these products today. So completely end to end, from 10G to 100G, network adapters and everything in between; some of these speeds may not be IEEE-ratified yet, but we're shipping them all today.
So the markets that we have traditionally played in are high-performance computing; that's where most of our energy focuses, trying to maximize throughput, reduce latency, and do all the good things that then work well in different industries. In high-performance computing, low latency and high bandwidth are extremely important. And the nice thing was that things like cloud and Web 2.0 all needed similar characteristics: they needed to fit more VMs on their servers, so they needed fatter pipes.
They needed lower latency so that they could move VMs around very fast. They needed to present storage at low latency. The high-frequency traders wanted low-latency cards and switches to do algo trading. Databases, things like the Oracle appliances (Exadata, Exalogic), ERP systems, Teradata: all these sorts of enterprise-grade appliances use Mellanox at the back end, but you've probably never heard of us. And underpinning all of that has always been storage.
So the key parameters, I guess, that Ceph users are trying to achieve are high throughput and high IOPS, and those are things we achieve with high bandwidth and low latency. That's how we try to facilitate those two parameters, and these sorts of technologies have been proven, like I said, in the high-performance computing industry, and that's what we're trying to bring to Ceph.
We try to simplify your infrastructure and make it more resilient by providing fatter pipes, so that you don't see network-related issues, and we try to free up the CPUs that you purchased to do other work, rather than network work, by offloading it onto our NICs. And we've got some other offloads coming up as well that I'll talk about.
So this is a simple comparison. Don't ask me whether it's this CPU and that hard drive, but the principle is basically that 1G is out of the question; 10G has benefits to latency, IOPS, and throughput; but 40G clearly has additional benefits. One thing this is showing is that Ceph will eat up all the bandwidth you give it and will perform better. I guess that's one of the key things.
So you might say, "I don't need 40G," but you may not realize what's happening at the back end: Ceph is kind of eating whatever you can give it. So really what we're seeing here is about two and a half times the throughput with 40G over 10G, and fifteen percent higher IOPS, probably just using hard disks in this particular case, and that's a pretty good improvement. And any time you buy a Mellanox 40G switch or a 40G NIC, you actually get 56G for free, so 56G instead of 40G.
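As a back-of-envelope check (these line rates are standard figures, not from the talk): $10\,\mathrm{Gb/s} \approx 1.25\,\mathrm{GB/s}$, $40\,\mathrm{Gb/s} \approx 5\,\mathrm{GB/s}$, and $56\,\mathrm{Gb/s} \approx 7\,\mathrm{GB/s}$, so the raw link ratio is $4\times$. Seeing only about $2.5\times$ in Ceph throughput is consistent with the spinning disks, rather than the wire, becoming the next bottleneck, which matches the "probably just hard disks" caveat above.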
You can go look it up, but the main point is making it simpler: fewer cables, fewer network cards, less switching, to achieve greater performance. In this case it's basically showing that the three lines at the top are 40G, and they have perhaps approximately two times the read throughput of the 10G infrastructure and half the latency. And again, you don't have to go bonding multiple 10G links and doing all that sort of stuff.
I won't talk about this too much, but you've already heard from SanDisk. Basically, what we're doing there is that the 10G network card, sorry, the 40G network card that's in these boxes comes from us; you can choose your own switch, but hopefully you choose a Mellanox switch, and I'll give you some reasons why you might want to do that. And there are some performance figures there on 10G vs. 40G.
So until pretty much recently, we've basically had three basic switches in the offering. I won't go into them in too much detail, but I want to focus on the small one. Firstly, you'll notice very low power consumption, less than 100 watts for any of these switches, and these are capable of running 56G at full line rate with all the ports running at all times, without dropping a packet. That's because the silicon's switching capacity has the ability to handle all the switching required at all times.
So I said "until recently" because now we've just released the 100G switches. We were part of the consortium that brought 25G to the market as well, so now you're going to see more and more 25G NICs, which we see as a more cost-effective way to build out a network than 10G. You'll see 50G NICs; again, not yet IEEE-ratified, but that will be the new 40G, and these divide very nicely into a 100G top-of-rack switch.
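The "divides very nicely" remark is just lane arithmetic (standard lane counts, not spelled out in the talk): $100\mathrm{G} = 4 \times 25\mathrm{G}$, $50\mathrm{G} = 2 \times 25\mathrm{G}$, and $40\mathrm{G} = 4 \times 10\mathrm{G}$, so one 100G top-of-rack port can break out into four 25G server links (or two 50G), just as a 40G port breaks out into four 10G links.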
So the other important thing with storage and latency is having very consistent latency, and we have a very deterministic latency across all packet sizes, extremely low compared to the competitor, which is the Trident II silicon that goes into pretty much every other switch, whether it's Extreme or Arista or Brocade. So what you can see here is how the latency varies depending on packet size and depending on how much your switch is being loaded.
So what we have here is one of our best sellers, which is our 12-port 40G switch; I usually carry one in my laptop bag. This 12-port 40G switch is what's shown up here. It's not a blade enclosure or anything; it's a rack-mountable switch. You put two side by side in one U, and what you achieve is an HA solution like this in just one U, and it's 40G ready.
In other cases, if you have just 10G servers, you can use it as a 40G aggregation layer, or you can uplink from it at 40G. And what we're seeing more and more is that networking for storage is becoming the server guys' domain; it's not the IT guys' any more. So the server guy is really defining what he needs: low latency, high bandwidth.
It doesn't need a lot of features; it just needs to have cut-through performance, and so it's becoming tightly coupled with the storage. And again, low latency, low power: two switches doing 40G at 100 watts together, where you typically see about 600 watts with some competitors, which means if you're in a colo site, over a year you're going to save a lot of money as well as space.
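A rough worked example (the electricity price is an assumption, not from the talk): the difference is about $500\,\mathrm{W}$, so over a year that is $0.5\,\mathrm{kW} \times 8760\,\mathrm{h} \approx 4380\,\mathrm{kWh}$, which at, say, \$0.10 per kWh is on the order of \$440 per year for the pair, before the cooling overhead a colo facility typically bills on top.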
So let's assume each top of rack has two of those switches. Then you create another layer on top for the aggregation layer, where each of these racks could have a combination of Ceph and compute servers, all connected in a non-blocking network with HA at the root level. And this can scale up to a certain size, depending on your blocking ratio and things, in layer two, and going out even further.
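A quick sketch of what the blocking ratio means (the port split is an illustrative assumption, not from the talk): per leaf switch, $\text{oversubscription} = \text{downlink bandwidth to servers} / \text{uplink bandwidth to the aggregation layer}$. With a 12-port 40G leaf, 6 ports down and 6 up gives $240\mathrm{G}/240\mathrm{G} = 1{:}1$, i.e. non-blocking; 8 down and 4 up gives $320\mathrm{G}/160\mathrm{G} = 2{:}1$ oversubscribed.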
So the next thing is: how do we reduce latency even further? That's through RDMA, and our RDMA technology. It's an implementation that has been used in high-performance computing for a very long time. The whole idea is: how do you move data around without the CPU being involved, and without doing it over TCP, which is a very fat protocol that takes a lot of CPU and slows you down? So basically, what RDMA does is move data from server A to server B, from the memory of server A to the memory of server B directly, without talking to the CPU and the kernel. Traditionally, the application buffer in server A talks to the kernel buffer, which goes to the hardware, across TCP, and back the other way on the far side. So with RDMA,
you simply talk directly to the hardware from the software, from the application buffer, not over TCP, and then you go directly to the other side. Right now we have beta code already in Hammer, so you might see 2x to 3x performance. And by the way, RDMA exists in everything from our lowest-end 10G NIC to our highest-end 100G NIC; it's just always there.
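To make the kernel-bypass idea concrete, here is a minimal sketch in C using the RDMA verbs API (libibverbs). It is not the Ceph Hammer code; it only shows the registration step that lets the NIC read and write an application buffer directly, which is what removes the kernel and the CPU from the data path. It assumes libibverbs is installed and at least one RDMA-capable NIC is present.

/* Minimal libibverbs sketch: register an application buffer with the NIC.
 * Build with: gcc rdma_reg.c -libverbs
 * Assumption: libibverbs installed and one RDMA-capable device present. */
#include <stdio.h>
#include <stdlib.h>
#include <infiniband/verbs.h>

int main(void) {
    int num_devices = 0;
    struct ibv_device **devices = ibv_get_device_list(&num_devices);
    if (!devices || num_devices == 0) {
        fprintf(stderr, "no RDMA-capable devices found\n");
        return 1;
    }

    struct ibv_context *ctx = ibv_open_device(devices[0]);
    struct ibv_pd *pd = ibv_alloc_pd(ctx);   /* protection domain */

    size_t len = 4096;
    void *buf = malloc(len);                  /* application buffer */

    /* Register the buffer so the NIC can DMA into and out of it directly,
     * bypassing the kernel socket buffers entirely. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len,
                                   IBV_ACCESS_LOCAL_WRITE |
                                   IBV_ACCESS_REMOTE_READ |
                                   IBV_ACCESS_REMOTE_WRITE);

    printf("registered %zu bytes: lkey=0x%x rkey=0x%x\n", len, mr->lkey, mr->rkey);

    /* A real transfer would now create a queue pair, exchange the rkey and
     * buffer address with the peer, and post RDMA_WRITE/RDMA_READ work
     * requests; that part is omitted here. */

    ibv_dereg_mr(mr);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devices);
    free(buf);
    return 0;
}

The remote side hands out its rkey and buffer address once, and after that reads and writes land in its memory without its CPU touching the packets; that is the mechanism behind the 2x to 3x figures mentioned above.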
So if you are interested in this, do speak up and voice your needs and your requests to the community, and hopefully that'll sort of speed up the process. I've got a short video now. In this first example, using 40G with normal TCP and then 40G with RDMA turned on, your IOPS increase by about forty-four percent, depending on the number of cores, because you're using less CPU. In this example, in the first set, you're using fewer cores and you're achieving more IOPS. In the next example.
No. So, on the next slide: RoCE is now a standard, and we've implemented it in our NICs. On the switch side, you just need to turn on data center bridging and, I think, priority flow control, which third-party switches can do; it's kind of like lossless traffic. That's all it is. Yes, correct, yeah.
So yeah, that's my summary. I guess the point was: 10G's not enough. Don't go putting in multiple 10Gs either to solve the problem; take the step up to forty, or fifty-six as in our case, which you get for free, and increase your throughput and IOPS. And, as I said, 100G has begun as well, so we're now shipping 100G NICs, cables, and switches. We can be found in appliances like the SanDisk appliances, or you can build your own.
We have reference architectures, and the other thing to mention is that the new ConnectX-4 range of cards will also have a range of erasure coding offloads. I think we're implementing two of the four methods, as far as I understand, and so when that gets switched on, all those calculations and algorithms will be offloaded to the NIC as well. So thank you.
Any questions? So, is RoCE the one that runs over Ethernet? Yes. So, RoCE: RDMA is a term used in InfiniBand in general, but what we do is bring RDMA to Ethernet, and that's RDMA over Converged Ethernet, which is known as RoCE. So as long as all that RDMA code works, it doesn't matter whether it's InfiniBand or Ethernet; it's the same thing.