Content Routing Performance Goals - Juan Benet at IPFS Thing 2022 - Content Routing 1: Performance - https://2022.ipfs-thing.io
So now I'm going to give a quick intro to content routing, to make it easier to think about the content routing problem for us in terms of IPFS, given that a lot of us are really familiar with it. Think of the traditional HTTP model as not having to solve a content routing problem, because how to find the content is embedded in the URL, to some extent.
A URL like example.com has a name system: the domain gets mapped to an IP address, and there's routing to the particular computer. So there's not exactly content routing; there is regular IP network location routing, but once you connect to those machines, those machines are supposed to know what that content is, and they can either return that content to you or give you an error. Now, in reality, there are content routing systems underneath those machines.
When you think of large cloud systems, whenever you get to those machines and request a resource, there usually is some form of content routing, but it usually happens entirely within one administrative domain or one single service. That means there's a range of sophisticated tools they can use within one locus of control to provide whatever resource you're looking for.
If you think of an S3 bucket, or an image on a social network site, or something like that, there's some sophisticated system underneath, where the machines handling your request find the specific object you're looking for and return it to you. But the problem is a lot easier when you control the thing end to end: when you completely control that system, you can evolve it over time.
You can scale it to your requirements and so on, and you don't have to deal with writing a protocol that a lot of different participants handle. The content routing problem emerges when you decouple the location addressing of the web and do content addressing instead, gaining all the benefits that confers, like being able to route things by cryptographic hash and so on.
So the content routing question is: how do you go from a CID to the set of participants in the network that have the content, and do so with a decentralized protocol? Think of it kind of like the IP routing world, with its whole large set of systems designed to route IP addresses, and imagine an equivalent system emerging for routing content addresses: mapping the locations of all the different resources and being able to route your particular request to wherever the nearest participants are.
Sometimes it's also not just about near participants. Sometimes you have to deal with authorization requirements or authentication requirements, and so on. So it's not just as simple as finding all of the participants: you also have to take into account who the request is coming from. Are they authorized to view this resource? Are they even authorized to find out who has the relevant content? So there's a bunch of details there.
The good news in all this is that it's not harder than IP routing. IP routing is a pretty hard problem; there's an enormous set of protocols and massive-scale systems that enable it. Content routing is a similar scale of problem, so it's totally doable; it's just a matter of getting there. To talk a little bit about scale: currently in the Filecoin network, this is the broad map of the system.
Just to mention scale for a moment: we're already talking about hundreds of petabytes of data. That's a lot of content to route. Let's do a quick calculation, starting from 100 petabytes.
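One way that back-of-envelope might go. The block size and record size here are my own assumptions, not numbers from the talk, but they reproduce the tens-of-terabytes figure:

```python
# Back-of-envelope: how much routing metadata does 100 PB of content imply?
# Assumed numbers (not from the talk): ~256 KiB average addressable block,
# ~100 bytes per (CID, peer ID) provider record.
content_bytes = 100e15          # 100 petabytes of stored content
avg_block_size = 256 * 1024     # assumed average block size, bytes
record_size = 100               # assumed bytes per provider record

num_objects = content_bytes / avg_block_size
index_bytes = num_objects * record_size

print(f"{num_objects:.2e} routable objects")               # ~3.8e+11
print(f"~{index_bytes / 1e12:.0f} TB of provider records")  # ~38 TB
```

That lands right around the 40-terabyte order of magnitude discussed next.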
So you're dealing with a 40-terabyte set of records just for the content on Filecoin today: 40 terabytes of content routing information. That's the order of magnitude we're dealing with. One other component here: when you think about CDNs and the layout of the internet, you would ideally like to solve the content routing problem as close to the requester as possible.
Think of the big internet as a massive grapevine, where there are all kinds of different domains and sub-networks of devices, and a request from a particular computer at the edge is being routed through a whole set of machines. You ideally want to solve the content routing problem as close to the user as possible, to minimize the latency of answering that request.
If you can get to sub-10-millisecond resolution, that would be great: if you send a request out from a house somewhere and there's a content routing system right in your ISP, you can resolve a lot of the requests right there and redirect the user to wherever the content is, without having to go all the way out to the rest of the network.
I know this is very different from how traditional peer-to-peer systems have solved this problem. The traditional model is to route everything through a DHT, and this is what IPFS has been doing for a while. But when we do that, you end up with very long latencies to retrieve a particular piece of content, because you're having to hop around the entire internet trying to find who has it, and you don't end up with a very high-throughput way of resolving these queries.
Just for a sense of scale: this is the sort of range of objects getting generated by various different applications, and this data is three years old; you can imagine a bunch of these continuing to grow on some exponential curve. This is the number of uniquely routable objects that all of these applications are generating.
So in order to have a very efficient and very successful content routing model for the internet, we have to solve a problem of this magnitude. That means you need page-load-quality latency: a user opens a web page, enters an address or some search terms, presses enter, and the result has to render on a human-perceptible time scale. That means you have at most on the order of 500 milliseconds before it starts feeling slow.
Now, of course, you can amortize a lot of this: knowing that the user was already on Twitter, or has already been seeing tweets from the same sub-networks, can narrow down the problem space. But ideally you want that level of random access to this number of objects.
This is a very non-trivial problem. Being able to solve something at this scale requires treating latency very seriously when you design the system. You don't want to be hopping around the internet, because if you do, you're dealing with 100-to-300-millisecond links every time you send a message somewhere else.
If, right now, here from Iceland, we send a message to San Francisco, that's a 150-to-200-millisecond round trip. By the time we hear back, we maybe get one more of those before the budget is already spent, so we can't really build systems that require many hops through many nodes just to route.
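The arithmetic behind that, using the roughly 500 ms budget mentioned above and an assumed ~175 ms transatlantic round trip:

```python
# How many sequential cross-continental round trips fit in a page-load budget?
# Assumed numbers: ~500 ms total budget, ~175 ms Iceland -> San Francisco RTT.
budget_ms = 500
rtt_ms = 175

round_trips = budget_ms // rtt_ms
print(f"{round_trips} sequential round trips before it feels slow")  # 2
```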
For another sense of scale, take typical cloud-style workloads. I've estimated in the past that if you think of all of these applications and project their growth out to some extent, being able to handle 10^15 to 10^18 objects is roughly where you want to be.
There are probably not that many objects yet, and there won't be that many soon, but being able to build systems that can handle that scale is roughly where you want to be. Again: you need systems that can handle large scale and do so with very low latency. To me that very much constrains the problem, and we'll talk about this more later in the constraints section, but it constrains you to replicating a lot of the indexing information and putting it close to wherever it's going to be requested.
Let's talk about the problem a little more formally. Think of a content routing system that enables users to find content in a network. The "find" part is the query users are going to do; that's the search process through the system, a routing query. Content here we're going to identify by CIDs. Mapping all that into libp2p terms, finding content means finding other peers.
That means specific public-key-derived peer addresses that you can then map to the actual IP addresses, or whatever else you need to be able to connect to them. Ideally you have glue records there, or things like that, which give you that information so you don't have to do additional hops to find those peers.
Providers map content as a tuple of the CID they're providing and their particular peer ID, and then clients have the ability to search: the "find" operation is "find providers for a particular CID", and that should return an asynchronous channel of multiple peer IDs over time, because you want the search process to start returning things as quickly as possible.
Meanwhile, the system continues to look for more potential providers, because you might find a set of providers but not be able to get the content from them: you might try reaching them and they might not be online, you might not have the right authorization, or they might not want to interact with you.
So you need a system with a one-to-many mapping: for every CID there are many possible providers. Note, by the way, that it doesn't have to be completely consistent. You do not need a full view into all providers in the world, and you don't even need a lot of providers.
You just need enough providers to make the routing query successful. We've also talked in the past about record systems: you can create a record which includes the CID and the peer ID of the provider, and use the provider's private key to sign that record.
That's a non-repudiable record, which means that once it's created, that particular peer has used their identity to declare to the network that they do indeed have this content. That's tricky for any content that might become a target of censorship at some point.
It matters because it means signed statements can be found across the network showing that particular parties had particular content at particular moments in time. We'll talk about all these kinds of properties tomorrow, but these kinds of things can sneak in there.
So now, there's a whole set of properties that you might want in systems like this. (Sorry, I need to get rid of this animation thing.)
Think of properties from traditional distributed systems: consistency, availability, and partition tolerance. You want to be able to find the content in the network; you want high availability; you want the system to work through network partitions, so a user trying to request something should be able to do so with high throughput even if the rest of the internet is getting disconnected from them.
You want very high performance in terms of the throughput of raw records that the system can handle, and there are efficiency requirements on the search process: ideally it should be O(1) in the size of the network.
Meaning the search query itself should not grow with the number of nodes in the network. If it does, you're in a bad spot, because it might mean additional hops get introduced into the search, and each of those hops, if they're not local to one region of the world, is going to add 50 to 100 milliseconds. That can bust your entire latency budget.
And then there's a whole bunch of security properties we'll talk more about; there's a long document I've been working on with tons of different properties related to the security of these systems. We'll talk more about this tomorrow. So, just to put things into perspective:
This is roughly my grading of what content routing through a DHT looks like. There's pretty good scalability in some senses: a DHT is roughly okay on the distributed-systems model, but not great. This is why it was a good starting content routing system: you can use DHTs when you have a small amount of content, but they quickly degrade in terms of the performance of the search.
The red flag here is that in a traditional log(n) DHT like Kademlia, just to find content you're going to have to take log(n) hops through the network, and if that routing query does not take into account where you are in the world and your latency to other peers, then you're going to end up waiting multiple seconds to request something. Compare that to the tweet example.
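A rough illustration of why that multi-second figure is plausible. The node count and per-hop latency are assumptions (real Kademlia lookups also parallelize and take wider branching, so they do somewhat better), but the shape of the problem holds:

```python
import math

# Rough cost of a location-unaware DHT lookup.
# Assumed numbers: 1e6 DHT nodes, ~150 ms per hop when hops
# criss-cross the globe with no latency awareness.
nodes = 1_000_000
hop_latency_ms = 150

hops = math.ceil(math.log2(nodes))   # ~20 hops for a log2(n) lookup
total_ms = hops * hop_latency_ms
print(f"{hops} hops -> ~{total_ms / 1000:.1f} s per lookup")  # ~3.0 s
```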
The good news is that DHTs are really space-efficient: as you add more nodes into the network, you add a lot more capacity, and you can deal with huge numbers of records. So that's pretty good. DHTs also score fairly well on some security properties, things like resilience and permissionlessness: anybody can join a DHT, and they're pretty robust to certain kinds of failure.
They have some problems with things like eclipse attacks and so on, where an attacker can do some amount of censorship of records, but for the most part DHTs tend to be used because they have reasonably good security properties. Another problem, though, is that in DHTs you do not have reader or writer privacy.
There have been many approaches to try to imbue DHTs with reader/writer privacy, but in general it's extremely difficult, in great part because whenever you're looking things up there's a trade-off: you want to terminate the lookup as fast as possible for the user, but if you do, the network learns something about the information being requested, who's requesting it, from where, and so on. So you end up in a really bad spot.
Now, today we don't have to worry about the bottom half, the security and privacy properties; that's not for the short-to-medium term, that's more for the long term. But we ideally want really high-performance systems when it comes to the speed of querying things and handling a certain scale of records. Cool, so I think I'm going to stop here and move on to the next talk. Any quick questions I can maybe answer? I guess two questions.
Yes. I mean, we're talking super generally here, so you could have many different kinds of networks; there are many different ways of thinking about them.
A lot of DHT-like systems have this model where all of the participants in the network act as routers as well: both the content providers and the content consumers act as routers, and you try to organize that into a system.
There are other structures that distinctly separate the parties doing the content routing into a different type of agent, one that is neither providing nor consuming content. Then you get a lot more flexibility in designing things: for particular content, you can design a content routing system to do what you described, and that could totally work.
Yep. Are there other examples of routing systems that resolve in constant time? Yeah. So, for example, what would happen if you had something very straightforward, kind of like a hash table?
If you can enumerate the set of possible routers in a network and you know their identities, you can assign each of them a particular part of a key space, and then always go directly to that participant for it.
If you have churn in the participant set, then you have to deal with that. But if you can enumerate the set of participants and store it, then you totally can do it, and that can scale pretty well. The question is: how big does the list have to get before it becomes unreasonable?
Yes, exactly, yeah.
Sure, well, it depends again on the design of the system, because if it's not open membership, then you can get asymptotic performance down to O(1).
If you have consistency on the set of content routers, then whenever you're doing a query, you know precisely the set of content routers and you can go precisely to them, and you get O(1) on every request. But enforcing consistency on that set is difficult, right? Well, what do blockchains do? Blockchains enforce consistency; modern proof-of-stake blockchains enforce consistency across all the consensus nodes.
If you're going to be a consensus participant, you have a permissionless-ish way of joining the network and getting promoted to participate in the consensus, and once you do, everybody in the network knows you're participating and knows precisely all the nodes, and so on. And you can get that to scale to hundreds of thousands to millions of nodes.
But you may not need millions of content routers; Earth's not that big. The number of content routers you need depends greatly on latency: you want to serve one of these requests in ideally something like 50 milliseconds, and so you just need enough content routers to handle a lot of queries in a particular region.
You need enough of them laid out everywhere on the internet to serve queries at the 50-millisecond level, and then you need full replicas of everything right there. Or not full, actually: it doesn't have to be fully consistent. You could just hold the hot part of the content, because content usually has a drop-off rate where a tiny fraction of the content is requested the most.
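One illustrative model of that drop-off is a Zipf distribution over content popularity. The exponent and catalogue size here are assumptions, not measurements from the talk, but they show why caching only the hot part goes a long way:

```python
# Zipf(s=1) popularity over 1M objects: what share of requests
# hits the top 1% of objects? (Illustrative model, not measured data.)
N = 1_000_000
weights = [1 / rank for rank in range(1, N + 1)]
total = sum(weights)

top_1_percent = sum(weights[: N // 100]) / total
print(f"top 1% of objects serve ~{top_1_percent:.0%} of requests")  # ~68%
```

Under this model, a regional router caching just 1% of the records can resolve roughly two-thirds of lookups locally.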