What's going on with DHTs, hydras, indexers, and reframe?
A: You have probably seen the Filecoin system map in Juan's previous presentations. If you look closely, the network indexer falls right at the core of this map: it integrates with storage providers to advertise the content they store, and offers a simple API for retrieval clients to find that data. To put it more concretely, given a CID, the network indexer finds providers of the data associated to it, along with the list of protocols over which the data could be retrieved. So it's pretty simple. Diving a bit deeper, here's an overview of data flowing in the network indexer.
A: Advertisements form a log, if you like, of the changes in the provided content, which providers are ultimately responsible for maintaining. Having this historical view allows the network to reason about things like accountability, in comparison with a fire-and-forget approach. For example, as the content on the provider side gets added or removed, providers announce the changes to their chain of advertisements to the network indexers. This announcement is sent over HTTP or gossipsub.
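Below is a minimal Go sketch of what the gossipsub side of such an announcement could look like, using go-libp2p-pubsub. The topic name and payload here are assumptions for illustration, not the exact storetheindex wire format, and go-libp2p constructor details vary between releases.

```go
// Sketch: publishing an indexer announcement over gossipsub.
package main

import (
	"context"

	"github.com/libp2p/go-libp2p"
	pubsub "github.com/libp2p/go-libp2p-pubsub"
)

func main() {
	ctx := context.Background()

	h, err := libp2p.New()
	if err != nil {
		panic(err)
	}

	ps, err := pubsub.NewGossipSub(ctx, h)
	if err != nil {
		panic(err)
	}

	// Assumed topic name for indexer ingestion announcements.
	topic, err := ps.Join("/indexer/ingest/mainnet")
	if err != nil {
		panic(err)
	}

	// A real payload would carry the CID of the new advertisement head
	// plus the provider's addresses; these bytes are a stand-in.
	if err := topic.Publish(ctx, []byte("advertisement-head-cid")); err != nil {
		panic(err)
	}
}
```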
A: Once an announcement reaches a network indexer, the node then passively fetches the advertisement chain, along with its multihash entries. It's important to point out the passive interaction here: it allows the indexer to avoid being overwhelmed when there are a lot of changes in the advertisement chain, while enabling it to collapse multiple fetch calls into one and effectively transfer unseen advertisements in bulk. This is important when it comes to things like scaling the ingestion of advertisements.
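As a rough illustration of that coalescing behaviour, here is a toy Go sketch (not the storetheindex implementation) in which repeated announcements from one provider collapse into a single chain sync:

```go
// Toy sketch of collapsing many announcements into one sync.
package main

import (
	"fmt"
	"sync"
	"time"
)

type Ingester struct {
	mu      sync.Mutex
	pending map[string]bool // providers with a sync already queued
}

// Announce records that a provider has new advertisements. Repeated
// announcements while a sync is pending are collapsed into one.
func (ing *Ingester) Announce(provider string) {
	ing.mu.Lock()
	defer ing.mu.Unlock()
	if ing.pending[provider] {
		return // a sync is already queued; nothing to do
	}
	ing.pending[provider] = true
	go ing.syncLater(provider)
}

func (ing *Ingester) syncLater(provider string) {
	time.Sleep(100 * time.Millisecond) // small window to absorb bursts
	ing.mu.Lock()
	delete(ing.pending, provider)
	ing.mu.Unlock()
	// One fetch walks the chain from the provider's head back to the
	// last advertisement already seen, pulling unseen ads in bulk.
	fmt.Println("syncing advertisement chain from", provider)
}

func main() {
	ing := &Ingester{pending: map[string]bool{}}
	for i := 0; i < 5; i++ {
		ing.Announce("provider-1") // five announcements, one sync
	}
	time.Sleep(200 * time.Millisecond)
}
```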
A: On the retrieval side, the interface to look up providers is pretty simple. It is an HTTP GET API that, given a CID or multihash, returns you the providers, along with a thing we call metadata; the metadata is what captures the protocol over which the data could be retrieved. The metadata you see out there right now is largely made up of two kinds: bitswap and Filecoin graphsync.
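For a sense of what a lookup looks like, here is a hedged Go sketch of a client calling the cid.contact find endpoint. The URL shape and response fields are assumptions based on the public API at the time; check the storetheindex documentation for the authoritative schema.

```go
// Sketch: looking up providers for a CID on cid.contact.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type findResponse struct {
	MultihashResults []struct {
		ProviderResults []struct {
			Metadata []byte // protocol(s) the data is retrievable over
			Provider struct {
				ID    string
				Addrs []string
			}
		}
	}
}

func main() {
	cid := "bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi"
	resp, err := http.Get("https://cid.contact/cid/" + cid)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()

	var out findResponse
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		panic(err)
	}
	for _, mh := range out.MultihashResults {
		for _, p := range mh.ProviderResults {
			fmt.Println("provider:", p.Provider.ID, p.Provider.Addrs)
		}
	}
}
```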
A: However, it is important to point out that the primitives are there for any custom retrieval protocol to be engineered. So, for example, you could engineer your own custom HTTP API to retrieve information. The reason this is important is that the design is extensible: it allows the retrieval ecosystem to design the retrieval protocols that fit the need, rather than being limited to the parameters of just one design.
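One way to picture that extensibility: metadata can be a compact identifier for the retrieval protocol, followed by protocol-specific bytes. The sketch below encodes such a value with a varint prefix; the specific code value is hypothetical, not a registered protocol identifier.

```go
// Sketch: extensible retrieval metadata as varint protocol ID + payload.
package main

import (
	"encoding/binary"
	"fmt"
)

func encodeMetadata(protocolID uint64, payload []byte) []byte {
	buf := make([]byte, binary.MaxVarintLen64)
	n := binary.PutUvarint(buf, protocolID)
	return append(buf[:n], payload...)
}

func main() {
	const myCustomHTTP = 0x3F0000 // hypothetical protocol code
	md := encodeMetadata(myCustomHTTP, []byte(`{"endpoint":"https://example.com/get"}`))
	fmt.Printf("%x\n", md)
}
```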
A: Looking back at the past eight months or so, honestly, it is incredible to see the progression and how far the network indexer has come. Early this year there was about a 30% chance of finding content on the network, with lookup times being pretty slow. In April we launched the first official version of the network indexer in production, called cid.contact; at the time, this was the only production-ready indexer endpoint.

A: Around the same time, with the help of the Lotus team, we also integrated the indexing protocol into Lotus release 1.15, which allowed storage providers to emit advertisements. By August, cid.contact had ingested over 8 billion CIDs, with great impact on improving lookup success for content hosted by services like nft.storage: this improvement in lookup success was from about 4% to 96%. By the way, if you haven't seen Alan's talk on this from earlier this year at IPFS Thing, I highly recommend watching it.
A: We also worked really closely with the IPFS stewards to integrate the network indexer into the Hydra boosters, so around August we started testing with a test environment in the Hydra boosters. Fast forward to today: a staggering amount of CIDs has been ingested by the network indexer, at a rate of about 7 billion per week. There are six independent partners also running network indexers, in addition to cid.contact in production, and the Hydra boosters are fully integrated over the Reframe protocol, which is amazing.
A: Diving a bit deeper into what the network indexer looks like today, I've prepared for you an overview of cid.contact's deployment topology. At the front, cid.contact heavily uses HTTP caching, so over 95% of requests are currently served from cache. We receive on average around 20,000 requests per second, peaking at just under twice that amount, about 40,000 per second, with a response latency of around 80 milliseconds. Requests that are not served from the cache are then sent to a custom-built load balancer/aggregator service, which we call indexstar.
A: indexstar is built to scatter requests across multiple indexer nodes, gather the results, and then combine them depending on the protocol that initiated the request. It is resilient against things like fluctuating responses from the backend nodes, as well as partial downtime of the backend nodes, and it also maintains a persistent connection to each of the nodes. All of this combined allows indexstar to respond to queries within four milliseconds.
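A minimal Go sketch of that scatter/gather pattern, with hypothetical backends, might look like this; it tolerates slow or down nodes by imposing a deadline and merging whatever results arrive in time:

```go
// Illustrative scatter/gather aggregator, in the spirit of indexstar.
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

// queryBackend stands in for an HTTP find call to one indexer node.
func queryBackend(ctx context.Context, backend, key string) ([]string, error) {
	select {
	case <-time.After(2 * time.Millisecond):
		return []string{"provider-of-" + key + "-from-" + backend}, nil
	case <-ctx.Done():
		return nil, ctx.Err()
	}
}

func scatterGather(backends []string, key string) []string {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	var mu sync.Mutex
	var wg sync.WaitGroup
	var merged []string
	for _, b := range backends {
		wg.Add(1)
		go func(b string) {
			defer wg.Done()
			provs, err := queryBackend(ctx, b, key)
			if err != nil {
				return // tolerate slow or down backends
			}
			mu.Lock()
			merged = append(merged, provs...)
			mu.Unlock()
		}(b)
	}
	wg.Wait()
	return merged
}

func main() {
	fmt.Println(scatterGather([]string{"node-a", "node-b"}, "some-multihash"))
}
```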
A: This is something that we are pretty happy with. Finally, on the backend, we have independently deployed indexer nodes running storetheindex, which handle the ingestion of advertisements as well as find requests. The thing to point out here is that these nodes are made independently deployable on purpose: this is to allow us to experiment with things like different configuration parameters, as well as different backing stores, and to help evolve the system in order to scale.
A: The independent deployability of these nodes, coupled with indexstar, effectively gives us the freedom to break things in production, which is always extra fun, while minimizing the impact on the quality of service for end users, which is fantastic. Here we have a rough overview of the cumulative distribution of advertisements and entries across providers. By distribution we specifically mean the depth of the advertisement chain and the number of entries, or multihashes, contained in it.
A: We see a small number of providers with advertisement chains on the order of millions long; these are providers like nft.storage, for example. And then, of course, there is a predictable long tail of providers with smaller chain lengths and numbers of advertisements.
A: Here we have the percentage of Filecoin deals and storage providers that are covered by the network indexer. Just under 30% of the 500 storage providers are now advertising their content, which is fantastic, with around a quarter of Filecoin deals already indexed and discoverable via the network indexer. So if a CID is indexed, you can find the provider and download the data.
A: Over the coming quarters we are looking to significantly increase these numbers, but it's just fantastic to see the uptake in the growth of data coverage. I thought I'd show you a graph of find request distribution on cid.contact. This is the hourly distribution of find requests over cid.contact; you can sort of guess where the Hydra boosters were integrated with the network indexer. This was about mid-September.
A: As you can see, the rate of requests that we're receiving on cid.contact fluctuates quite a bit. Thanks to the great collaboration that we had with the IPFS stewards team, we've made a whole bunch of optimizations in the routing mechanism, from the load balancer right up to indexstar and the nodes themselves, in order to keep the lookup latency low despite the fluctuation of requests, which is fantastic.
A: That's it. Thank you so much for listening. Please don't hesitate to reach out to us on the usual channels; you can find us in the #storetheindex channel in the Filecoin Slack.
B: So, what is delegated routing? It's basically a way to be able to find content by its CID, or to provide content so that other peers are able to find it eventually. You can also find peers by their peer ID, which is a useful functionality too, and you can set and get IPNS values on it, to be able to have content that evolves. Right now, the only routing system that we have, or the default one, is the DHT.
B: It has basically been the default on IPFS since a long time ago, but now maybe we want to delegate that routing to other, different applications.
B: So why should we use delegated routing instead of the DHT? Mainly, you want to use it to do some experimentation: if you want to figure out different ways to build a routing system, how to store CIDs and how to retrieve them in different ways, you can use the delegated routing API. Also, if you want to have responses in less than 10 milliseconds, you cannot do that on the DHT right now; it's basically impossible, because you will need to do several hops to be able to find the content.
B: You can also have data locality: like a CDN, you can cache the records near the users to make those speeds possible. And you can connect to other networks, like Filecoin, so you can retrieve content from another network, which is something that's not possible right now with the DHT.
B: We had some previous work called delegated peer routing and delegated content routing for the js-ipfs implementation. The problems we had with that previous work on delegated routing were that it was super dependent on the go-ipfs RPC API, and it was not doing only content routing but also caching, providing records and so on. So it was a mix of different concepts, and you also needed the DHT to make that delegated routing API work. So these are the things that we have been building here.
B: So, as we said before, we have the indexer integration: basically, the Hydras are talking with the indexers to be able to retrieve content records directly from the indexer, instead of needing other methods to do that, and the indexers are talking directly to the gateways, to get the routing information that we need faster.
B: This is our actual implementation in Go, which implements the HTTP API, and you can use it if you want to do your experiments or to implement any content routing that you need. You can use that API directly; you have there the HTTP server and client implementations, and it's really easy to use.
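To make the shape of this concrete, here is a hedged sketch of a delegated content-routing client over HTTP. It is not the actual go-delegated-routing package API; the endpoint path and JSON layout are illustrative assumptions.

```go
// Sketch: delegating FindProviders to a remote HTTP endpoint.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
)

type provider struct {
	ID    string   // peer ID of the provider
	Addrs []string // multiaddrs to reach it
}

type delegatedClient struct {
	endpoint string // a reframe-style HTTP server (assumed URL shape)
}

func (c *delegatedClient) FindProviders(cid string) ([]provider, error) {
	resp, err := http.Get(c.endpoint + "/providers/" + cid)
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	var provs []provider
	if err := json.NewDecoder(resp.Body).Decode(&provs); err != nil {
		return nil, err
	}
	return provs, nil
}

func main() {
	c := &delegatedClient{endpoint: "http://localhost:8080"}
	provs, err := c.FindProviders("bafy...")
	if err != nil {
		fmt.Println("lookup failed:", err)
		return
	}
	for _, p := range provs {
		fmt.Println(p.ID, p.Addrs)
	}
}
```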
B: Also, the Hydra boosters are sharing content from the indexers using the DHT, and you can connect everything together with delegated routing. And someguy is a proof of concept: it's basically a delegated routing endpoint that gets the records from different other routing systems, like the DHT and storetheindex. So you can delegate the routing to someguy and ask it for directions, and it will ask the different routing systems to find the content that you want.
B: As for work we can do in the future, there are different ideas we can implement using delegated content routing. We can implement thin clients: you can have a really light client using the content routing API, so you don't have a super heavy payload working in the background trying to get peers and information about CIDs and everything; you can delegate the routing to another system and remove that load from your own. It can also be used for scoped content routing.
B: Instead of trying to find content in the whole universe of CIDs, you can discover routing out of band, for example using pubsub or any other method; we are still doing some research on that. And you can also try crazy ideas, like implementing your own content routing over another peer-to-peer network, like the BitTorrent DHT, and experimenting with it.
C: I'm Gus. I'm an IPFS steward, a Kubo maintainer, and I also maintain the Hydra boosters, which are an important part of the DHT.

C: So, an overview of the DHT: it's basically a decentralized database that stores records about who has what data. If you take a CID and you're trying to find who provides that CID, you'll extract the multihash out of the CID and look it up in the DHT, and it'll return you a list of peer IDs of people who might have it. Once you have that peer ID, you can't connect to it directly; you've got to have an address. So there's another lookup in the DHT to map a peer ID to a set of addresses. And then the third thing that the DHT stores is IPNS records; IPNS is a mutability layer on top of IPFS.
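Those three lookups map onto the content-routing, peer-routing, and value-store interfaces of the Go DHT client. The sketch below shows them with go-libp2p-kad-dht; import paths and minor API details vary between releases, and a real node would bootstrap first, so treat it as illustrative.

```go
// Sketch: the three DHT lookups (providers, peer addresses, IPNS).
package main

import (
	"context"
	"fmt"

	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

func main() {
	ctx := context.Background()

	h, err := libp2p.New()
	if err != nil {
		panic(err)
	}
	// In a real node you would bootstrap against known peers here.
	d, err := dht.New(ctx, h)
	if err != nil {
		panic(err)
	}

	c, _ := cid.Decode("bafybeigdyrzt5sfp7udm7hu76uh7y26nf3efuylqabf3oclgtqy55fbzdi")

	// 1. CID (keyed by its multihash) -> provider peer IDs.
	for prov := range d.FindProvidersAsync(ctx, c, 5) {
		fmt.Println("provider:", prov.ID)

		// 2. Peer ID -> set of addresses.
		info, err := d.FindPeer(ctx, prov.ID)
		if err == nil {
			fmt.Println("addresses:", info.Addrs)
		}
	}

	// 3. IPNS: mutable name -> record (key string is illustrative).
	val, err := d.GetValue(ctx, "/ipns/<peer-id-bytes>")
	if err == nil {
		fmt.Println("ipns record:", val)
	}
}
```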
C: We ran some analysis a couple of months ago on the DHT to find out what the geographic distribution and the autonomous system distribution were across the DHT. When we ran this, we found around 40,000 DHT servers across 150 countries and 2,700 autonomous systems. This autonomous system distribution was way better than we thought it would be, so that was quite exciting.
C: We did this by looking in a database that had known IP address ranges from cloud providers, and again we were pleasantly surprised by how heterogeneous the network was in terms of cloud providers.
C
Hydro
boosters,
which
have
been
mentioned
a
couple
times,
are
their
DHT
nodes
and
all
they
do
is
they
store
DHT
records
and
they
share
a
backing
cache,
but
they
they
they're
exposed
as
lots
of
peers
in
the
DHT
that
all
share
the
same
back
in
cash
and
the
the
rationale
for
this
is
that,
since
they
all
share
the
same
back
in
cash
you're,
much
like
you're,
very
likely
to
hit
a
hydro
node
when
you
query-
and
if
you
hit
one
earlier
in
your
content
resolution
time
will
be
a
lot
faster.
C
Current
likelihood
of
hitting
a
Hydra
node
during
a
DHT
query
is
somewhere
around
97
percent.
It
fluctuates
with
the
size
of
the
network
and,
as
mentioned
earlier
also
they
currently
act
as
a
bridge
to
the
indexer
network.
So
it
was.
It
was
kind
of
a
a
way
for
us
to
get
the
indexer
records
onto
the
DHT
without
having
to
push
code
changes
to
all
the
clients
in
the
DHT.
C
Hopefully,
in
the
long
run
we
plan
on
removing
that
and
in
the
in
the
long
long
run
we
don't
even
want
the
hydros
to
be
around
at
all
they're
sort
of
a
Technique.
We
use
right
now
to
make
content
resolution
faster,
so
we're
doing
a
lot
of
analysis
now
to
find
out
like
like
what
the
long-term
plan
is
for
these.
C
But
while
we
have
them,
they
produce
very
useful
metrics,
because
we
get
like
a
sort
of
a
bird's
eye
view
of
the
DHT,
and
this
is
a
graph
showing
like
the
traffic
that
the
hydras
serve
like
at
Peak.
They
get
around
a
hundred
thousand
requests
per
second,
which
is
very
high.
C
Another
important
component
of
the
DHT
is
the
the
client
implementation
that's
used,
so
Kubo
has
two
different
client
implementations.
One
of
them
is
like
a
standard
kadimlia
client
which
discovers
the
network
topology
as
a
queries
and
the
the
second
I
call
full
RT,
which
stands
for
full
routing
table.
So
this
crawls,
the
entire
DHC
Network
up
front
and
caches
all
the
nodes
in
the
DHT.
And
then,
when
you
do
a
DHT
query,
it
doesn't
need
to
ask
anybody
about
where
the
provider
record
will
be
and
those
without
doing
any
any
queries.
C
So
it's
much
much
faster,
but
the
downside
is,
you
have
to
do
a
crawl
up
front
which
can
take
like
five
to
ten
minutes
and
has
been
known
to
crash
home
routers
and
things
like
that.
But
it's
really
useful
for
servers
like
that
are
running
on
beefy
Hardware,
where
you
need
really
high
performance.
C
You
can
also
like,
because
you
know
the
destination
of
Records
when
you're,
adding
content
into
ipfs,
like
bulk
ads
of
lots
of
data,
can
be
really
really
fast,
with
the
with
the
full
RT
client.
Obviously
another
one
of
the
problems
is
you've
got
to
you.
Gotta
cache
every
node
in
the
network,
so
obviously
that
doesn't
scale
very
well
with
the
network,
size
and
yeah.
You
have
to
wait
many
minutes
before
you
can
even
start
using
the
DHT.
C
So
Molly
showed
this
graph
the
other
day,
but
this
is
a
graph
of
the
the
resolution
time
in
the
DHC
over
the
past
year.
So
we've
been
making
a
lot
of
improvements
to
it.
You
can
see
that
you
know
it's
gone
down
from
at
a
peak
of
like
1.3
seconds
on
this
graph
down
to
around
0.4
seconds
many
years
ago.
It
used
to
be
on
the
order
of
like
20
to
30
to
40
seconds,
but,
as
will
mentioned,
400
milliseconds
is
still
really
long
if
you're
loading
a
web
page.
C
So
that's
where
the
inductors
come
into
play
if
you're
interested
in
more
measurements
and
analyzes,
you
scan
this
QR
code
and
it'll
it'll
pull
up
this
slide
and
you
can
click
on
these
links.
There's
we
have
a
whole
team
of
people
that
spend
all
their
time
collecting
data
about
the
ipfs
network
and
writing
reports.
C: So what is the upcoming work on the DHT? We've got a change about to land called optimistic provide, which is going to dramatically speed up the time it takes to add records, or add data, to the IPFS network.
C: We've got another project to upgrade the DHT to improve its privacy, so that if you double-hash the keys in the DHT, then when you send queries to other nodes, they don't necessarily know what it is that you're looking for. We're going to keep doing more measurements and analyses and writing reports, and we're also working on a process for making protocol improvements to the DHT. For a long time the protocol hasn't changed, because it's very difficult to upgrade the network.
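As a toy illustration of the double-hashing idea (the actual key-derivation scheme is specified elsewhere; this is just the shape of it):

```go
// Toy sketch: look up H(multihash) instead of the multihash itself,
// so intermediate nodes can't tell which CID you are resolving.
package main

import (
	"crypto/sha256"
	"fmt"
)

func lookupKey(multihash []byte) []byte {
	// Servers index records under H(multihash); only the final
	// provider exchange needs to reveal the original multihash.
	sum := sha256.Sum256(multihash)
	return sum[:]
}

func main() {
	mh := []byte("example-multihash-bytes")
	fmt.Printf("query key: %x\n", lookupKey(mh))
}
```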
C: So we're trying to figure out a sort of process we can put in place to upgrade the DHT, so we can start making some progress on these. Thank you.