From YouTube: DHT Routing Table Health - Guillaume Michel
Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
The DHT is one of the principal components of libp2p and IPFS for finding content, so we've studied how it works in practice. We've measured it, and that's what we're going to dive into.
So first, let me give you a quick introduction to what the DHT — a distributed hash table — is. It is basically a computer overlay network, which means that nodes are connected to the same network using the internet, and the nodes participating in this DHT are connected to each other on top of the internet. The abstraction that the DHT provides is a distributed key-value store, where you can find the location of some content according to the content itself.
All of the data is stored in a flat key space, which means that if some files have similar names or other similarities, there will be no similarity in the locations where they are actually stored in the DHT.
And why do we have a DHT? What are the trade-offs? If we have n nodes in a network, we could just keep track of every single node, so that we can reach all of them directly. That would be fast, but it doesn't really scale, because we would have to keep a very large state. Alternatively, we could keep track of only a small number of peers, but then, if we want to look up some content in the network, we have to iterate through all of the peers, and that's not efficient. So the trade-off is to keep track of log(n) peers, which allows us to do lookups with O(log n) complexity — meaning we need log(n) hops in order to reach the content.
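As a rough sketch of that trade-off (back-of-the-envelope numbers only, not the actual libp2p accounting):

```python
import math

def routing_tradeoff(n):
    """Compare routing state vs lookup hops for a network of n nodes.

    Returns (state, hops) for three strategies: full mesh,
    a single known neighbor, and Kademlia-style log(n) state.
    """
    return {
        "full_mesh": (n - 1, 1),                 # know everyone: O(n) state, 1 hop
        "minimal":   (1, n),                     # know one peer: O(1) state, O(n) hops
        "kademlia":  (math.ceil(math.log2(n)),   # O(log n) state...
                      math.ceil(math.log2(n))),  # ...and O(log n) hops
    }

for n in (1_000, 1_000_000):
    print(n, routing_tradeoff(n))
```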
Now some specifics: IPFS and libp2p implement the Kademlia DHT. We use a key space of 256 bits. Each node in the network has its own peer ID — whose format you may already have seen — and it can also be represented as 256 bits, so it lives in this key space, and the content also lives in the same key space as the peer IDs.
The locality, let's say, in this key space is based on the XOR distance between the bits. The DHT itself in IPFS doesn't store content — except in the case of IPNS — but it stores pointers to the content. That means that if I want to publish a file to the DHT, or to IPFS, the file is going to stay on my machine and I'm just going to advertise that I have this file.
So if someone is looking for this file, they come and reach out to me. And the way the state — how we keep track of the peers — is organized is that each peer sorts the other peers it knows by logarithmic XOR distance and stores them in what we call k-buckets. I'm going to focus a bit more on this.
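The XOR metric and the sort-into-buckets-by-shared-prefix idea can be sketched like this (a toy 4-bit example, like the tree discussed next, rather than the real 256-bit key space):

```python
def xor_distance(a, b):
    """Kademlia distance between two IDs: bitwise XOR, read as an integer."""
    return a ^ b

def bucket_index(me, other, bits=4):
    """Bucket index = length of the shared bit prefix, i.e. bits minus the
    position of the highest set bit of the XOR distance."""
    d = xor_distance(me, other)
    assert d != 0, "a node does not bucket itself"
    return bits - d.bit_length()

me = 0b1101
for p in (0b0010, 0b1000, 0b1111, 0b1100):
    print(f"{p:04b} -> distance {xor_distance(me, p):04b}, bucket {bucket_index(me, p)}")
```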
Now, if we take an example: this is an existing network — let's say it's Kademlia — and each of the peers is represented there; the peer IDs are just bit strings. Just for convenience, an identity is not 256 bits but just four bits, but you get the idea. Then we have one new peer that wants to join the network — that's the one in green — and what it needs to join the network is a bootstrap node.
It means that if you want to join the club, you have to know someone; otherwise you cannot join the network, which is obvious. So you need to know at least someone. What we can represent here is the tree with the XOR distance — I'm going to get to this later — but in order to get to know some more peers in the network, the green node, the new one, needs to ask the peers it knows: okay, can you give me some more peers?
In this case it can ask 1100, which is closer to the green node, and then, iteratively, the new node is going to search for itself — ask all of the peers it knows: give me some peers that are close to me. Eventually it will get to know some new peers, and that's how a node fills in its routing table: it keeps track of all of the visited peers.
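The join procedure just described — iteratively asking the closest known peers for peers close to your own ID — might be sketched as follows (a simplified synchronous toy; `neighbors` is a stand-in for the network RPC, and real libp2p lookups run concurrently):

```python
def self_lookup(my_id, bootstrap, neighbors, k=3):
    """Iteratively query the closest known peers for peers close to my_id.

    `neighbors` maps a peer ID to the peers that peer knows (it stands in
    for the "give me peers close to this key" network request).
    Returns all discovered peers, sorted by XOR distance to my_id.
    """
    known = set(bootstrap)
    queried = set()
    while True:
        # pick the closest not-yet-queried peers
        candidates = sorted(known - queried, key=lambda p: p ^ my_id)[:k]
        if not candidates:
            return sorted(known, key=lambda p: p ^ my_id)
        for peer in candidates:
            queried.add(peer)
            known.update(neighbors.get(peer, []))

# toy network with 4-bit IDs, as in the example above
net = {0b0001: [0b1100, 0b0010], 0b1100: [0b1110, 0b1111], 0b1110: [0b1111]}
print([f"{p:04b}" for p in self_lookup(0b1101, [0b0001], net)])
```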
Once the node has joined the network, it will periodically look itself up in order to keep its routing table up to date: a node has to know all of the peer IDs that are in its close neighborhood — here, those would be the nodes around it, shown in orange. When you perform a request for some content or for some peers, it happens in the same key space: the request is iterative and is resolved similarly, and the requests happen concurrently, which means that you don't need to wait for the answer of a single peer.
When you publish a provider record — a pointer to the content you are hosting — in the DHT, you push it to 20 peers: the 20 closest peers to the content identifier. And each bucket — I'm going to come back to buckets later — is capped at 20 peers. Please note that these three constants — the number of closest peers to the requested key, the replication index, and the bucket capacity — have the same value in IPFS, but they don't necessarily need to be the same value.
Those are three distinct constants. So now, if we come back to the key space and the k-buckets: as we said, a node learns about other peers and sorts them into buckets. We give IDs to the buckets just to name them, and bucket number zero is for the peers that are far away, in XOR distance, from the node itself.
Those are the peers that share a common prefix of zero bits. Here we'll take the same node as a reference node, and the nodes that are far away from it — if our reference node were to learn about them, it would put them in bucket number zero, because they are far away. Then bucket number one: those are the peers that are closer and that share a prefix of length one, and it would place them into bucket number one.
Then we go on — bucket number two, number three — and as we get closer and closer, there are going to be fewer candidates. So now imagine you have a tree of depth not four, but 256. The buckets far from you are going to be full, because there are so many candidates to fall into those buckets, but the buckets close to you are going to be totally empty, because there's just no one that shares that prefix. That's roughly what you would expect.
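The expected halving of candidates per bucket can be checked with a quick simulation (a toy sketch using random 16-bit IDs rather than real 256-bit peer IDs):

```python
import random
from collections import Counter

random.seed(0)
BITS = 16
me = random.getrandbits(BITS)
others = [random.getrandbits(BITS) for _ in range(10_000)]

counts = Counter()
for p in others:
    d = me ^ p
    if d == 0:
        continue
    counts[BITS - d.bit_length()] += 1   # shared-prefix length = bucket index

# roughly half the network lands in bucket 0, a quarter in bucket 1, ...
for b in range(8):
    print(f"bucket {b}: {counts[b]} candidates")
```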
Something that is also important in Kademlia is how you select the peers. It's simple: when you learn about a new peer that you didn't know before, you just compute which bucket it would belong to. If there is some space left in it, then you just put it there, but if the bucket is already full, you just don't take it.
It goes on a waiting list, and then, regularly, each peer will probe all of the peers in its routing table; if they fail to reply, they get evicted, and that's how new peers can get into the routing tables.
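That insert-or-evict rule might look roughly like this (a hypothetical sketch, not the actual go-libp2p routing-table code):

```python
K = 20  # bucket capacity used by IPFS

def try_add(bucket, peer, is_alive):
    """Add `peer` to `bucket` (a list of at most K peers).

    If the bucket is full, the newcomer is only admitted when probing
    reveals a dead incumbent to evict; otherwise it is dropped.
    """
    if peer in bucket:
        return True
    if len(bucket) < K:
        bucket.append(peer)
        return True
    for incumbent in list(bucket):        # periodic liveness probe
        if not is_alive(incumbent):
            bucket.remove(incumbent)      # evict the unreachable peer
            bucket.append(peer)
            return True
    return False                          # full of live peers: newcomer waits

bucket = list(range(20))
print(try_add(bucket, 99, is_alive=lambda p: p != 7))  # peer 7 is dead -> evicted
```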
Eventually, only the very stable nodes are going to be in there, because the ones that are not stable get evicted. So we performed some measurements. We used the Nebula crawler to crawl the network; it has a technique to get the routing table of every single peer in the network, which is pretty awesome. We took some data earlier in April this year.
So we can reconstruct the k-buckets for every single peer, and, as we have global knowledge of the network, we can verify whether some peers should belong to some buckets but are missing from them. Also, one of the properties of Kademlia is that the routing is sound only if you know all of your closest neighbors and your closest neighbors know you — because if your closest neighbors in XOR distance do not know you, then you cannot expect anyone to route to you.
If nobody knows you, you're not part of the network, and that's what we wanted to verify. The first graph we have displays the ratio of unreachable peers. We took all the routing tables and tried to probe each one of the peers, and that's the ratio of peers that were in a routing table but were not reachable at the time of the crawl — entries that are still in the routing table but are useless; we don't really want them. And it's about 0.16, which is very low.
It means that, out of the 20 closest peers, only a couple of peers are missing, and we were expecting much more. We can also see a difference between the buckets with a low ID — bucket zero to, let's say, eight — and buckets nine and up: buckets 0 to 8 are the buckets that are full.
They have many candidates, so they will easily replace the unreachable peers; the others may keep dead peers for longer, because they don't have many candidates to join the bucket, and it takes more time to clean up. Second result: the peer distribution in the buckets. On the x-axis you have the bucket ID, meaning that buckets zero, one, and so on are far from you, and the buckets up to bucket 20 are close to you. We can see that the buckets up to number eight are almost always full, which means there are a lot of candidates there — they are capped; there would be more otherwise — and, as we go down, we can see the exponential decrease.
That is expected because, as we said before, we expect the number of candidates to be divided by two each time we increase the bucket ID. In blue is the number of peers per bucket that we observed in the routing tables of the peers, and in orange are the missing peers — the peers that were not in a given routing table but that we observed in the network.
So, as I said, you have to know your neighbors and your neighbors have to know you in order for the routing to work. The result we got is that 61 percent of the peers know all of their 20 closest peers, which is excellent, and 95 percent of the peers know at least 18 out of their 20 closest peers. And it is not a critical failure if you are missing one of the peers, because, in order to be discoverable, you need at least one peer in your close neighborhood to know you. So those are, again, very good stats. In fact, we would not even expect a node to know 20 peers, because when you look up yourself, you're going to get the 20 closest peers to your own identity, and you're going to be included in that set yourself. So in fact we expect each node to know no more than its 19 closest peers, plus itself.
So yeah, that's the good news we have. Then, something a little bit more concerning is the diversity in the buckets — meaning the total number of distinct peer IDs in, let's say, all of the buckets number zero.
If you take the bucket number zero of all of the peers and count the number of distinct peers inside this union of buckets, it tends to decrease over time, which is a bit bad for load balancing and decentralization, because it means that if you want to reach a peer that is on the other side of the network, only a small set of peers is going to take all of those requests.
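Measuring that diversity amounts to taking the union of a given bucket across all crawled routing tables and counting distinct peer IDs — roughly like this (the data shapes here are hypothetical, not the Nebula crawler's actual output format):

```python
def bucket_diversity(routing_tables, bucket_id):
    """Count distinct peer IDs found in bucket `bucket_id` across all
    crawled peers. Each table maps a bucket ID to a list of peer IDs."""
    distinct = set()
    for table in routing_tables:
        distinct.update(table.get(bucket_id, []))
    return len(distinct)

tables = [
    {0: ["p1", "p2"], 1: ["p3"]},
    {0: ["p2", "p4"], 1: ["p5"]},
]
print(bucket_diversity(tables, 0))   # -> 3 (p1, p2, p4)
```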
So we also did some measurements on that, using data from the first measurement. We can see that the number of peers was approximately constant. Let me explain this graph: when we take the bucket number zero for all of the peers and count the distinct number of peer IDs in there, we reach these numbers. We counted this for each week, from week number zero to week number nine.
That was the whole duration of the study. What we can see is that the most diverse bucket is bucket number nine — which, if we recall, was the first bucket not to be full — and we would expect to have more diversity in the buckets that are full, right, because there are more peers there.
But the thing is that, over time, as we tend to keep the very stable peers inside the buckets with a lot of candidates, the very stable ones are going to end up in every peer's buckets number zero and one. So over time we get rid of the unstable peers and only accept the stable ones, which is good, because now you have a stable network.
But then it becomes centralized, because all of these very stable peers start to get lots of requests, and so it doesn't scale anymore. What we can see here is that the diversity is decreasing over time. It will probably stop at some point, but the worst case would be that bucket number zero becomes fully centralized.
We would have only 40 different peers — 20 for each half of the key space — and that would be just terrible, but we're not there yet. And that's the moving average of the difference: we can see that the diversity in bucket number nine, in green, is always higher than in buckets zero and one, but the diversity is decreasing much faster in the buckets with a low ID than in the buckets with a higher ID.
Another improvement that we could make: currently, when an IPFS node leaves the network — you shut down your computer, or you just kill the IPFS or Kubo process — you flush everything, and when you start the computer again you have to insert yourself into the DHT again, do all of the lookups again, and get to know some other peers.
What could be good to fight centralization, and to make the bootstrap process faster, is to keep state: when you shut down your node, you just write your routing table, or the peer list, to disk, and when you boot up, you try to contact the same peers again, so that you converge much faster on boot. You also avoid getting those central bootstrap nodes directly in your routing table — they are the most popular and stable ones — so they get fewer requests, and we get faster convergence.
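A minimal sketch of that persistence idea (the file name and JSON format here are made up for illustration; Kubo's actual mechanism may differ):

```python
import json
from pathlib import Path

STATE_FILE = Path("routing_table.json")   # hypothetical location

def save_peers(peers):
    """On shutdown: dump the current routing-table peer IDs to disk."""
    STATE_FILE.write_text(json.dumps(sorted(peers)))

def load_peers(bootstrap):
    """On boot: prefer previously known peers; fall back to bootstrap nodes."""
    if STATE_FILE.exists():
        saved = json.loads(STATE_FILE.read_text())
        if saved:
            return saved
    return list(bootstrap)

save_peers(["peerA", "peerB"])
print(load_peers(["bootstrap1"]))   # -> ['peerA', 'peerB']
```

Re-dialing the saved peers first means the central bootstrap nodes are only contacted when the saved state is empty or stale.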
And now I'm going to give an explanation of this graph. It's actually a very new graph — I think we got it two days ago. The explanation we have for why we can see so many clusters and so many colors — always in groups of powers of two — is this: the largest empty space we get is between the left and the right, so those are two groups, and then we can see another big line between the top and the bottom.
So that's another line, and then we can split each one of these quarters — that's the colors — and inside each of the colors we can clearly see that there are two distinct groups, and sometimes you can even see four distinct groups within a color; each subgroup of a color has again two subgroups. The way we can explain this is that the larger, let's say, empty space means that, for instance, on the left — if we take it from the bottom, let's take the two same reference nodes again — these nodes are going to be closer together, so they will always be connected together and form a cluster. And then, when we look at the closest nodes to them, it's going to be those two, and it is mutual: those two are going to be the closest to this one, so they're going to form a cluster. Yep, go ahead.
We can place it according to the color. Let's say, for instance, we define some arbitrary prefix: the left is going to be zero and the right is going to be one, okay, and then the upper left is maybe zero-zero, and the lower left is going to be zero-one. Okay, so there are three bits plotted.
Audience: What do you mean, the origin? I see a bunch of dots on the screen with different colors, and some of them are farther from the origin of the plot than others. So what determines whether a node is plotted farther from or closer to the origin — the numerical value associated with the peer ID, or what?
I can't say exactly for the location — why a node is on the left or the right, or more on the outside or the inside — the explanation is more for the clusters: how nodes are grouped together.

So, to conclude this presentation: we saw that the DHT is in very good shape. We know all of the peers that are close to us, and there is a really low rate of dead peers — stale entries — inside the network. The distribution inside the k-buckets is as expected: we can see the exponential decrease as the bucket index increases. The only concerning thing, maybe, is that we lose diversity over time, but there are some solutions to tackle this, and it's not critical at the moment. All right, so that was it.