Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
A
When you launch an IPFS client, you connect to some peers, usually around 300. Then, if you want to get a CID, you just send a request to every single peer via Bitswap. If they have the file, they give it to you; if they don't, well, they don't. It's pretty fast and simple. However, it's expensive, and the main problem is that it's not guaranteed to find the specific file.
A
So even if the file is in the network, if none of your peers has the file, you will not find it. The second solution is the distributed hash table (DHT), which is slightly slower. However, it's still efficient, because you're guaranteed to find the content if it's in the network, and you can find the specific content within log n steps, where n is the size of the network. In this talk we're going to focus only on the DHT.
A
So, a very simple introduction to the DHT: when you join the IPFS network, you generate a public/private key pair.
A
Then you hash this key, and the hash of the key will be your identifier in the IPFS network. It basically determines your position in the SHA-256 hash space of the network. If everyone does this, everyone generates a random key, so we get a more or less perfectly uniform, equally distributed hash space. And the DHT allows us to go to a specific place in the DHT.
A
So I can say, "Hi, I want to go to this hash in the hash space," and the DHT provides this routing mechanism: by contacting at most log n nodes, you go to the specific region, and you're guaranteed to find the peers in the network that are closest to that specific place in the hash space. IPFS uses this for content resolution.
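A minimal sketch of the "closest peers in the hash space" idea, assuming a Kademlia-style XOR distance over 256-bit SHA-256 keys. The helper names (`dht_key`, `xor_distance`, `k_closest`) and the peer and content strings are hypothetical. The sketch sorts a list it already holds in full, whereas a real DHT lookup reaches the same peers iteratively, without global knowledge of the network.

```python
import hashlib


def dht_key(raw: bytes) -> int:
    """Map arbitrary bytes (a stand-in for a public key or a CID) into the 256-bit hash space."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance: the bitwise XOR of two keys, read as an integer."""
    return a ^ b


def k_closest(target: int, peer_ids: list[int], k: int = 20) -> list[int]:
    """Return the k peer IDs with the smallest XOR distance to the target."""
    return sorted(peer_ids, key=lambda peer: xor_distance(peer, target))[:k]


# Hypothetical peers and content, used only for illustration.
peers = [dht_key(f"peer-{i}".encode()) for i in range(1000)]
target = dht_key(b"example content")
closest = k_closest(target, peers, k=4)
```

Because the keys come out of a hash function, the peers returned for any given target are effectively a random sample of the network, which is what makes the uniformity argument in the talk work.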
So now, if I'm a provider and I would like to tell the world that I host a specific file, I create the CID.
A
The CID is also placed somewhere in this SHA-256 hash space. I use the DHT routing to go to that place in the hash space, and I discover the k closest nodes to the specific CID. In IPFS, k is 20; here I'm using four because it's easier. Then I contact all those k nodes and tell them, "Hey, I'm the provider for this file." If there's another provider for the same file, it does the same thing.
A
If everyone does this, those k closest nodes to the specific CID just host this mapping: the CID and the list of providers. Now, if someone wants to get the file, again, we have the CID, so we know where it is in the hash space. We discover the same k closest nodes and ask them, "Hey, who has this file?" Hopefully we receive a list of the providers, and then, having this information, we can just contact a provider directly and get the file.
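The advertise-then-look-up flow described above can be sketched as a toy in-memory model. Everything here is an assumption made for illustration: the `ToyNode` class, the `provide` and `find_providers` helpers, and the node and provider names are hypothetical, there is no networking, and records never expire.

```python
import hashlib
from collections import defaultdict

K = 4  # the talk uses 4 on its slides and states that IPFS uses 20


def dht_key(raw: bytes) -> int:
    """Map arbitrary bytes into the 256-bit hash space."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


class ToyNode:
    """A DHT server holding provider records: CID key -> set of provider names."""

    def __init__(self, name: str):
        self.name = name
        self.node_id = dht_key(name.encode())
        self.records = defaultdict(set)


def k_closest_nodes(cid_key: int, nodes: list) -> list:
    """The K nodes whose IDs are closest (by XOR) to the CID key."""
    return sorted(nodes, key=lambda n: n.node_id ^ cid_key)[:K]


def provide(cid_key: int, provider: str, nodes: list) -> None:
    """Tell the K closest nodes that `provider` hosts the content."""
    for node in k_closest_nodes(cid_key, nodes):
        node.records[cid_key].add(provider)


def find_providers(cid_key: int, nodes: list) -> set:
    """Ask the same K closest nodes who hosts the content."""
    found = set()
    for node in k_closest_nodes(cid_key, nodes):
        found |= node.records.get(cid_key, set())
    return found


nodes = [ToyNode(f"node-{i}") for i in range(50)]
cid_key = dht_key(b"example file")
provide(cid_key, "alice", nodes)
provide(cid_key, "bob", nodes)
```

Both the provider and the searcher depend on the same K nodes, so whoever controls those K nodes controls whether the content can be found.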
Okay, so IPFS is an open system, which is great but also comes with some challenges. If I'm an attacker, I can create some Sybil identities, and those identities, again, will be placed based on their public/private key pair. But I can just keep producing public keys until they suit my needs, and it's actually pretty easy to brute-force private keys so that their hashes end up close to the target hash.
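To give a feel for why this brute force is cheap, here is a small offline simulation. It rests on loud assumptions: random byte strings stand in for freshly generated public keys, `max_distance` is an arbitrary cut-off (one thousandth of the hash space) rather than the distance of a real honest peer, and the function name `simulate_sybil_search` is hypothetical. It measures only how many random draws are needed.

```python
import hashlib
import random


def dht_key(raw: bytes) -> int:
    """Map arbitrary bytes into the 256-bit hash space."""
    return int.from_bytes(hashlib.sha256(raw).digest(), "big")


def simulate_sybil_search(target: int, max_distance: int, count: int, rng: random.Random):
    """Draw random stand-in keys until `count` of them hash within max_distance of target.

    Returns the matching IDs and the total number of attempts.
    """
    found, attempts = [], 0
    while len(found) < count:
        attempts += 1
        candidate = dht_key(rng.getrandbits(256).to_bytes(32, "big"))
        if candidate ^ target < max_distance:
            found.append(candidate)
    return found, attempts


target = dht_key(b"example file")
max_distance = 2**256 // 1000  # assumed cut-off: 1/1000 of the hash space
sybils, attempts = simulate_sybil_search(target, max_distance, count=4, rng=random.Random(0))
```

Each draw lands inside the region with probability of about 1/1000, so the search needs on the order of a thousand hashes per identity. In a network of n uniformly placed honest peers, the closest honest peer sits roughly 1/n of the hash space away, so the cost per identity grows only linearly with the network size.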
A
If the attack succeeds, the attacker can place the Sybil identities close to the target hash, and if that works well, the k closest nodes are now controlled by the attacker. This is problematic, because every time a provider wants to say, "Hey, I have this file," it will talk only to those malicious nodes, and they can simply drop this information. And if someone now wants to retrieve this file, again, it will discover the k closest nodes, which are now malicious, and they can just say the file is not found.
A
This is even more problematic because, even if the file is very popular and there are thousands of nodes hosting it, as long as I'm able to eclipse those k nodes, the file becomes unavailable in the network. And if I place those Sybils before the providers advertise the file, the attack is instantaneous, so it works right away.
A
If not, if I place my Sybils after the file was advertised, it takes some time, because, as we've seen in the previous talk, honest nodes keep the records for 24 hours. So after 24 hours, only the malicious nodes will have the information about who has the specific file, and the file gets eclipsed. This is problematic because generating those 20 malicious peers takes around half a minute on this laptop, which is not great: with a single laptop I can basically eclipse any file on the IPFS network.
A
So we were wondering, first, how we can detect this attack.
A
If we have perfectly random keys and only honest nodes, they will be uniformly distributed in the hash space. However, if an attacker wants to launch an attack, it has to place the Sybil identities closer to the specific target hash. If that happens, we'll see that the distribution is different, because the region near the CID becomes more dense and the distances between the CID and the peer IDs become much shorter. Obviously, as a node, we don't have a global view of the network.
A
A
You don't have to dive into the math, but basically this is something to which I can give two distributions, and it gives me a number, which is a distance between those distributions. If the KL divergence is large, it means that those two distributions are very different. If it's low, it means that they're basically the same.
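For reference, the standard definition of the Kullback-Leibler divergence between two discrete distributions is given below. Reading $P$ as the distribution observed near the CID and $Q$ as the ideal one is an assumption about how the speakers apply it, not something the talk states.

```latex
D_{\mathrm{KL}}(P \,\|\, Q) = \sum_{i} P(i) \, \log \frac{P(i)}{Q(i)}
```

The value is zero when the two distributions are identical and grows as they diverge. It is not symmetric in $P$ and $Q$, so "distance" is meant informally here.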
So what do we do? First of all, we need to estimate the network size; this we can do pretty securely. Then we calculate the ideal distribution that we expect to see if we have a uniform distribution of the peer IDs. Then we go to our specific CID and compare the ideal distribution with the one that we perceived.
Here on this graph, you can see the red dots, which are our walks towards a CID that is eclipsed, and the blue dots, which are our walks towards a CID that's not eclipsed. On the y-axis you have the KL divergence. You can see that the red dots are basically in the top part of the graph and the blue ones are at the bottom. So if you set the threshold correctly, we can basically tell whether there is an attack or not, and with some tuning we were able to get a false negative rate of zero percent and false positives of only around one percent.
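The detection step can be sketched as follows, under several assumptions the talk does not confirm: the distributions are taken over the common prefix length (CPL) between the CID and each of the k closest peers, the ideal distribution is approximated from expected counts for a uniform network of the estimated size, and the threshold value is arbitrary. All function names are hypothetical.

```python
import math
from collections import Counter

ID_BITS = 256


def common_prefix_len(a: int, b: int) -> int:
    """Number of leading bits shared by two 256-bit identifiers."""
    return ID_BITS - (a ^ b).bit_length()


def expected_cpl_dist(network_size: int, k: int) -> list[float]:
    """Rough ideal CPL distribution of the k closest peers in a uniform network.

    About network_size / 2**cpl peers share at least `cpl` leading bits with
    the target; the k closest peers are the ones with the longest prefixes.
    """
    counts = []
    for cpl in range(ID_BITS + 1):
        at_least = min(k, network_size / 2**cpl)
        at_least_next = min(k, network_size / 2 ** (cpl + 1))
        counts.append(at_least - at_least_next)
    total = sum(counts)
    return [c / total for c in counts]


def observed_cpl_dist(cpls: list[int]) -> list[float]:
    """Empirical CPL distribution of the peers actually seen near the CID."""
    hist = Counter(cpls)
    return [hist[cpl] / len(cpls) for cpl in range(ID_BITS + 1)]


def kl_divergence(p: list[float], q: list[float], eps: float = 1e-12) -> float:
    """D_KL(p || q); q is floored at eps so empty bins do not divide by zero."""
    return sum(pi * math.log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0)


def looks_eclipsed(cpls: list[int], network_size: int, threshold: float, k: int = 20) -> bool:
    """Flag the CID when the observed distribution diverges too far from the ideal one."""
    observed = observed_cpl_dist(cpls)
    expected = expected_cpl_dist(network_size, k)
    return kl_divergence(observed, expected) > threshold


# Made-up CPL samples for a network of about 16384 peers.
honest = [9] * 4 + [10] * 8 + [11] * 4 + [12] * 2 + [13, 14]
eclipsed = [24] * 20  # Sybils packed far closer to the CID than chance allows
```

With these made-up samples, the honest walk stays well below a threshold of 1.0 and the eclipsed walk lands far above it, mirroring the separation between blue and red dots described in the talk.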
A
So basically the tuning is only about setting this threshold, because we have to say, okay, an attack is when the KL divergence is higher than some value. But you can basically do it empirically, and it's based only on the network size.
A
Right, so we know that there is an attack, but what can we do about it? This is still work in progress, but basically we would like to solve it only when advertising the file. So now, if I have a file and I advertise it, I go to the CID and check the distribution. The distribution seems off, so what can I do?
A
The good thing about it is that, on the searcher side, we don't have to do anything, because while we go towards the specific CID, we should be crossing those honest nodes, and they will just tell me, "If you want to go closer, that's fine, but I have this information and I can give it to you. You don't have to go towards this eclipsed region in the network." And yeah, I think this is it for me.
A
So basically, what we're doing now is confirming that every single time you look for a file, you will go through those honest nodes. If that works, we're basically good to go. My colleague Naveen will give a demo about that tomorrow; he runs eclipsing CIDs as a service. If you want a file to be eclipsed, just let us know.
A
A
Right, I think it also depends on... we are still not set on the actual response. And I think, if the response is more harsh, maybe you'll care more about false negatives rather than false positives, and vice versa, depending on what you do after the... yeah, exactly.
D
Hi, good talk. There was one slide where you showed that the false negative rate was zero percent. Is that provably always zero, or was that empirical data?
D
So is the zero false negatives empirical, or will it always be zero?
D
E
A
This was, like, we set the threshold, and with the threshold that we chose, that's the result we got. But, as I said, we kind of tune this threshold, because we still don't know whether we care more about false positives or false negatives; that depends on the exact response that you want to have to this attack.
F
Hello, thank you. So you detect this using the KL divergence on the distributions, but it sounds to me like you need a crawl of the entire network to detect this.
A
No, we do it on the distribution as it is perceived while going to the specific CID. Here on this graph, you can see the distribution of the peers I perceive while going to the CID. Those are only the peers that I contact, so I don't need the global view of the network. Exactly.
A
G
Can you go back to the previous graph? This one, yeah, that one. What are these horizontal lines? Could you explain? In this KL divergence graph, the horizontal ones.
A
That is the threshold that we set, basically saying above is... yeah.
G
A
E
That's a very interesting and promising result. So what you have now is that you can detect an attack, right, and then there's a mitigation, but I think it would even be possible to do something better. When you provide your keys, you're going to do a lookup for the spot where you want to store your keys.
D
E
Here you can detect the density, and if at this moment you detect an attack, then you publish it to a lot of peers, right, because the attacker is going to be there. But you want to give it to the honest peers that were there from the start, right, and so you don't need to constantly monitor. Right, right.
A
There is always a trade-off because, obviously, the more people you give your information to, the better, it's more secure, but at the same time, I think, we increase the overhead on the network. We're also thinking about storing those records on the path, because currently you use a kind of FIND_NODE operation to discover the k closest nodes, and then you give them the information; for that we use ADD_PROVIDER. However, it would be asking everyone on the path towards the CID, so...
E
A
Right, right, but I mean, you still go towards the specific CID, right, so you still give it to those k closest nodes. But then, the closer you get... we still have to model that, but especially for popular files, it should give us enough information spread across the network. The good thing about it is that you then don't have any additional overhead, because you just talk to the nodes that you contact anyway. Yeah.
E
Yeah, but I think, if you first do the lookup to store the provider record and you detect that there is an anomaly, then you can look up some more peers until you're satisfied with the number, and then you allocate them. So you allocate more provider records than just 20. Maybe it can be combined with optimistic provide, so that the number of provider records depends on the density of this specific location of the key space, and I...
E
G
A
Yeah, it doesn't have to be extremely accurate, because, again, that depends on the threshold. So there are a lot of moving parts, but we've run it on multiple traces acquired by the crawler that I guess you wrote. We basically go to a random key in the network, and based on the distribution we're able to estimate the network size, and we were off by around 2-3 percent across all the runs that we had.
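The talk does not say which estimator was used. As one plausible sketch, assuming uniformly distributed peer IDs, the network size can be estimated from how far away the k-th closest peer to a random key is: if k peers fit into a fraction d of the hash space, the whole space holds about (k - 1) / d peers. The function name and the synthetic data below are hypothetical.

```python
ID_SPACE = 2**256


def estimate_network_size(target: int, closest_ids: list[int]) -> float:
    """Estimate the number of peers from the k closest peers to a random key.

    Uses (k - 1) / d, where d is the XOR distance of the farthest of those
    k peers, normalised to the size of the hash space.
    """
    k = len(closest_ids)
    if k < 2:
        raise ValueError("need at least two peers")
    farthest = max(peer ^ target for peer in closest_ids)
    return (k - 1) / (farthest / ID_SPACE)


# Synthetic check: 1000 evenly spaced IDs, looked up from key 0.
target = 0
all_ids = [i * ID_SPACE // 1001 for i in range(1, 1001)]
closest = sorted(all_ids, key=lambda peer: peer ^ target)[:20]
estimate = estimate_network_size(target, closest)
```

On this synthetic, perfectly even network the estimate lands within ten percent of the true 1000 peers. On a real network, averaging over lookups to several random keys would presumably reduce the variance of a single measurement.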
F
As another idea to avoid relying on the network size: what you're doing when you insert a provider record is that you explore the region close to that CID in the hash space. And you already know the region close to your own ID in the hash space. You could compare those distributions, and then you don't need to rely on the network size, because you know the density in your proximity. Maybe.
A
You probably also have... because I think here we were kind of looking globally. It would be good to check whether, if you're a lot closer to the CID, it's more difficult to detect the attack.
H
Hi, thanks for the talk. I was just wondering, what is the real cost to set up this attack?
H
Okay, so if I understood correctly, you can eclipse any CID if you have 20 Sybils that are the closest in the network to that identifier. But identifiers are derived from public key pairs, which are verified whenever you interact with the nodes to establish the secure connection, which means that in practice you have to have a huge database of public/private key pairs and peer IDs, such that you can fetch the 20 that are closest to your target. Is that right?
D
A
But I still have both the private and the public key, right? Like I said, I generate the private key. So basically, what we do is just fetch the closest node to the specific CID, and then I just generate a random key and check if it's closer to the CID. So I do have both the private and public keys. I can establish the...
H
Maybe that's what we're going through tomorrow. Because we could have, like, a Hydra node that has a huge database of public/private keys and peer IDs, and then I decide, okay, I will eclipse everything around this identifier. I pick the right identities, I make my bad Hydra present itself to the network with those identities, and then we've killed that segment of the network.
A
But, well, okay, I think... you wanna take it?
A
I
Will see tomorrow, but within half a minute to a minute we have a functioning attack. So it's very easy, basically. Yeah.
A
B
Okay, hi. Thank you for the talk. I was wondering, you were saying that your idea to mitigate this was to store the CID on the path, right? Yeah.
A
The further away you are from the CID, the less deterministic it is, because you're kind of guaranteed that in the buckets close to the specific CID you will have more or less the same nodes, and the further away you get, the less deterministic it is. Although, with the results from one of the talks, it seems that even for the buckets far away we might be getting something deterministic because of these stable nodes. We actually didn't take that into account.
A
So that's something very interesting to also see. But yeah, basically, if we do this kind of on-the-path approach, the further away you are from the CID, the more secure it is, because it's much more difficult for the attacker to eclipse a larger bucket. But at the same time, you're also less likely to get the information. So the further away you go, it will work only for the very...
B
D
Nice, yeah.
F
A
Right, so I mean, there are, you might think, kind of good use cases for this, because if there's some harmful content, we can say, "Hey, let's eclipse it." However, the problem is that you're giving a gun to anyone who wants it, which is probably not a good idea. I don't know, maybe... we're thinking that there could be a list of files that you don't want to participate in sharing, and then you just don't do it yourself. But I think actively...
I
I
That, more generally. But okay, excellent. Okay, one more.
E
So, just to follow up on how difficult it is to create an actual eclipse attack: I think there's a protection in the code so that you cannot have more than three IP addresses in your whole routing table from the same ASN, or the same IP block, and two inside the same bucket. So it means that to have an actual eclipse attack, you'd need at least 10 IP addresses, right? So yeah, even if generating the keys is easy.
A
I'm not sure, because I know that there is this limit on the IPs per bucket, but I don't think you have to add all the peers that you discover during an operation to your routing table, so you can still use them. If you discover the k closest nodes to the CID, you don't necessarily have to add them to your routing table to talk to them.
E
Yeah, but what I mean is that you cannot replace... because the nodes are going to eclipse, like, a couple of nodes, so you get closer and you try to remove the nodes that know about the target CID.
A
E
A
D
I
D
Nice. Any final questions, or should we just go next door?
G
Yeah,
let's
thank
Hal
again.