Description
Double Hashing & Content Routing - presented by @guissou at IPFS þing 2022 - Content Routing 2: Privacy - https://2022.ipfs-thing.io
The IPFS DHT currently makes use of double hashing. This talk describes a small protocol change on how double hashing should be implemented in the DHT. This change would significantly improve the reader’s privacy from the DHT nodes, with low overheads.
Slides CID: bafybeicm22quqzyvvdyxeiczbaacvdc2mliz2rovadaxnajopb6t2cwmbq
A
I'm going to do a presentation on how we can make use of the double hashing that is already in IPFS for privacy. It would require some changes to the IPFS network, but double hashing is already here. This work has been discussed in a privacy discussion group I was in. So what do I mean by double hashing? Here is what we have in IPFS.
A
The location of the content, or rather of the provider record that points to the content, in the DHT is the address given by the hash of the CID. This means that the location of the provider record in the DHT is kind of the double hash of the content itself.
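A minimal sketch of the "double hash" idea described here, assuming SHA-256 for both steps. The helper names `content_address` and `dht_key` are made up for illustration, and a real CID carries extra fields (version, codec, multihash prefix) that this toy version leaves out.

```python
import hashlib


def content_address(content: bytes) -> bytes:
    """First hash: a stand-in for the CID, derived from the content itself."""
    return hashlib.sha256(content).digest()


def dht_key(cid: bytes) -> bytes:
    """Second hash: the keyspace location where the provider record is stored."""
    return hashlib.sha256(cid).digest()


cid = content_address(b"hello ipfs")
key = dht_key(cid)
# The provider record lives at sha256(sha256(content)) in this toy model.
print(len(key) * 8, "bit DHT key:", key.hex())
```

The point is only that the DHT location is one hash further away from the content than the CID is.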
So, quickly going through an example of how content lookup works in IPFS. Feel free to stop me at any point
A
if you have any questions or if I've got something wrong on the slide. First, the client gets the CID from somewhere and hashes this CID to get its bit representation. It then looks up in its routing table the closest peer to this hash; for instance, here it is peer 0 (P0). It then queries P0 and says: okay, I want to find this specific CID. Then there is P0, which is a peer in the DHT running in DHT server mode.
A
It takes the CID, hashes it, looks up in its own routing table, finds the closest peers, and returns them to the original client. So eventually we get closer to where the provider record is stored: we repeat the query with the next peer, which performs exactly the same operation, until eventually we find a peer that is hosting this provider record, and that peer is going to return it.
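The lookup just described can be sketched as a toy simulation. This is an illustration under simplifying assumptions, not the libp2p implementation: peer IDs and the target key are 4-bit integers, each peer knows only a handful of others, and the names `routing_tables`, `records` and `iterative_lookup` are invented for the example.

```python
def xor_distance(a: int, b: int) -> int:
    """Kademlia-style distance between two keys."""
    return a ^ b


def closest(peers, target):
    """Return the known peer that is closest to the target key."""
    return min(peers, key=lambda p: xor_distance(p, target))


def iterative_lookup(routing_tables, records, start, target):
    """Query ever-closer peers until one of them holds a record for `target`."""
    current, path = start, []
    while True:
        path.append(current)
        if target in records.get(current, {}):
            return records[current][target], path
        nxt = closest(routing_tables[current], target)
        if xor_distance(nxt, target) >= xor_distance(current, target):
            return None, path  # no closer peer known: give up
        current = nxt


# Toy network: who knows whom, and who stores which provider record.
routing_tables = {
    0b0001: [0b1000, 0b0100],
    0b1000: [0b1100, 0b0001],
    0b1100: [0b1110, 0b1000],
    0b1110: [0b1100],
}
records = {0b1110: {0b1111: "provider record for key 1111"}}

result, path = iterative_lookup(routing_tables, records, start=0b0001, target=0b1111)
print(result, [format(p, "04b") for p in path])
```

Every peer on `path` sees the key being looked up, which is what the privacy discussion that follows is about.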
A
If we don't already have the IP address of that peer, we have to find it first; then we can request the content from the content provider, which is going to provide the content. All right, so now we want to know what the privacy model is here: who can know what kind of data I am accessing? Of course the content provider knows that I am interested in getting this image from it. It's hard to avoid, and in this setting there's nothing we can really do: we download the image from them.
A
Then there is the peer hosting the provider record. It gives me the provider record, so it knows that I am probably interested in this file if I download the provider record. Then all of the peers that help me route to the provider record are going to know that I'm interested in this file as well. The same goes for any passive observer on the path, such as an ISP.
A
Someone on the same airport Wi-Fi, too, will be able to know that I requested the CID, and if they want to observe me, they can just take the CID, make their own request, and eventually download the file that I'm interested in. So it's very easy to track what people are looking at. For instance, let's say there was a YouTube built on top of IPFS.
A
You wouldn't want the content to be encrypted, because you would want everyone to be able to access it. But just by looking at the CIDs that people are accessing, you can see which kind of videos they are watching, and you can really spy on them. So that's the problem. We want client privacy, or reader privacy, in the DHT, so that readers can access things more privately. And here we only focus on the DHT.
A
We don't focus on Bitswap, on content provider privacy, or on the gateway. We just focus on the DHT and a normal client.
A
So what we want to do instead is take a prefix, so a substring, of the hash of the CID, which means that the DHT routing process has to be adapted. So what do I mean by this?
A
On its own this wouldn't change anything for privacy, but then we can build on top of it. So the second change is... can you read the red? Yes? No? Yeah, bad choice of color. So the client first is going to... oh, perfect! Thank you.
A
So basically the client is going to just take the prefix of the hash of the CID, then compute the closest peer to the prefix, or to the hash of the CID itself, it doesn't really matter, and then request the prefix. We need to adapt the routing process a little bit, because you are now looking for the closest peers to a prefix.
A
You would look for all of the peers whose IDs exactly match this prefix and, if there are none, just treat the remaining bits as random bits and take the closest peers the same way you would normally compute the XOR distance. So it actually works with the routing: every time you can get closer peers.
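One way to read the adapted routing step is sketched below. It assumes the missing tail of the key is simply filled with random bits before the usual XOR comparison; `pad_prefix` and `closest_peers` are hypothetical helper names, and the prefix is assumed to be a non-empty bit string.

```python
import secrets


def pad_prefix(prefix_bits: str, key_bits: int = 256) -> int:
    """Turn a bit-string prefix into a full-length key by appending random bits."""
    missing = key_bits - len(prefix_bits)
    tail = secrets.randbits(missing) if missing > 0 else 0
    return (int(prefix_bits, 2) << missing) | tail


def closest_peers(peer_ids, prefix_bits: str, n: int = 1, key_bits: int = 256):
    """Rank peers by XOR distance to the randomly padded prefix."""
    target = pad_prefix(prefix_bits, key_bits)
    return sorted(peer_ids, key=lambda p: p ^ target)[:n]


# 8-bit toy keyspace: the peer sharing the prefix 0001 always wins,
# whatever random bits are drawn for the tail.
peers = [0b00010000, 0b01000000, 0b11110000]
print(format(closest_peers(peers, "0001", key_bits=8)[0], "08b"))
```

Peers that share the prefix differ from the padded target only in the low bits, so they always rank ahead of peers that differ somewhere inside the prefix.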
Then you eventually query a peer that actually holds one or more provider records matching this prefix.
A
This peer is not going to know which of these CIDs you want, and it's going to give you all of the CIDs, oh sorry, all of the provider records. So here we have an overhead, a network overhead, because many provider records are transmitted, against only one in the current IPFS. What the client is then going to do is discard the CIDs it doesn't care about and do the same thing as it used to do.
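The exchange above can be sketched as follows. This is a toy model under stated assumptions: keys are SHA-256 digests written as bit strings, and `Server`, `get_by_prefix` and `lookup` are names invented for the example rather than anything from the DHT code.

```python
import hashlib


def key_bits(cid: bytes) -> str:
    """Hash of the CID as a 256-character bit string."""
    digest = hashlib.sha256(cid).digest()
    return "".join(f"{byte:08b}" for byte in digest)


class Server:
    """Toy DHT server: stores provider records under the full hash of the CID."""

    def __init__(self):
        self.records = {}

    def put(self, cid: bytes, provider: str) -> None:
        self.records.setdefault(key_bits(cid), []).append(provider)

    def get_by_prefix(self, prefix: str) -> dict:
        # The server cannot tell which matching record the client wants.
        return {k: v for k, v in self.records.items() if k.startswith(prefix)}


def lookup(server: Server, cid: bytes, prefix_len: int):
    """Client side: send only a prefix, then keep the one record it cares about."""
    full = key_bits(cid)
    candidates = server.get_by_prefix(full[:prefix_len])
    return candidates.get(full, []), len(candidates)


server = Server()
for i in range(8):
    server.put(f"cid-{i}".encode(), f"provider-{i}")

providers, received = lookup(server, b"cid-3", prefix_len=2)
print(providers, "picked out of", received, "records received")
```

The shorter the prefix, the more records come back, which is the network overhead mentioned in the talk.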
Now just a word about the prefix length selection.
A
I said that we have to compute the prefix, but I didn't say its length. It's not exactly a security parameter, but basically what we want to achieve is k-anonymity. Taking this example again, if I want five provider records to be given to me on average each time, then the file I want to access will not be distinguishable among these five files. So I get k-anonymity with k equal to five. So now, how to compute the prefix length.
A
The prefix length l depends on the k-anonymity level, so on the k parameter, which is not to be confused with the k-bucket or the k parameter from Kademlia. That's basically the computation. The idea is: take this tree, which would be the keyspace of Kademlia, take the leftmost node, so 0000, and say you want to have four provider records.
A
Then it means that you have to take the prefix 00, so that there are four different elements under it. So basically that's the computation: we have to take the log of the total number of CIDs in the network divided by the k parameter.
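The computation described here, l = log2(N / k), can be sketched as below. It assumes CIDs are spread uniformly over the keyspace and rounds down so that a prefix covers at least k records on average; the rounding choice and the function names are assumptions, not something stated in the talk.

```python
import math


def prefix_length(total_cids: int, k: int) -> int:
    """Number of prefix bits so that roughly k records share each prefix."""
    if total_cids <= 0 or k <= 0:
        raise ValueError("total_cids and k must be positive")
    return max(0, math.floor(math.log2(total_cids / k)))


def expected_matches(total_cids: int, prefix_len: int) -> float:
    """Average number of records under one prefix, assuming uniform keys."""
    return total_cids / 2 ** prefix_len


# The example from the talk: a 4-bit keyspace with 16 records and k = 4.
print(prefix_length(16, 4), expected_matches(16, 2))
```

With 16 records and k = 4 this gives a 2-bit prefix such as 00, matching the tree example.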
A
So what do we gain from implementing this? We get k-anonymity and plausible deniability. It means that you request a prefix, you get served five provider records, and you can pretend that it's not this sensitive or illegal file that you were downloading but another one. So it does make sense, and we are more protected, or less vulnerable, with respect to the DHT nodes on the routing path, the node storing the provider record, and the passive observer.
A
But we don't have l-diversity or t-closeness, which are two other metrics associated with k-anonymity, and we have a small network overhead, just for the provider record transmission. It is also very easy to just replay the same prefix request, so basically anyone could do it. If we take P0, P0 could just replay the prefix request.
A
It gets all of the provider records and says: okay, I know a bit about what the client is looking for, and I know it's file number four and not the others. So the privacy is still not that good, but we reduce the impact that this actor may have. But we can go further: we can go and encrypt the provider record.
A
So the first part doesn't change, but the provider record would be stored encrypted on the node. It means that if I want to pin content on IPFS, I will first encrypt the provider record with a key derived from the CID itself and push it to the DHT. It even gives a bit of content provider privacy, because the peers storing the provider record wouldn't know what it contains.
A
The crypto would be done on the client. So, for instance, if we say that peer P0 is malicious, P0 will request the prefix and will get all of the encrypted provider records, but P0 only knows the prefix, which is a part of the hash of the CID. Even if P0 had the whole hash of the CID, it would need to do a pre-image attack to be able to recover the CID, which by design is not feasible.
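A rough sketch of the encrypted provider record idea follows. The talk only says the key is "derived from the CID" and that decryption is symmetric, so everything concrete here is an assumption: the key-derivation label is hypothetical, and the cipher is a toy HMAC-SHA256 keystream used only so the example runs with the standard library. It has no nonce and no authentication and must not be used as real encryption.

```python
import hashlib
import hmac


def dht_key(cid: bytes) -> bytes:
    """Public lookup key: hash of the CID (what routing peers may see)."""
    return hashlib.sha256(cid).digest()


def record_key(cid: bytes) -> bytes:
    """Secret key derived from the CID itself (hypothetical derivation label)."""
    return hashlib.sha256(b"provider-record-key:" + cid).digest()


def keystream(key: bytes, length: int) -> bytes:
    """Toy keystream: HMAC-SHA256 over a counter. Illustration only."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor_cipher(key: bytes, data: bytes) -> bytes:
    """Encrypts and decrypts (XOR with the keystream is its own inverse)."""
    return bytes(a ^ b for a, b in zip(data, keystream(key, len(data))))


cid = b"example-cid"
record = b"provider: peer-1234"
stored = xor_cipher(record_key(cid), record)        # what the DHT node holds
print(xor_cipher(record_key(cid), stored))          # a reader who knows the CID
```

A peer that only knows `dht_key(cid)`, or a prefix of it, cannot compute `record_key(cid)` without inverting the hash, which is the pre-image argument made above.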
A
So P0 would be able to get the encrypted provider record, but not be able to decrypt it and see what's inside. So that's mostly it: the observer can only decrypt the provider record if they have the CID. But it means that it's not perfect security. If you have a picture of your holidays and only you access it, then it is fine: only you know the CID, and nobody will be able to see the picture of your holidays.
A
But if, again, we say that there is a decentralized YouTube on IPFS, someone could go through all of the videos, get all of the CIDs of the videos, compute their hashes, and build a big dictionary. Then an observer, when they see the prefix that is requested, can look it up in the dictionary to check whether it matches a video, and they can still know what you're looking at. So it improves privacy a lot, but it's not the perfect solution.
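The dictionary attack just described can be sketched like this. The catalogue of CIDs is made up, SHA-256 is assumed for the hash, and `build_dictionary` and `guess` are illustrative names.

```python
import hashlib
from collections import defaultdict


def key_bits(cid: bytes) -> str:
    """Hash of the CID as a bit string."""
    digest = hashlib.sha256(cid).digest()
    return "".join(f"{byte:08b}" for byte in digest)


def build_dictionary(known_cids, prefix_len: int):
    """Precompute prefix -> known CIDs for a public catalogue of content."""
    table = defaultdict(list)
    for cid in known_cids:
        table[key_bits(cid)[:prefix_len]].append(cid)
    return table


def guess(table, observed_prefix: str):
    """What an observer can infer from a prefix seen on the wire."""
    return table.get(observed_prefix, [])


catalogue = [f"video-{i}".encode() for i in range(1000)]
table = build_dictionary(catalogue, prefix_len=16)

observed = key_bits(b"video-7")[:16]   # prefix the reader sent to the DHT
print(guess(table, observed))
```

If only a few catalogue entries share the observed prefix, the observer narrows the request down to those few, which is why the scheme is described as an improvement rather than a perfect solution.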
A
It is still possible to mount some attacks with a lot of resources. The downside is that we have one symmetric decryption operation. Still, we again reduce the impact, or the power, that this adversary can have. So that's kind of the final picture. Yes?
B
Yes, because we've gone from a small number of providers to one distinct provider per CID. But it does seem like potentially there's a way to de-link that. You could potentially have two layers: one that is a provider record that just says "I am a provider", and a second that says which provider that is. So, doing it in two stages, there's potentially a way to...
B
A
So the conclusion is that we can really significantly improve reader privacy in the DHT, and only this. Also, the DHT servers wouldn't need to hash the CID for each request, which is good, because when you do a lookup you have a lot of nodes helping you with the routing.
A
That's that many fewer hash operations. The overhead to pay is just sending k provider records, with k being the k-anonymity parameter, instead of one, but only for one peer request. The computation overhead would be one symmetric decryption of the provider record, which shouldn't be too heavy even for mobile applications. But it would require modifying the server code and going through a migration, so republishing all of the CIDs with all of the provider records now encrypted, to be able to find them again.
A
Then, I mean, there is also the illusion of privacy: people may feel safe when they shouldn't, because it's still possible to attack if you have a lot of resources, or if you are not in this very specific reader-privacy DHT lookup case. Now I'm happy to take any questions if you have some.
B
Yeah, so the prefixing part kind of reminds me of an instance of a more general thing, which would be a locality-sensitive hash. Can you comment on that? The question is basically: did you consider using methods other than prefixing as a way of hiding some of the information about what you're looking for?
A
Yeah, so the thing is that it's hard to do privacy and content routing together, because if you make each file distinct, then you're not able to route to it anymore. So if you use not the prefix but a suffix method or something else, you lose the routing property. What we have here is that if you have the prefix, you know it's still going to be in here, so you can route until this node.