Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
So inside the DHT, we want the DHT servers to help us route to the content we're looking for. That's the purpose of content routing. We want them to know what we are looking for, because they can help us; but at the same time, since we want privacy, we don't want them to know what we're looking for. That's the challenge: how do we make sure that they can actually help us get to the content we want without revealing it to them?
So if you have an image, you just pass it through a hash function and you get a binary hash, the multihash. To this multihash you add some prefix and you get a CID, which is the string you can see. Then there is a concept that is important here: double hashing. Why double hashing? Because we hash the content twice: we're going to hash this multihash once more to get another identifier. On the slides this is written as the hash of the CID. In practice you don't directly hash the CID itself, but it's just easier to write it that way on a slide than to write the long version.
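As a rough illustration of the two hashing steps described here, the sketch below builds a sha2-256 multihash from some content and then hashes that multihash once more to obtain a separate DHT identifier. The function names and the `DHT_SALT` constant are hypothetical and chosen only for this example; the talk does not specify the exact constant or encoding used.

```python
import hashlib

# Hypothetical constant, for illustration only; the talk just says "a constant".
DHT_SALT = b"example-dht-salt"


def multihash_sha256(content: bytes) -> bytes:
    """Return a sha2-256 multihash: <code 0x12><length 0x20><digest>."""
    digest = hashlib.sha256(content).digest()
    return bytes([0x12, len(digest)]) + digest


def dht_identifier(multihash: bytes, salt: bytes = DHT_SALT) -> bytes:
    """Second hash: hash the multihash (with a constant) once more."""
    return hashlib.sha256(salt + multihash).digest()


mh = multihash_sha256(b"holiday picture bytes")
dht_id = dht_identifier(mh)
print("multihash:     ", mh.hex())
print("DHT identifier:", dht_id.hex())
```

Anyone who holds the multihash can recompute the DHT identifier cheaply, while going from the DHT identifier back to the multihash would require inverting the hash function.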
Another important component is the provider record. When you publish, or provide, some content in IPFS, you do not drag and drop your content and store it on IPFS. The content stays on your machine, and you just make a pointer and write it in the DHT, so that anyone will be able to find this pointer and discover that you store the file. Provider records are these pointers that map the CID, the content identifier, to the peers that are hosting the content. It's also possible that multiple peers store the same content.
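A provider record can be pictured as an entry in a map from a content key to the set of peers hosting that content. The sketch below is a minimal in-memory model of that idea; the `ProviderStore` class and its method names are hypothetical and are not taken from any libp2p implementation.

```python
from collections import defaultdict
from typing import Dict, List, Set


class ProviderStore:
    """Toy model of the provider records held by one DHT server."""

    def __init__(self) -> None:
        self._records: Dict[bytes, Set[str]] = defaultdict(set)

    def add_provider(self, key: bytes, peer_id: str) -> None:
        # Only the pointer (peer ID) is stored, never the content itself.
        self._records[key].add(peer_id)

    def get_providers(self, key: bytes) -> List[str]:
        return sorted(self._records.get(key, set()))


store = ProviderStore()
store.add_provider(b"key-of-some-cid", "peer-alice")
store.add_provider(b"key-of-some-cid", "peer-bob")
print(store.get_providers(b"key-of-some-cid"))
```

Several peers can register under the same key, which matches the remark that multiple peers may store the same content.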
And so, if you look up a CID in the DHT, you will get a list of the content providers, and you can then ask them for the content. So now, how does publishing content to the DHT work? I should rather say providing, because we're not putting the actual content in the DHT; we only put the provider record.
So there's already a second hash, which gives us a binary string, and we look in the DHT for the closest location in the binary keyspace, the Kademlia keyspace. Usually DHTs are represented as a circle, as we've seen in the last presentation, but in the case of Kademlia it's better represented as a binary tree, because we're talking about XOR distances. And here we're going to see that the request is iterative.
First I ask the closest peer, or just a peer I know: okay, can you get me closer to the place I'm looking for? Eventually I find the closest peer IDs to the content, or rather to the hash, where I want to store my provider record. Then, once the request has converged, I simply ask these peers: can you please store this pointer for me, so that anyone looking for this content can in turn ask me for it? So these peers are going to store a provider record.
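The iterative lookup described here can be sketched with XOR distances as follows. This is a deliberately simplified, single-path greedy walk over a toy network: real Kademlia lookups query several peers concurrently and track a set of closest peers. The helper names and the toy routing tables are assumptions made for this example.

```python
import hashlib
from typing import Dict, List


def key_of(name: str) -> int:
    """Map a name to a 256-bit integer key in the Kademlia keyspace."""
    return int.from_bytes(hashlib.sha256(name.encode()).digest(), "big")


def xor_distance(a: int, b: int) -> int:
    return a ^ b


def iterative_lookup(target: int, start: int,
                     routing_tables: Dict[int, List[int]]) -> int:
    """Keep asking the current peer for a closer peer until none is closer."""
    current = start
    while True:
        candidates = routing_tables[current] + [current]
        best = min(candidates, key=lambda p: xor_distance(p, target))
        if best == current:
            return current  # converged: no known peer is closer
        current = best


# Toy network: each peer only knows the next three peers in the list.
peers = [key_of(f"peer-{i}") for i in range(16)]
routing_tables = {
    p: [peers[(i + j) % len(peers)] for j in (1, 2, 3)]
    for i, p in enumerate(peers)
}
target = key_of("some-provider-record-key")
found = iterative_lookup(target, peers[0], routing_tables)
print(hex(found)[:18])
```

Each step strictly reduces the XOR distance to the target, so the walk always terminates; with such sparse toy routing tables it may stop at a local rather than global closest peer.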
Then any client that knows the CID can simply hash it and look up where the content is stored in the DHT. You only need to find one of the closest peers that is storing a provider record, and this peer is going to give you the provider record, so it tells you where the content is actually located. Now that you have the peer ID of the content provider, you can directly request the content over Bitswap. That's how content routing currently works.
So what does the DHT learn for each request? Take a DHT server peer that is going to help you route your request. It will learn, of course, your peer ID, because you open a connection with them, so they will know you. They also know your IP address. And as you then request some content, because you want them to help you, they will be able to associate that content with this peer ID. It means that they can track which CIDs you are accessing. And if they want to know what you're really accessing, they can just take this CID, request it from the DHT in turn, and get the same file as you.
So anyone that wants to spy on you can just listen to your requests, help you route them, and then resolve the same content as you, and you're absolutely tracked. For instance, if the orange node here is malicious and the client asks it for the CID, then this node now knows the CID. It can in turn request the provider record, learn where the content is stored, and retrieve the content, which is undesirable. That's what we're trying to address with this upgrade.
So why double hashing? How did we get there? In IPFS, the way it works now is that content is addressed by the content identifier. But it would be great to have another identifier that would be specific to the DHT, so that you can look up some content in the DHT and the peers could learn about this identifier, but then cannot use it to request the content, because when you request the content over Bitswap, for instance, it's going to be a different identifier. We want to have different content identifiers for the different content routing mechanisms or data transfer protocols, so that they cannot be linked together. What we also want is that this new identifier shouldn't be hard to derive: once we have the CID, we want to be able to efficiently find the identifier in the DHT.
So that's a first improvement: if I request a key in the DHT, the node there can learn it but cannot get anything from it. This new identifier can be the hash of the CID, or you can hash the CID along with a constant, and that will be the DHT identifier.
So that's the second hash. Now, in order to gain k-anonymity and plausible deniability, because that's usually a desired component of privacy, what we can do is request only a prefix of this second hash. We request a prefix and we get approximately into the right region of the tree, so the nodes can still help us route to the right place.
But since we request only the prefix of the hash, there will probably be many provider records that match this prefix, and so we will get multiple provider records. That way the DHT server nodes that are serving us these provider records don't know exactly which content we're looking for. I say: okay, I'm looking for approximately this content. I get many provider records, and I can then select only the one I'm interested in and discard the rest.
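The prefix lookup can be sketched as follows: the client sends only the first few bits of the second hash, the server returns every record whose key shares that prefix, and the client keeps the single record it wanted. The record format, the helper names, and the three-bit prefix length are assumptions chosen for illustration; the talk treats the prefix length as a tunable system parameter.

```python
import hashlib
from typing import Dict


def dht_id(cid: str) -> bytes:
    """Stand-in for the second hash of a CID."""
    return hashlib.sha256(cid.encode()).digest()


def bits(data: bytes) -> str:
    return "".join(f"{byte:08b}" for byte in data)


def lookup_by_prefix(store: Dict[bytes, str], prefix: str) -> Dict[bytes, str]:
    """Server side: return every record whose key starts with the bit prefix."""
    return {k: v for k, v in store.items() if bits(k).startswith(prefix)}


# Server holds records for 64 different pieces of content.
store = {dht_id(f"cid-{i}"): f"record-for-cid-{i}" for i in range(64)}

# Client reveals only a short prefix of the identifier it wants.
wanted = dht_id("cid-7")
prefix = bits(wanted)[:3]
candidates = lookup_by_prefix(store, prefix)

# Client keeps the one record it cares about and discards the rest.
my_record = candidates[wanted]
print(len(candidates), "candidates returned;", "kept:", my_record)
```

A shorter prefix returns more candidates, which hides the request better at the cost of more bytes on the wire.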
However, this gives us k-anonymity, but no l-diversity or t-closeness. That means that if, in a specific branch of the tree, for a specific prefix, there is one file that is very popular and that everyone wants to access, and, let's say, 10 other files that nobody wants to access, then if somebody makes a request in this specific branch of the tree, it's very likely that they are accessing the popular content.
So it's not perfect, but at least it gives us plausible deniability. One can still do correlation attacks on the prefixes, that is, on the branches.
One other component that we can use to improve our system further is that we can encrypt the provider record. Right now the provider records are these pointers that I put in the DHT to indicate that I store the content, and my peer ID is in the clear, which means that anyone that hears about the request, even if it's a different hash, can know that I am storing the content. They won't be able to request the content from me, because they only know the DHT identifier, so I will know that they don't have the CID and I will not give it to them.
With encryption, anyone can still request this new DHT identifier and get the provider record. But now the CID acts as the secret needed to decrypt the provider record: only the peers that know the CID will be able to decrypt it and learn that I am storing the file.
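To make the idea of "the CID is the secret" concrete, the sketch below derives an encryption key from the CID and uses it to hide the provider's peer ID. It uses a toy stream cipher built from HMAC-SHA256 in counter mode, purely so the example runs with the standard library; this is an assumption of the example, not the scheme from the talk, and a real design would use a vetted authenticated cipher. The key-derivation constant and function names are hypothetical.

```python
import hashlib
import hmac
import os


def derive_key(cid: bytes) -> bytes:
    # Hypothetical constant, distinct from the one used for the DHT identifier,
    # so that knowing the DHT identifier does not reveal the encryption key.
    return hashlib.sha256(b"example-enc-key" + cid).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out = b""
    counter = 0
    while len(out) < length:
        block = hmac.new(key, nonce + counter.to_bytes(4, "big"), hashlib.sha256)
        out += block.digest()
        counter += 1
    return out[:length]


def encrypt_peer_id(cid: bytes, peer_id: bytes) -> bytes:
    """Toy encryption (NOT production crypto): nonce || (peer_id XOR keystream)."""
    nonce = os.urandom(16)
    stream = _keystream(derive_key(cid), nonce, len(peer_id))
    return nonce + bytes(a ^ b for a, b in zip(peer_id, stream))


def decrypt_peer_id(cid: bytes, blob: bytes) -> bytes:
    nonce, ciphertext = blob[:16], blob[16:]
    stream = _keystream(derive_key(cid), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


cid = b"example-cid-bytes"
record = encrypt_peer_id(cid, b"peer-alice")
print(decrypt_peer_id(cid, record))           # a peer that knows the CID
print(decrypt_peer_id(b"wrong-cid", record))  # a peer that does not
```

A DHT server that only sees the DHT identifier and the encrypted record cannot recover the peer ID, whereas any client holding the CID can.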
Now, the encryption has an undesirable effect: if I know the content identifier of some file, I can claim that somebody else is providing it, and create a provider record pointing to somebody else in order to DoS them. The DHT server can no longer verify that the peer ID in the provider record matches the peer ID of the peer uploading it. What we can do is simply sign the encrypted provider record, so that the DHT server can verify that the signature on the encrypted provider record, which the server cannot read, matches my connection with them.
In turn, it's also good for the client: when a client gets the encrypted provider record along with the signature, they can decrypt the provider record to get the peer ID, and they can verify the signature against this peer ID.
So now, how will the system work? I will still look up the second hash of my content in the DHT and store my record there. It will not be exactly the same location as before, because the salt will be different, but the process will be exactly the same. I look for the closest peers, and now I have my encrypted peer ID: I encrypt my peer ID with the CID, I sign it, and I send it to those peers in the DHT.
Then a client that wants to look up some content is going to look up the hash of the CID, or even just a prefix of it, so that the nodes don't learn a lot about the content being accessed. What I'm going to get back is a bunch of provider records. How many is a system parameter that we can adjust: in k-anonymity, we can choose the k, of course.
So here I'm going to get, say, four different encrypted provider records, and I'm going to discard all the ones I'm not interested in, because I'm interested in only a single one. I'll be able to decrypt it because I know the CID, and once I decrypt it I can verify the signature, because it was signed with the private key of the peer ID that is storing the content. So then I have the guarantee that the peer ID that I have at least knows the CID.
So if we treat the DHT as a single entity, it can no longer determine exactly which content I am accessing. Before, my peer ID could be associated directly with the CID I was looking for; now it can only be associated with the hash of the CID. But this does not help if the adversary already knows the content I am looking for.
So, if we take an example, there's a decentralized kind of YouTube platform on IPFS, and we have a global adversary that is going to crawl it and index all of the videos. They can in turn compute the hashes of all of the CIDs, and so, when I make a request with this identifier, they already have it, so they can still track my request. But in the case where I upload, or rather advertise, my holiday pictures on IPFS and I want to share them with my friends, nobody knows the CIDs except the people I shared the link with, and so nobody else can actually know the data that is being accessed.
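The difference between public and private content can be illustrated with a small dictionary attack: an adversary that has crawled public CIDs precomputes their second hashes and can then recognize requests for them, but learns nothing about identifiers derived from CIDs it has never seen. The CID strings and helper names below are made up for this example.

```python
import hashlib
from typing import Optional


def dht_id(cid: str) -> bytes:
    """Stand-in for the second hash of a CID."""
    return hashlib.sha256(cid.encode()).digest()


# The adversary crawls a public platform and indexes every CID it finds.
crawled_cids = ["cid-of-public-video-1", "cid-of-public-video-2"]
reverse_index = {dht_id(c): c for c in crawled_cids}


def identify(observed_dht_id: bytes) -> Optional[str]:
    """Map an observed DHT identifier back to a known CID, if possible."""
    return reverse_index.get(observed_dht_id)


public_request = dht_id("cid-of-public-video-2")
private_request = dht_id("cid-of-private-holiday-picture")

print(identify(public_request))   # recognized: the CID was crawled
print(identify(private_request))  # None: the CID was never published
```

Double hashing therefore protects requests for content whose CID is shared privately, not requests for content that anyone can enumerate.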
Also, we get some writer privacy improvement, because the provider record is encrypted. It's not in the clear, so we no longer know who is storing specific content.
There are no additional packets when doing the lookup; it's just that when we retrieve the provider record, we will retrieve k of them, for k-anonymity, instead of one. So the number of bytes on the network will increase on the storage side.
Now, the changes that this would imply for the IPFS environment: only libp2p has to be upgraded. When I say only, I mean go-libp2p, rust-libp2p, nim-libp2p, js-libp2p. So all DHT implementations have to be upgraded, but then the applications building on top of libp2p or IPFS would automatically benefit from this. As an application builder, you don't need to change anything.
This change integrates with IPFS Reframe, so there's a pull request where we specify the new private content request, and the double hashing approach will also be implemented by the indexer. So this approach would be for the DHT and the indexer, with the same interface. Elizabeth from ChainSafe, who couldn't make it today, is currently working on the implementation.