►
From YouTube: Private Set Intersection for IPFS - Erik Daniel
Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
A
Hi,
my
name
is
Eric
Daniel
and
I'm.
A
PhD
student
at
the
table,
Berlin
I'm,
going
to
talk
about
privacy
forbid,
swap
requests
bit
swap
is
the
default
data
exchange
protocol
in
ipfs,
and,
if
you
think
about
bit
swap
you
might
know
how
it
works,
but
now
you
might
think,
are
there
any
privacy
leakages?
A
Let's
just
take
a
look
now
here
you
can
see
the
bits
for
process
we
have
at
the
beginning,
an
initial
Discovery
phase
answer
file
transfer
and
if
you
can't
find
the
specific
Cid
in
our
neighborhood,
we
have
the
DHT
requests.
The
problem
here
is
all
those
somehow
leak
information.
On
the
one
hand,
what
one
person
is
interested
in,
on
the
other
hand,
what
another
person
stores
here
for
our
presentation,
we
use
a
client
as
the
one
that
is
requesting
information
and
server
for
all
other
peers.
A
Now
what
we
want
is
to
obfuscate
the
first
part,
the
one-half
part
we
want
to
obfuscate
the
item
of
Interest,
and
this
should
be
done
with
a
low
latency
approach.
It
should
induce
a
low
computational
cost
on
the
server
side
and
ideally
has
a
low
overhead
on
the
client
side.
A
Now,
if
you
think
about
it,
what
is
the
simplest
approach
that
you
can
do
to
gain
the
information?
What
a
server
is
storing
without
revealing
your
item
of
Interest?
Well,
it
is
to
simply
asking
for
everything
he's
storing.
This
can
potentially
have
a
large
overhead
still,
if
you
improve
this
idea,
which
we
do
it
should
be
still
manageable.
A
A
You
can
see
a
rough
outline,
how
they
work,
I'm,
referring
to
the
papers
and
the
footnotes
if
you're
interested
or
need
to
know
or
need
to
refresh
your
knowledge
again
now
how
about
probabilistic
bits,
Bob,
look
like
well,
you
have
the
one
half
request
and
instead
of
requesting,
instead
of
requesting
your
item
of
interest,
you
request
the
probabilistic
data
structure
receive
it,
look
up
your
item
in
the
probabilistic
data
structure
and
then,
if
it's
a
match,
you
request
the
block
from
the
server.
Then
you
receive
the
block
and
you're
happy.
A
Of
course,
there
is
a
slight
problem:
it's
probabilistic
pet
spot,
these
probability
data
structures
where
they
have
no
false
negatives.
They
can
have
false
positives.
These
false
positives
have
the
advantage
for
the
server
that
they
provide
plausible
deniability.
A
However,
this
is
the
downside
of
this
approach.
If
it
is
a
false
positive,
the
I
am
term
of
interest
is
revealed
without
getting
the
block,
but
here
you
can
also
see
follow-up
requests.
I,
don't
need
this
probabilistic
data
structure.
I
need
to
request
this
probabilistic
data
structure
again,
assuming,
of
course,
the
server
set
hasn't
changed
in
between.
A
Now
the
next
approach
uses
private
set
intersection.
Private
set
intersection
is
used
to
privately
compute
intersection
of
two
sets,
and
those
two
sets
are
in
our
case.
On
the
one
hand,
the
client,
inter
once
so,
what
the
client
is
interested
in
and,
on
the
other
hand,
the
stored
cids
by
the
server
and
this
private
set
intersection
can
leak
the
set
sizes
of
both
parties.
So
the
number
of
cids
the
client
is
interested
in,
and
the
number
of
blocks
the
server
stores.
A
The
next
approach,
which
I'm
showing
uses
elliptic
curve,
diffie-hellman
PSI.
So
if
you
know
a
bit
about
diffie-hellman,
you
might
see
something
which
looks
awfully
familiar.
A
Yeah
so
ecdh
PSI
bit
swap
the
client
requests
again.
As
for
the
probabilistic
bit,
swap
requests
you,
you
is
here's
a
storage
information
of
the
server.
However,
he
also
requests
we.
We
is
somehow
transformed
a
multi-hash
of
the
client's
interest.
So
it's
not
the
pure
CID,
but
it's
magically
transformed
the
magic
here
is
basically
hashing
the
ca,
the
multi-hash
to
elliptic
curve
and
then
multiplying
it
with
a
random
number
of
the
client's,
choosing
the
server
answers
with
all
historic
cids.
A
In
this
case
they
are
also
transformed,
of
course,
and
he
also
transforms
the
received
V
by
also
multiplying
it
with
his
own
random
number
and
then
sends
them
back.
The
client
then,
can
then
do
with
the
whole
PSI
matching
again
applying
some
math
to
the
received
value,
seeing
if
it's
in
the
set
and
then
if
it
is
in
the
set
okay,
he
knows
I
can
request
that
block
receive
as
a
block
and
the
whole
exchange
is
done.
A
A
Now
we
can
already
think
about
one
Improvement
if
you
think
about
the
probabilistic
bit
Swap
and
by
making
this
a
probabilistic
ecdh
PSI
bit
swap-
and
this
is
just
changing
you
to
to
a
probability
data
structure
if
we
take
Bloom
filter.
What
we
would
do
here
is
not
use
a
multi-hash
to
determine
the
bit
position
and
instead
we
would
use
the
transformed.
So
we
would
use
the
elliptic
curve
point
so
has
to
curve,
and
then
multi-hash
hash
to
curve
getting
a
point.
A
A
We
can
integrate
this
in
bit,
swap
within
reason
or
overhead.
We
have
a
one-time
overhead
of
the
calculation
of
U
and
the
server
side
and
transferring
of
course,
the
advantage
of
this
approach
is.
The
server
has
to
calculate
this
only
once
for
all
his
neighbors
still.
This
is
a
one-time
overhead
on
the
server
side.
A
If
we
only
look
at
transferred,
byte
and
excluses
one
time
overhead,
we
have
almost
no
additional
transfer.
We
have
one
half
messages
which
we,
which
the
client
sends
to
the
server
and
we
receive
a
don't
have
so
no
additional
messages
and
there's
a
data
that
is
sent
is
basically
an
elliptic
curve
point.
If
we
use
ristrative
255
a
curve,
then
the
compressed
Point
size
is
32
bytes.
A
What
are
the
next
steps,
of
course,
an
intensive
evaluation
of
the
approach.
So
while
we
know
okay,
the
transfer
bytes
aren't
that
huge
or
almost
non-existent
it
still
could
use,
could
induce
some
computational
overheads.
I
can
already
spoil
a
bit
if
you
think
about
stereo255
curve.
Again
we
have
an
encryption
time
of
around
a
millisecond,
410,
cids
or
n10
multi-ashes,
to
be
more
precise,
still
there's
some
additional
accounting
to
be
done
and
management
of
the
messages
which
to
send
and
so
on.
So
this
needs
to
be
further
evaluated.
A
Another
problem
are
large,
sets
more
than
ten
thousands
or
maybe
even
more
than
thousands.
The
whole
range
is
not
yet
known
by
us.
One
simple
extension,
which
could
be
done
is
reducing
the
size
of
you.
This
is
by
just
make
leaking
a
few
bits
of
information.
We
could
reduce
the
potentially
large
by
simply
saying
okay.
We
only
want
you
to
include
in
this
Bloomfield
cids
using
sha-256
for
the
multi-hash
and
maybe
even
leaking
a
few
bytes
of
the
Digest
and
those
conditions
aren't
fulfilled.
A
You
don't
have
to
include
them
in
the
boom
filter,
and
this
would
reduce
you,
which
is
the
biggest
overhead.
We
assume
here,
of
course,
that
it's
unlikely
that
the
client
requests
thousands
of
cids
at
one
at
once
in
the
future.
We
also
want
to
think
about
how
we
can
extend
this
to
include
the
transfer
of
the
data
associated
with
the
element,
so
in
this
case
not
only
finding
out
if
the
server
stores
it
but
also
privately
retrieves
the
block.
There
are
some
PSI
approaches
which
allow
this
transfer.
A
Another
part
we
want
to
take
a
look
at
is
if
this
can
be
extended
to
also
include
DHT
requests
or
in
general,
if
PSI
is
suitable
for
the
HD
requests
before
we
come
to
the
end
of
this
talk,
let's
let
me
give
you
a
small
demo,
or
example
for
Bloom
filter
requests.
Here
we
have
just
a
small
table
you
can
see.
This
is
from
the
client's
perspective.
Direction
means
out
here
means
sending
data
in
means.
A
Yeah-
and
here
we
can
see
okay
internally,
the
client
wants
a
specific
CID,
what
he
sends
to
the
servers,
only
a
small
dummy
CID,
so
the
server
knows.
Okay,
this
client
wants
a
wants,
the
bloom
filter,
but
he
receives
as
the
bloom
filter
and
after
lookup.
In
this
case
the
server
store
it.
You
can,
okay,
send
the
the
CID
he
is
interested
in.
A
So
we
have
here
this
in
internal
have
so
the
lookup
was
successful,
and
then
we
can
send
one
block
block
and
after
receiving
the
block,
we
can
send
the
one
block
cancel.
A
Now,
for
PSI,
this
looks
a
little
bit
different.
In
this
example,
we
already
exchanged
the
bloom
filter.
We
received
it
earlier.
However,
what
we
can
see
here
is
internally.
This
is
the
cad
the
client
is
interested
in
and
what
he
sends
to
the
to
the
server
is
the
CID
they're
completely
indistinguishable
from
each
other,
and
the
only
thing
that
changes
here
is
the
digest
and
in
this
case
also
some
multi-hash
code.
A
If
you
wait
a
little
bit,
I
show
you
just
a
little
bit
more
in
detail,
and
the
important
part
here
is
also
the
don't
have
answer
is
also
different
from
the
scent
one
half
internally
again
after
executing
the
PSI
protocol.
You
can
then
determine
okay,
this
is
half
or
is
this
it
don't
have
it's
a
half.
Okay,
one
block
block.
One
block
cancel
now
here
three
cids
in
a
bit
more
details.
A
You
can
see
here
the
version
multicodec
multi
multi-hash
code,
multi
hash
length
and
the
Digest,
and
if
you
compare
the
internal
want
and
the
send
one
you
can
see
here,
the
digest
changed
and
the
multi-hash
code
changed.
The
multi-ash
code
is
used
as
simply
for
matching
purposes.
So
if
you
look
further
the
receive,
don't
have
again,
we
have
another
digest,
but
we
have
the
same
multi-hash
code
so
to
match
this
receive,
don't
have
to
the
want.
A
A
Yes
with
that,
thank
you
very
much
for
attention
and
have
a
nice
day.