Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
The talks have been amazing and there's some crossover, which is really cool, especially with the data trust stuff that Kelsey spoke about. I'm Alex, the co-founder of Phase 3, a Web3 innovation consultancy. For the first half of the year I worked on a project called Links, where we looked at enabling privacy-preserving storage, sharing, and analysis of sensitive data, with a focus on biometric data from wearable devices, to amplify user insights and catalyze science and innovation.
So I want to start off with a story of something that recently happened to a close friend. He was a freshly graduated Master of Engineering; the ink was barely dry on his degree when he began pulling together all the work for his portfolio, and he finds out two months later that the university has deleted it all off their servers and locked him out of his account. Everything. His six years of work, which he trusted the institution to store, is vital both for building his portfolio and for actually patenting his final-year project.
This has also been deleted: too expensive for them to store for even another day post-graduation. Now he's spending days emailing past tutors and coursemates, trying to find things that have been squirreled away on email servers, download folders, and personal hard drives. This story is not unique. He, along with thousands of other students and researchers, trusted the storage provided by universities to keep their knowledge safe.
So let's first dive into what is meant by privacy-preserving compute, also referred to as privacy-preserving machine learning or AI. Privacy-preserving computation requires the derivation of insights from data without ever having to see or share that data. This is enabled by cryptography, distributed computation, and verifiable privacy and governance.
For this example, we only need to know two things: does Wally exist, and what are his exact coordinates? This is a crude example of a data-to-compute model, where we, as the data analyst, look for Wally by looking at all of the data in order to determine these two points. With privacy-preserving compute, we instead use methods that obfuscate the data that is not necessary to determine the answers we need.
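One way to picture the difference: rather than handing the analyst the whole scene, expose only the two permitted answers. The scene and function below are purely hypothetical.

```python
# A toy "answer-only" interface: the analyst learns whether Wally exists
# and where, but never receives the raw scene itself.
from typing import Optional, Tuple

SCENE = {("wally", (14, 3)), ("wizard", (2, 9)), ("odlaw", (7, 7))}

def find_wally(scene) -> Tuple[bool, Optional[Tuple[int, int]]]:
    """Answer the two permitted questions; everything else stays hidden."""
    for name, coords in scene:
        if name == "wally":
            return True, coords
    return False, None

# The analyst calls the query interface instead of receiving SCENE.
exists, coords = find_wally(SCENE)
print(exists, coords)  # True (14, 3)
```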
Privacy-preserving compute is a vital building block for DeSci. For context, half of all FDA-approved medical devices based on AI are for radiology, and these are trained on no more than a thousand images. For comparison, DALL-E is trained on hundreds of millions of images, and most doctors see between seven and eight thousand patients a year.
This causes a problem, as these AI models are only being trained on a narrow subset of the population, leading to a misrepresentation of those who are excluded from the dataset. Privacy-preserving computation is already being done without a blockchain, with some of the most prominent work being pioneered by organizations like OpenMined.
A
So
with
distributed
computation,
we
have
the
ability
to
deploy
machine
learning
models
to
get
meaningful
Knowledge
from
geographically
distributed
large-scale
data.
We
see
that
with
baklav
several
well,
several
architectures
exist
to
be
able
to
do
this.
Privacy
and
security
have
not
been
sufficiently
addressed
and
existing
models
are
vulnerable
in
their
architecture
and
have
efficiency
limitations.
So I want to talk about what privacy is, because there are a few definitions of this. I want to look at it in the context of Helen Nissenbaum's contextual integrity theory of privacy, which defines privacy as appropriate flows of information, where appropriateness is defined by the context and its contextual informational norms.
This definition of privacy is much more nuanced and captures all kinds of edge cases. For example, is your genomics data strictly yours? When you get your genes sequenced, that data also belongs to your ancestors and to any children you have or may have. This is just one example of why the issue of privacy-preserving computation is not just a technical question, but also an ethical one.
A
Federated
learning
is
an
interesting
model,
as
this
allows
for
models
to
be
trained
collaboratively
by
distributed
nodes.
In
theory,
this
model
is
privacy.
Preserving.
However,
there
is
evidence
that
Federated
learning
language
models
can
be
reverse
engineered
to
reveal
private
information.
Therefore,
more
research
needs
to
be
done
to
truly
establish
privacy,
preserving
computational
methods.
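For readers unfamiliar with the setup, here is a toy federated averaging (FedAvg) loop in NumPy: each node trains on its own private data and shares only model weights with a coordinator. It is an illustrative sketch, not the production protocol, and, per the caveat above, sharing weights alone does not guarantee privacy.

```python
# Minimal federated averaging: three nodes fit a shared linear model
# without any node ever sending its raw data to the coordinator.
import numpy as np

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])

# Each node holds private data that never leaves the node.
datasets = []
for _ in range(3):
    X = rng.normal(size=(50, 3))
    y = X @ true_w + rng.normal(0, 0.1, size=50)
    datasets.append((X, y))

def local_update(weights, X, y, lr=0.1, epochs=5):
    """One node's local training: plain gradient descent on MSE."""
    w = weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

global_w = np.zeros(3)
for _ in range(10):  # federated rounds
    local_ws = [local_update(global_w, X, y) for X, y in datasets]
    global_w = np.mean(local_ws, axis=0)  # coordinator averages weights only

print(global_w)  # approaches [1.0, -2.0, 0.5] without pooling any raw data
```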
So through our research with Links, we identified four ways to theoretically manage this data in a decentralized way. The first is using Web2 tools. There are various architectures to approach the problem this way; however, the system cannot be truly decentralized, as there will always have to be a trusted intermediary between the different parties to maintain the system.
Second, we looked at a completely open, decentralized approach. This involved encrypting sensitive data with cryptographic solutions such as homomorphic encryption and storing that data on the public IPFS network. The risks of this include that, when the encryption standard is broken, the sensitive data is no longer private.
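A minimal sketch of that storage pattern, assuming a local IPFS daemon on the default API port; for brevity it uses ordinary symmetric encryption (Fernet) as a stand-in for the homomorphic schemes mentioned above, and the payload is hypothetical (`pip install cryptography ipfshttpclient`).

```python
# Encrypt locally, then store only ciphertext on the public IPFS network.
from cryptography.fernet import Fernet
import ipfshttpclient

key = Fernet.generate_key()  # stays with the data owner
fernet = Fernet(key)

sensitive = b'{"heart_rate": [72, 68, 75]}'  # hypothetical wearable data
ciphertext = fernet.encrypt(sensitive)

with ipfshttpclient.connect() as client:  # /ip4/127.0.0.1/tcp/5001 by default
    cid = client.add_bytes(ciphertext)    # only ciphertext touches the network
    print("stored at CID:", cid)

    # Anyone can fetch the bytes by CID, but only the key holder can read them.
    fetched = client.cat(cid)
    assert fernet.decrypt(fetched) == sensitive
```

Note the risk described above: if the encryption scheme used here is ever broken, the ciphertext is already public and cannot be recalled.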
There is an argument that you could locate the data by its CID, the content identifier, and therefore re-encrypt the data with a new algorithm. However, there's a risk: if the node storing the data encrypted with the old mechanism goes offline, then you'll never be able to re-encrypt that data with a hundred percent certainty. So, therefore, this is not a viable option, because there's no guarantee of the right to be forgotten.
So third, we looked at a hybrid approach. This involved storing sensitive data in centralized databases, with pointers to these databases stored on the blockchain. This could be governed by intermediaries such as trusted data unions or data trusts. Although this method still relies on some centralization, with the risk of a data union becoming so big that it controls the entire network, there could be rules put in place to prevent this from happening.
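The pointer pattern can be sketched in a few lines. Here the on-chain side is mocked as a plain list and every name is hypothetical; the point is that only a location and a content hash go on-chain, so anyone can detect tampering with the off-chain copy.

```python
# Hybrid pattern: sensitive record off-chain, tamper-evident pointer on-chain.
import hashlib
import json

off_chain_db = {}   # stands in for the data union's centralized database
on_chain_log = []   # stands in for a smart contract's storage

def store(record_id: str, record: dict) -> None:
    payload = json.dumps(record, sort_keys=True).encode()
    off_chain_db[record_id] = payload
    on_chain_log.append({
        "pointer": f"db://records/{record_id}",         # where the data lives
        "digest": hashlib.sha256(payload).hexdigest(),  # tamper-evidence
    })

def fetch(entry: dict) -> dict:
    record_id = entry["pointer"].rsplit("/", 1)[-1]
    payload = off_chain_db[record_id]
    # The on-chain digest lets anyone verify the off-chain copy is intact.
    assert hashlib.sha256(payload).hexdigest() == entry["digest"]
    return json.loads(payload)

store("patient-42", {"scan": "radiology-xyz", "consent": "research-only"})
print(fetch(on_chain_log[0]))
```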
Last, we toyed with the idea of a private network approach. We used IPFS's private networks as inspiration for this. What this could look like for DeSci is multiple institutions, such as universities and research centers, deciding to create a private distributed network between them to enable the sharing of sensitive data without the risk of the nodes going offline, as keeping nodes online would be part of the legal agreement between them.
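As a rough sketch of how such a network is bootstrapped with go-ipfs/Kubo: every participating node shares the same pre-shared key in a swarm.key file inside its IPFS repo, and only peers holding that key can connect; setting the LIBP2P_FORCE_PNET=1 environment variable makes the daemon refuse to start without one. The generation script below is a minimal illustration, assuming an initialized repo at ~/.ipfs.

```python
# Generate a swarm.key for an IPFS private network and place it in the
# node's repo. Each institution's node must hold an identical copy.
import os
import secrets

def write_swarm_key(path: str) -> None:
    # PSK v1 file format understood by go-ipfs/Kubo: two header lines
    # followed by 32 random bytes, hex-encoded.
    key_hex = secrets.token_bytes(32).hex()
    with open(path, "w") as f:
        f.write("/key/swarm/psk/1.0.0/\n/base16/\n" + key_hex + "\n")

write_swarm_key(os.path.expanduser("~/.ipfs/swarm.key"))
```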
So what next? From our research, the biggest issues lie in the gray area of data ethics. This is the balancing act between doing and being paralyzed by the what-ifs. Successful teams will be interdisciplinary, and research still needs to be conducted to develop appropriate ethical standards for how models can be trained and how data is handled. This could look like global red lines; however, with the world looking how it does now, this may not be the most appropriate approach.
This is an area where data unions could add tremendous value. Rather than complying with the data privacy rules of your geographical location, you could join different data unions which fit closely with your values. For example, maybe you want to share your data with all research projects, so you would join a data union that does that. Or maybe you only want to share your data with research projects focused on X, but you don't want to do it for anything else; you could join a data union that does that.
So, to conclude, this is an incredibly nascent area, and if you want to discuss this further, please get in touch. We have an ethics Telegram channel focused on human data in Web3, which is open for anyone to join. You can message me on Telegram, which is the same handle as at the top, and I can add you. And there are lots of projects working on this problem which I can direct you towards. And yeah.