From YouTube: Encryption in Swarm
Description
In this presentation from Day 3 of the #SwarmOrangeSummit, Daniel Nagy gave a talk titled “Privacy on Swarm”. The presentation gives a basic overview of Swarm’s layered structure and how data chunks work, then explains the encryption algorithms and processes Swarm uses to protect data.
A
Okay, is it audible now, like this? Okay. So today I'm going to talk about the various facilities that Swarm provides, or is going to provide, for protecting the privacy of data stored in Swarm. In order to do that, I will give a really brief overview of how Swarm works in general, in the unencrypted version. Many of you are already familiar with it, but I think it is worth repeating and explaining for those who are not familiar with the low-level details. Then I'm going to talk about symmetric encryption of content.
A
Okay, so Swarm is a layered stack. On the bottom layer we have a network which forwards and stores four-kilobyte chunks: they are stored locally and passed around in the network. This particular layer has no idea how chunks relate to each other; it treats chunks as units of information that are entirely independent of each other. Above that we have a layer that handles arbitrary-length binary files without any metadata, and above that we have a layer which deals with collections of files, which all have a URL, very similar to web pages.
A
So basically, it's a virtual content-addressed web server, which has metadata attached to collections of files. Above that we have the Swarm-hosted distributed applications, which can access the layer below and even the raw layer, which deals with files without metadata. But applications typically cannot access data on the chunk level.
A
So, what's in a chunk? It has 64 bits which we call the span, which describes the length of the file encrypted, sorry, encoded by the subtree at the root of which this particular chunk is, and at most 4 kilobytes of payload. If the span is no larger than 4 kilobytes, then this is a leaf chunk, meaning that the payload in this chunk is actually the content of the file. Otherwise, the chunk is called intermediary and it contains references to subtrees.
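The chunk layout described here can be sketched in a few lines of Python. This is only an illustration: the `Chunk` class, its method names, and the little-endian byte order of the span field are assumptions made for the example, not details stated in the talk. A span of exactly 4096 bytes is treated as a leaf, matching the full leaf chunks in the speaker's own example.

```python
import struct
from dataclasses import dataclass

CHUNK_SIZE = 4096  # maximum payload size in bytes
SPAN_SIZE = 8      # the span is a 64-bit field


@dataclass
class Chunk:
    span: int       # length in bytes of the data encoded by this chunk's subtree
    payload: bytes  # at most CHUNK_SIZE bytes

    def is_leaf(self) -> bool:
        # A leaf holds file content directly; otherwise the payload holds references.
        return self.span <= CHUNK_SIZE

    def serialize(self) -> bytes:
        # Assumption: the span is stored as a little-endian unsigned 64-bit integer.
        return struct.pack("<Q", self.span) + self.payload


def parse_chunk(data: bytes) -> Chunk:
    """Split serialized chunk bytes back into span and payload."""
    (span,) = struct.unpack("<Q", data[:SPAN_SIZE])
    return Chunk(span, data[SPAN_SIZE:])
```

A chunk whose span is larger than its own payload can hold is therefore recognizable as an intermediate node without any extra type flag.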
A
So here's an example of how a 10-kilobyte file is stored in Swarm. It is stored as four chunks. In one chunk, which is the root chunk, we have a span field which tells us that the subtree encodes 10 kilobytes of data, and then it has three references to three chunks. These references are, in the unencrypted case, just hashes. And then we have three chunks.
A
Two of them are full four-kilobyte chunks which hold parts of the payload, and the last chunk is just two kilobytes, holding the last 2 kilobytes of the payload. So that's how a ten-kilobyte file is actually stored in Swarm. On top of this we have manifests, which are Merkleized key-value databases.
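The ten-kilobyte example can be reproduced with a short sketch. It handles only the single-level case (one root chunk over leaf chunks), and `chunk_ref` is a stand-in: it hashes span plus payload with Python's `hashlib.sha3_256`, which is an assumption for illustration and not the chunk hash Swarm actually computes. The function names are hypothetical.

```python
import hashlib
import struct

CHUNK_SIZE = 4096  # payload bytes per chunk
REF_SIZE = 32      # size of an unencrypted reference (a hash)


def chunk_ref(span: int, payload: bytes) -> bytes:
    # Stand-in reference: SHA3-256 over span || payload (assumption for this sketch).
    return hashlib.sha3_256(struct.pack("<Q", span) + payload).digest()


def split_file(data: bytes):
    """Split a file into leaf chunks plus one root chunk of references.

    Returns (root_span, root_payload, leaves). Only files that need exactly
    one level of references are supported in this sketch.
    """
    max_size = CHUNK_SIZE * (CHUNK_SIZE // REF_SIZE)
    if not CHUNK_SIZE < len(data) <= max_size:
        raise ValueError("sketch only covers files needing one root chunk")
    leaves = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    # The root payload is the concatenation of the leaf references.
    root_payload = b"".join(chunk_ref(len(leaf), leaf) for leaf in leaves)
    return len(data), root_payload, leaves
```

For a 10240-byte input this yields a root span of 10240, three references in the root payload, and leaves of 4096, 4096 and 2048 bytes, which is the four-chunk layout from the talk.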
A
So that's what Swarm is, and that's what you need to know about it in order to understand the rest of the discussion. Symmetric encryption in Swarm is achieved using the so-called CTR mode, or counter mode. This is an illustration from Wikipedia. I think it is very illuminating, so it's a good illustration. That's why I haven't drawn my own.
A
This is a very popular mode of operation in modern cryptosystems, because it is very convenient to reason about it and, given certain assumptions, reach conclusions using just methods of formal logic, that is, to create formal proofs of security that depend on a clearly denoted set of assumptions. How counter mode works is that you have a key, and then you have, in this case, a block cipher, but in fact it can be any cryptographic one-way function, and you have a nonce and a counter.
A
You combine them into a one-way image of these things, so the key, the nonce and the counter, and then you XOR it with the plaintext to get the ciphertext, or XOR it with the ciphertext to get the plaintext. So the encryption and decryption operations are identical. One benefit of this is that it allows for partial decryption of the content. Namely, say you only need this middle part, for example: you have the ciphertext and you need the plaintext of the middle part.
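A generic counter mode built on a one-way function can be sketched as follows. This is a minimal illustration, not Swarm's code: the way key, nonce and counter are concatenated, the 4-byte big-endian counter, and the use of `hashlib.sha3_256` as the one-way function are all assumptions chosen for the example.

```python
import hashlib

BLOCK = 32  # keystream block size in bytes (one SHA3-256 output)


def keystream_block(key: bytes, nonce: bytes, counter: int) -> bytes:
    # One-way image of key, nonce and counter (layout is an assumption).
    return hashlib.sha3_256(key + nonce + counter.to_bytes(4, "big")).digest()


def ctr_xor(key: bytes, nonce: bytes, data: bytes, first_block: int = 0) -> bytes:
    """Encrypt or decrypt: both are the same XOR with the keystream.

    `first_block` is the counter of the first block in `data`, which lets a
    caller process a block-aligned slice from the middle of a message.
    """
    out = bytearray()
    for offset in range(0, len(data), BLOCK):
        ks = keystream_block(key, nonce, first_block + offset // BLOCK)
        out.extend(a ^ b for a, b in zip(data[offset:offset + BLOCK], ks))
    return bytes(out)
```

Partial decryption then amounts to something like `ctr_xor(key, nonce, ciphertext[64:96], first_block=2)`: only the keystream for the requested block is computed, and nothing before it has to be decrypted first.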
A
We use SHA-3 twice: first as a compression function that compresses the key and the counter, and then just as a one-way function with the same input and output length. Then we XOR the result with the plaintext to get the ciphertext, or with the ciphertext to get the plaintext. So why are we doing this? Why are we using SHA-3, and why are we using it twice?
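The two-step construction can be sketched like this. It is a hedged illustration under stated assumptions: `hashlib.sha3_256` is the standardized SHA3-256, whereas Ethereum tooling uses the original Keccak-256 padding, so the outputs differ from a real node; the 4-byte little-endian counter encoding and the function names are likewise chosen for the example.

```python
import hashlib

BLOCK = 32  # 256-bit blocks


def sha3(data: bytes) -> bytes:
    # Stand-in for the hash used by Swarm (see the assumptions above).
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def block_preimage(key: bytes, counter: int) -> bytes:
    # First SHA-3: compress the key and the counter into 32 bytes.
    return sha3(key + counter.to_bytes(4, "little"))


def block_pad(key: bytes, counter: int) -> bytes:
    # Second SHA-3: a one-way function with equal input and output length.
    return sha3(block_preimage(key, counter))


def transcrypt(key: bytes, data: bytes) -> bytes:
    """Encrypt or decrypt a chunk payload block by block (same operation)."""
    return b"".join(
        xor(data[i:i + BLOCK], block_pad(key, i // BLOCK))
        for i in range(0, len(data), BLOCK)
    )
```

Keeping the intermediate value `block_preimage` separate from the final pad is what the rest of the talk relies on for selective disclosure.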
A
So why are we using it twice? The reason is that we want to allow for selective disclosure. If we used SHA-3 only once, so the second box would not be here, then in order to reveal the plaintext given the ciphertext for this particular block, you would need to reveal the input to this SHA-3 box, which is the key and the counter. But once you have the key, you would be able to decrypt the whole chunk, not just that particular block within the chunk.
A
Well, not impossible, but inconvenient. Whereas here, what you can do is reveal the result of this SHA-3. So you only reveal this, and these data are still protected by the one-way nature of the SHA-3 function: you cannot recover the key and the counter from this data. However, you can still decrypt the plaintext from the ciphertext. So this allows for selectively disclosing 256 bits of any particular encrypted content. Of course, it is...
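Selective disclosure of a single 256-bit block can be sketched as below. The key holder reveals only the output of the first SHA-3 for one counter value; anyone can then apply the second SHA-3 and XOR with the ciphertext block. As before, `hashlib.sha3_256`, the counter encoding, and all function names are assumptions made for this illustration.

```python
import hashlib

BLOCK = 32


def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def block_preimage(key: bytes, counter: int) -> bytes:
    # Output of the first SHA-3 (counter encoding is an assumption).
    return sha3(key + counter.to_bytes(4, "little"))


def encrypt_chunk(key: bytes, plaintext: bytes) -> bytes:
    return b"".join(
        xor(plaintext[i:i + BLOCK], sha3(block_preimage(key, i // BLOCK)))
        for i in range(0, len(plaintext), BLOCK)
    )


def disclose_block(key: bytes, index: int) -> bytes:
    """Run by the key holder: reveal the per-block value, not the key."""
    return block_preimage(key, index)


def open_block(ciphertext: bytes, index: int, disclosed: bytes) -> bytes:
    """Run by anyone: decrypt exactly one block from the disclosed value."""
    block = ciphertext[index * BLOCK:(index + 1) * BLOCK]
    return xor(block, sha3(disclosed))
```

The disclosed value opens block `index` only; recovering the key or any other block's pad from it would require inverting the hash.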
A
Wait for it, let me get there. So, of course, you could reveal this here, but the problem with that would be exactly what I showed you last time: for any ciphertext and any plaintext, I could come up with a value that would create that plaintext from the ciphertext.
A
If we move one step up, from here to here, and disclose what's here, then we can also show something that is not there, because we can just take a random value, apply SHA-3, XOR it with the ciphertext and then claim that that is the plaintext that we have encrypted. This kind of thing is called existential forgery in cryptography. But existential forgery is much better than the previous case: when you reveal this and can match any ciphertext to any particular plaintext, that is called universal forgery.
A
This is just existential forgery, meaning that you can claim that data to be some random other data, but you cannot control it. So if there's any integrity protection on the plaintext, that imposes a constraint on the decrypted plaintext that only the real plaintext can meet with a very, very high probability, without going into mathematical details.
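The difference between the two kinds of forgery can be made concrete with a small sketch. The function names and the use of a plain hash as the integrity check are assumptions for illustration; `hashlib.sha3_256` again stands in for the hash used in practice.

```python
import hashlib


def sha3(data: bytes) -> bytes:
    return hashlib.sha3_256(data).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def universal_forgery(ciphertext_block: bytes, wanted_plaintext: bytes) -> bytes:
    # If the final pad itself were what gets disclosed, a cheater could pick
    # the pad that turns the ciphertext into any plaintext of their choice.
    return xor(ciphertext_block, wanted_plaintext)


def existential_forgery(ciphertext_block: bytes, fake_preimage: bytes) -> bytes:
    # Disclosing a made-up preimage yields *some* plaintext, but the cheater
    # cannot steer it: the pad is the hash of whatever value they chose.
    return xor(ciphertext_block, sha3(fake_preimage))


def accept(plaintext_block: bytes, expected_digest: bytes) -> bool:
    # Integrity check on the plaintext (here simply a hash comparison).
    return sha3(plaintext_block) == expected_digest
```

With an integrity check in place, the uncontrolled output of `existential_forgery` is rejected unless the forger happens to hit a matching value, which is the point the speaker makes next.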
A
If we're using 256-bit keys, as we are now, we get 128-bit security. This is because of the birthday paradox, but I really don't want to go into the mathematics, because that would take a long time. Basically, an attacker who actually wants to forge a ciphertext, sorry, forge a plaintext given the ciphertext, would need to make 2^128 attempts, which is still beyond the realm of feasibility.
A
So this means that if there's any cryptographic integrity protection on the plaintext, be it a MAC, a hash or a digital signature, then revealing the preimage of the last SHA-3 and obtaining a plaintext that actually meets the constraint imposed by the integrity protection is proof beyond reasonable doubt that it is indeed the actual plaintext that has been encrypted.
In particular, when we want to do inclusion proofs in a large file encrypted by Swarm, the neat thing is that the fact that the plaintext is the hash of a chunk further down in the Merkle tree itself serves as integrity protection. So if I reveal a plaintext and then find a chunk in Swarm that actually matches that hash, that by itself proves that the plaintext I have revealed is actually the plaintext that has been encrypted, because under existential forgery the chances of finding a matching one are vanishingly small.
A
So in encrypted Swarm, references are more than the hash of the plaintext. They are the hash of the ciphertext plus the decryption key. So instead of the 32 bytes in the unencrypted version, we have references that are 64 bytes in total. For the API in particular, it means that you can use the exact same APIs that you use for unencrypted Swarm.
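The two reference formats can be sketched as follows. The function names are hypothetical, the hash is a stand-in (`hashlib.sha3_256`), and telling the formats apart by length is an assumption of this example, though it follows naturally from the 32-byte and 64-byte sizes given in the talk.

```python
import hashlib

HASH_SIZE = 32  # bytes in a content hash
KEY_SIZE = 32   # bytes in a decryption key


def plain_reference(chunk_data: bytes) -> bytes:
    # Unencrypted case: the reference is just the hash of the chunk.
    return hashlib.sha3_256(chunk_data).digest()


def encrypted_reference(ciphertext: bytes, key: bytes) -> bytes:
    # Encrypted case: hash of the ciphertext followed by the decryption key.
    if len(key) != KEY_SIZE:
        raise ValueError("decryption key must be 32 bytes")
    return hashlib.sha3_256(ciphertext).digest() + key


def parse_reference(ref: bytes):
    """Return (address, key); key is None for an unencrypted reference."""
    if len(ref) == HASH_SIZE:
        return ref, None
    if len(ref) == HASH_SIZE + KEY_SIZE:
        return ref[:HASH_SIZE], ref[HASH_SIZE:]
    raise ValueError("reference must be 32 or 64 bytes")
```

Because the key travels inside the reference, code that already passes references around can stay unchanged; only the retrieval step needs to notice the longer form and decrypt.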
A
So as long as we trust SHA-3 to be a cryptographically secure one-way function, this design is formally verifiable. That assumption is so heavily woven into the fabric of Ethereum that, if it breaks, we will have much bigger problems. One other thing that makes SHA-3 attractive is that it is available in the EVM, the Ethereum Virtual Machine. So it makes this encryption smart-contract friendly: you can actually check inclusion proofs with a smart contract.
A
So you can make statements about the contents of encrypted files in such a way that you can prove everything using just SHA-3 hashes, without totally disclosing and decrypting the file or revealing the decryption key, but simply selectively disclosing only the data that you want to disclose. So that's where we stand now. This is already implemented, and I encourage everybody to try it out. All the example dapps that the Swarm team publishes together with Swarm are already encryption friendly. So you can mount encrypted volumes using the FUSE interface.
A
You can browse them using the Swarm Explorer, and you can create encrypted private photo albums, which means that the vision of creating decentralized Dropbox-like functionality, where you can store your files and share them only with those with whom you actually want to share them, is really, really close. It's probably going to be there soon, this year, within a few months.
A
So currently in ENS you can obviously only register short references to unencrypted content, and it would kind of defeat the purpose if you put the decryption key on the blockchain. So there needs to be some kind of identity-based access control, which is the topic of the third part of my talk. This is precisely what's missing.
A
So I haven't really implemented SHA-3; I have relied on pre-existing SHA-3 code, the same that Geth uses for every other purpose in Ethereum. But the counter mode encryption had to be reimplemented from scratch, using SHA-3 as a primitive. It fits onto one screen of source code. It's a really short and clear implementation.
A
Correct. So if you're not running the node yourself and you're using the gateway, then the gateway decrypts and re-encrypts using SSL, the secure socket layer, which means that you need to trust the gateway. That's true. But, you know, in the bright future you're going to run at least a light node on your mobile device or your desktop. So don't rely on gateways.
A
Right, so the asymmetric encryption that we're going to use is the elliptic-curve version of ElGamal encryption. The identity will be a secp256k1 public key, just like the keys that you use for Ethereum addresses. The reason for this is that there's already quite a bit of trust and confidence in the security of this system and, moreover, there are actually tools available for managing such keys, including very high-security specialized hardware, like the hardware wallets that people are using to store their crypto funds.
A
I would like to single out Trezor and Ledger as the two most popular solutions. They are already able to handle this kind of key, and we would like to tie the identity management, the identity-based encryption of Swarm, to this same key, so that you can use your hardware wallets or your already stored private keys the same way you use them for accessing your Ethereum funds and sending transactions to the blockchain.
A
Well, you don't have to. I mean, you will need to do that in case you want to decrypt sensitive data that justifies the use of a hardware wallet. The fact that you have a hardware wallet does not force you to use the hardware wallet for everything. You can still have private keys on your Ethereum node, right?
A
Well, it depends on which identity you use. If you use an identity that is on your hardware wallet, then yes. But actually I think that in the not-too-distant future it would be a good idea to have a hardware wallet even for frequent-use transactions, because the fact that Ethereum transactions are rare is something that I think many in this room are hoping to change.
A
So the plaintext in this case is a symmetric key denoted by M, which is 256 bits, and the way we use it has been discussed in the previous part. The ciphertext and the encryption and decryption formulae are there. That's how you do ElGamal. And how do you use it? Access management is done through ACLs, which are referenced in the metadata part of the manifests.
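The formulae on the slide are not captured in the transcript, so the following is only a sketch of one common way to do ElGamal-style encryption of a 256-bit key M on secp256k1: a hashed variant in which the ciphertext is the pair (rG, M XOR H(rP)) for recipient public key P. The hashing step, the use of `hashlib.sha3_256`, and all function names are assumptions; the curve constants are the published secp256k1 parameters. The curve arithmetic is deliberately naive and not constant-time, so it is unsuitable for production use.

```python
import hashlib
import secrets

# secp256k1: y^2 = x^3 + 7 over the prime field of size P, group order N.
P = 2**256 - 2**32 - 977
N = 0xFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFEBAAEDCE6AF48A03BBFD25E8CD0364141
G = (0x79BE667EF9DCBBAC55A06295CE870B07029BFCDB2DCE28D959F2815B16F81798,
     0x483ADA7726A3C4655DA4FBFC0E1108A8FD17B448A68554199C47D08FFB10D4B8)


def add(a, b):
    """Add two curve points; None represents the point at infinity."""
    if a is None:
        return b
    if b is None:
        return a
    (x1, y1), (x2, y2) = a, b
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if a == b:
        lam = 3 * x1 * x1 * pow(2 * y1 % P, -1, P) % P
    else:
        lam = (y2 - y1) * pow((x2 - x1) % P, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def mul(k, point):
    """Scalar multiplication by double-and-add."""
    result = None
    while k:
        if k & 1:
            result = add(result, point)
        point = add(point, point)
        k >>= 1
    return result


def pad_from(point) -> bytes:
    # Derive a 256-bit pad from the shared point (assumption: hash of x).
    return hashlib.sha3_256(point[0].to_bytes(32, "big")).digest()


def encrypt(recipient_pub, m: bytes):
    r = secrets.randbelow(N - 1) + 1  # ephemeral secret in [1, N-1]
    c1 = mul(r, G)
    c2 = bytes(a ^ b for a, b in zip(m, pad_from(mul(r, recipient_pub))))
    return c1, c2


def decrypt(priv: int, c1, c2: bytes) -> bytes:
    # priv * c1 equals r * recipient_pub, so the same pad is recovered.
    return bytes(a ^ b for a, b in zip(c2, pad_from(mul(priv, c1))))
```

Because the recipient's key pair is an ordinary secp256k1 pair, the private half could in principle stay wherever the holder's Ethereum key already lives.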
A
So this means that for every subdirectory, and even for every file in a Swarm collection, you can have separate access control lists. The access control list contains the public key of the ACL owner: there's somebody who has the means of modifying the ACL, and their public key is part of the ACL. The ACL itself is also a manifest, so it's a key-value table, a mapping that maps Diffie-Hellman shared secrets with the published public key to the encrypted secret keys, with the other key being the public key of the identity to which access is granted. So the value is an ElGamal-encrypted content key and the key is the Diffie-Hellman shared secret.
The point of doing this is that if you don't have the corresponding private key, that is, if you're not the owner of the access control list, you do not know who is on the access control list.
A
You cannot read the access control list unless you're the owner. Nevertheless, if you are in the access control list, if you're one of the listed identities, you can find yourself quickly. So that's the point of this scheme, and the only thing a third-party observer can glean from this ACL is an upper bound on its size.
A
They cannot even necessarily know the actual size of the ACL, because there's nothing preventing the owner of the ACL from stuffing it with bogus data. So even if you have, like, three keys that are authorized to access a certain part of the collection, you can still have an ACL which is 100 entries long, and a third-party observer is none the wiser. They have no idea which entries are random data and which are actual access control entries.
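The lookup-and-padding idea can be sketched with a plain dictionary standing in for the ACL manifest. This is a simplification under stated assumptions: the Diffie-Hellman shared secrets are taken as opaque byte strings computed elsewhere, the table key is a hash of the shared secret rather than the secret itself, and the content key is wrapped with a pad derived from the shared secret instead of a full ElGamal ciphertext. All names are hypothetical.

```python
import hashlib
import secrets


def h(tag: bytes, secret: bytes) -> bytes:
    # Domain-separated hash so the lookup key and the pad differ.
    return hashlib.sha3_256(tag + secret).digest()


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def build_acl(content_key: bytes, shared_secrets, total_entries: int) -> dict:
    """Owner side: one real entry per grantee, then bogus random filler."""
    acl = {}
    for secret in shared_secrets:
        acl[h(b"lookup", secret)] = xor(content_key, h(b"wrap", secret))
    while len(acl) < total_entries:
        # Filler is indistinguishable from real entries to an observer.
        acl[secrets.token_bytes(32)] = secrets.token_bytes(32)
    return acl


def find_content_key(acl: dict, shared_secret: bytes):
    """Grantee side: a single lookup finds the entry, if there is one."""
    wrapped = acl.get(h(b"lookup", shared_secret))
    if wrapped is None:
        return None
    return xor(wrapped, h(b"wrap", shared_secret))
```

An observer holding none of the shared secrets sees only uniformly random-looking pairs, so the number of entries gives an upper bound on the number of grantees and nothing more.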
A
Yeah, you encrypt the same symmetric key with three public keys, correct. This is actually the topic of this slide. We're using hybrid encryption: asymmetrically encrypting the secret key and then symmetrically encrypting the content. Modifying ACLs requires uploading a small number of chunks, recalculating the root hash, and a single transaction to ENS or a mutable resource update. So it's really cheap. Read access means that you have ownership of the key and you're able to decrypt the content. That's what read access actually implies.
Write access is implemented within the ENS resolver contract, which means that your key is somehow permitted to change the entries in the resolver. So that's write access. But the end user will be presented with a Dropbox-like interface where you can simply add read and write access to various parts of the Swarm-encrypted content, or rather Swarm-hosted content, with the same metaphors that people are already used to. This is just how they're going to be implemented in a decentralized fashion.