From YouTube: Future of Decentralized Data Transfer
Description
In this session, hear how we're expanding upon GraphSync, Bitswap, and better data transfer algorithms for IPFS and Filecoin.
Hey everyone. I am so excited to chat with you today about our evolving data transfer protocols in Filecoin and IPFS, and how we're going to make the distributed web more reliable, accessible, and interoperable.
So, my name's Hannah. I am a programmer with the Web3 data systems team at Protocol Labs. I've been working on IPFS and Filecoin for about three years, and honestly, in my entire career I have never worked on more fascinating technological problems than I do now. I hope that by the end of this talk, you are as interested in them as I am.
If you like the first slide, there is a lot more David Bowie coming, because my Filecoin orbit theme fell to Earth, and with that it's full of space oddities. So let's go ahead and get started.

I want to start with a very simple question: do you trust the data that you download on the web? When you enter an address in your web browser, how do you know the content you got is actually what you were looking for? The surprising answer is: you don't really know. On the traditional web, if the website uses SSL, you have some guarantee that you're downloading from the right site, but you still don't know if you're downloading the right data, because you don't know if that site's been hacked. Plus, SSL is pretty cumbersome to set up and maintain for people who run websites. And while there are, in fact, ways to check your data, they're so cumbersome that only the most technical people use them.
So it's somewhat surprising that the whole web works at all. But in the distributed web, we want to do better, right? And that's what I think is one of the core innovations of IPFS and Filecoin: we've developed data formats and addressing schemes that provide much better guarantees about the data you download. Trusting data is no longer about who you get it from, but rather whether it matches a cryptographic contract that's built into the address you use to find it. Put simply: you cannot download something you did not ask for.
So let's talk briefly about our data formats and what they enable us to do. The first thing I want to talk about is the content identifier, or CID. It is a way to identify content that is cryptographically secure. Inside a content identifier is a cryptographic hash of the block of bytes that you're going to download, and the CID also contains information about how the data was encoded, to help us interpret those bytes at a higher level.
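To make that concrete, here is a minimal sketch of content addressing in Python. This is not the real CID format (actual CIDs use the multiformats stack of multibase, multicodec, and multihash); it just shows the core contract: the address is derived from the bytes plus a codec tag, so any block can be checked against the address that was used to request it.

```python
import hashlib

def make_cid(block: bytes, codec: str) -> str:
    """Build a simplified content identifier: a codec tag plus a
    SHA-256 hash of the block's bytes. (Real CIDs use the multiformats
    encodings, but the idea is the same.)"""
    digest = hashlib.sha256(block).hexdigest()
    return f"{codec}:sha256:{digest}"

def verify(cid: str, block: bytes) -> bool:
    """A block is valid only if re-hashing it reproduces the CID,
    so you cannot be handed data you did not ask for."""
    codec = cid.split(":", 1)[0]
    return make_cid(block, codec) == cid

cid = make_cid(b'{"name": "cheese"}', "dag-json")
assert verify(cid, b'{"name": "cheese"}')       # right bytes: accepted
assert not verify(cid, b'{"name": "tampered"}') # wrong bytes: rejected
```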
As a quick aside, CID is sometimes pronounced "sid," and I'm going to try not to do that; it's sort of an internal shorthand we use, but if I slip, that's what I mean. So: most blocks of bytes on our network decode into a structured data format called IPLD, or InterPlanetary Linked Data.
This means that you can take the bytes you get for a block and decode them into something much more meaningful. The data model of IPLD is similar to JSON, in that it has basic data types like strings and numbers, and collections like lists and maps. However, IPLD also supports embedding content identifiers into that structured data, and that allows us to break up large data sets over lots and lots of blocks.

In the sort of contrived example you see on the screen, we have a list with two maps inside of it. In the first map, there's a CID that links to an image, called cheese image, and a number value. The second map has a CID that links to further data, as well as a string value with the name of the fruit.
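As a rough Python analogue of that slide (the field names and CID strings here are made up, and the `{"/": ...}` link convention is borrowed from DAG-JSON), the value might look like this, along with a helper that collects the embedded links:

```python
# A list of two maps, where {"/": "..."} stands in for an embedded CID
# link. The CID strings below are placeholders, not real hashes.
doc = [
    {
        "cheese_image": {"/": "bafy...image"},  # link to an image block
        "count": 1,                              # a plain number value
    },
    {
        "wiki": {"/": "bafy...wiki"},            # link to further data
        "fruit": "durian",                       # a string value
    },
]

def links(node):
    """Collect every CID link embedded in an IPLD-style value; these
    links are how a large data set gets broken up over many blocks."""
    if isinstance(node, dict):
        if set(node) == {"/"}:
            return [node["/"]]
        return [c for v in node.values() for c in links(v)]
    if isinstance(node, list):
        return [c for v in node for c in links(v)]
    return []

assert links(doc) == ["bafy...image", "bafy...wiki"]
```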
In this case, that further data is the wiki. IPLD is in a lot of ways very similar to HTML: it's almost a hypertext format, but it's structured data that a computer can understand, and the links are CIDs that guarantee we get the right data. So, fun fact: almost all data that is stored in Filecoin is encoded in IPLD.
When you make deals in Filecoin, you probably make them for files or directories, but all of those are encoded to IPLD formats before they get transferred, and that produces some unique advantages that we're going to see in just a moment. So that's our format. Large data sets in IPLD can get spread out over multiple blocks, where each block is identified by its own CID, and embedded in the IPLD data for each block are links to other blocks, all going back to an individual root block.
We call this a Merkle DAG. The awesome thing about this is that while the root CID is just a hash of the initial root block, you can use it to incrementally verify the entire graph. Once you find and verify the first block, you have CIDs for the next blocks, and you can go find and verify them, and you can keep going until you know the whole graph is cryptographically the data you wanted. We've even written a query language for IPLD, called selectors.
There are a lot of benefits that emerge from these core data formats. Using CIDs means you never get data you didn't ask for; we've talked about that already. And since you can verify the integrity of data, it doesn't matter where you get it from, because you know it's the right data. IPLD formats allow us to break up large data sets into incrementally verifiable chunks, and that means we can get data faster, because we can ask lots of people for smaller parts of the whole. Plus, if we have a single chunk that appears twice in a graph, we only need to send it once. Finally, we are starting to see the building blocks here for trustless payments for data: if I can verify data incrementally, I can break large transfers into much smaller transactions, minimizing risk on both sides of a paid transfer.
It's like programming in the 90s, and you're trying to figure out how to implement HTTP, or early file sharing protocols like Napster, but, you know, do it better. So the first distributed protocol we ever implemented for IPFS was Bitswap, and Bitswap is a block exchange protocol. It doesn't actually understand anything about IPLD. What you can do is ask a peer for a CID and get a block back; it's kind of like BitTorrent.
In that sense, you ask peers for individual parts of a larger data set and then assemble them yourself. The IPLD knowledge here lives entirely outside of Bitswap; usually it's in the client who's requesting data. This has some really great advantages, and that's why Bitswap remains the core transfer protocol in IPFS today. There are some great things about working at the block level. First, since you're only asking for one block at once, it's really obvious how to break up requests among many peers.
You just ask different peers for different blocks, and since you're breaking up requests, it's easy to make them in parallel. Plus, because the person sending data in response to a request is only sending block data, they're only sending bytes: they don't have to understand the format those bytes are encoded in, and they don't need to know anything about IPLD. Plus, Bitswap is a really mature protocol. We've worked on it a while, the Go implementation is pretty efficient, and it does a whole lot.
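A sketch of why block granularity splits so naturally (this is not Bitswap's actual scheduling logic, which also weighs peer performance; it's just the basic idea of dealing a wantlist out across peers):

```python
from itertools import cycle

def split_wantlist(cids, peers):
    """Bitswap-style request planning sketch: because every request is
    for a single block, a wantlist can be dealt out round-robin across
    peers, and each peer's share then fetched in parallel."""
    assignment = {peer: [] for peer in peers}
    for cid, peer in zip(cids, cycle(peers)):
        assignment[peer].append(cid)
    return assignment

wants = ["cid1", "cid2", "cid3", "cid4", "cid5"]
plan = split_wantlist(wants, ["peerA", "peerB"])
assert plan == {"peerA": ["cid1", "cid3", "cid5"], "peerB": ["cid2", "cid4"]}
```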
There are some challenges, though, with Bitswap. Because you only ask for one block at a time, certain types of nested traversals, like fetching a subdirectory deep in a nested directory, produce lots of round trips to get the data, because you don't know what the CIDs are farther into the graph.
You have to ask for the first block, and then the second, and then the third, and then the fourth, all the way to your destination, and that's pretty inefficient. Also, with Bitswap, for people sending data, since you're only getting very small requests, it's pretty hard to optimize your disk I/O and your network I/O so you can send lots of data at once.
Finally, the block-level nature of Bitswap does tend to produce a lot of network traffic, similar to the traffic congestion problems that you see in BitTorrent. The other part is not a problem with Bitswap itself, but with the implementation of Bitswap we've written for IPFS: the Go version has kept growing and growing and growing, so it's way more than a protocol implementation.
It's almost like an entire data transfer stack, and it's so tightly integrated with go-ipfs that it's almost impossible to pull them apart. So it's not surprising that we ended up writing a second transport in the context of building an entirely new technology with Filecoin. The new transport is called GraphSync.
It's kind of our new hotness, and it powers all of the data transfer in Filecoin. GraphSync is a protocol for replicating entire IPLD graphs across peers. Rather than requesting a single block, in GraphSync we request a CID and a selector, and that allows the peer to stream us all the blocks that match that query. In essence, we're performing an entire query against a remote IPLD data set, and we can ask for and receive much more data at once.
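A toy version of the responder side might look like this. Real IPLD selectors are a much richer declarative language; here the "selector" is just a recursion depth, and blocks are plain lists of child CIDs, but it shows the key difference from block exchange: one request, a whole stream of matching blocks back.

```python
def select_blocks(store, links_of, root, depth):
    """GraphSync-style responder sketch: given a root CID and a selector
    (here just a recursion depth), walk the local store and stream every
    matching block, so the query is answered in a single round trip."""
    yield root, store[root]
    if depth > 0:
        for child in links_of(store[root]):
            yield from select_blocks(store, links_of, child, depth - 1)

# Toy store: each block is simply a list of child CIDs.
store = {"root": ["a", "b"], "a": ["c"], "b": [], "c": []}
links_of = lambda block: block

got = [cid for cid, _ in select_blocks(store, links_of, "root", 1)]
assert got == ["root", "a", "b"]  # depth 1 stops before "c"
got_all = [cid for cid, _ in select_blocks(store, links_of, "root", 2)]
assert got_all == ["root", "a", "c", "b"]
```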
We developed GraphSync because we knew that for storage and retrieval deals, we would require fast point-to-point transfers between clients and miners, similar to HTTP. And in GraphSync, because the person sending data processes much higher-level requests, they can read and marshal their data ahead of time and send it over the wire. So there are no extra round trips, and we should be able to max out the speed of the network pipe. And since we started from scratch, GraphSync is a lot more configurable than Bitswap.
We support custom authentication, sending side-channel information like payments, and distributing individual requests to different data stores on the receiving side. A lot of these things are easier to deal with when you're dealing with higher-level requests, and this configurability has enabled some of the really cool optimizations that we're starting to see in Filecoin, which you may have heard about, like the DAG store. It's allowing us to do I/O at much faster speeds, and we're starting to support really large storage clients like Estuary that are sending out tons of data at once. That kind of scalability is just really hard to do with Bitswap.
But GraphSync has its own challenges. First, because you're dealing with large requests, it's harder to break them up: it's not always clear how to break up a selector among peers. Second, we want to maintain our guarantees around incremental verifiability, and that turns out to be fairly complicated, so our implementation is complicated.
The other challenge with GraphSync is that you need a greater shared understanding on both sides of the network transport. Because the responder is essentially doing an IPLD selector query, they have to be able to execute it locally to get all the data, and this gets complicated because IPLD is pretty extensible and supports a number of customizations.
And finally, again, this isn't a problem with GraphSync, but go-bitswap does so much, and is so much more than a transport protocol, that when you start a new protocol you don't have all of that, and it's just missing.
So we've got some challenges, and probably our most obvious challenge, which may have been apparent up until now, is that we have two different stacks. We have a transport stack for IPFS and a transport stack for Filecoin, and it's not just Bitswap and GraphSync.
It's all the services that we built on top of them. In Bitswap, we have Bitswap sessions, which is sort of an optimization protocol for doing higher-level transfers, and then we have services like the block service and the DAG service that are baked on top of Bitswap and hardcoded into it. In Filecoin, we have GraphSync, and then we built other components on top of it to support Filecoin's needs: we have a control protocol called the data transfer protocol.
We have a Filecoin retrieval protocol and a storage protocol, and finally, we have support for payment channels, so that we do in fact support paid transfers in Filecoin. But the larger problem is that data retrieval is about so much more than transport protocols when you want to build interoperable solutions. The first step is content routing, which is essentially finding your content. On the regular web, you ask users to know website names, and then you use DNS to find those websites. On the distributed web, we just start with a content identifier that could be hosted anywhere, and we need to track down who has that content and how they're making it available.
We've developed, again, different solutions here. With IPFS we have the DHT, or distributed hash table, which enables us to find any content on IPFS, even though it's sometimes not as fast as we'd like it to be. We also have the Bitswap want protocol, which speeds things up among local peers, because you can ask them whether they have a block without them actually sending it to you. On the Filecoin side, it's really still early days. Will Scott gave a talk earlier today about the indexing solutions we're building to help people find content on Filecoin, but a lot of it is still in development.
Once you've found content, it's actually not an immediate step to making a transfer. We need to plan the best way to get it from potentially multiple sources, and there are a lot of factors we need to think about.
We want to get it fast, and we probably want to get it free or at low cost, but that might mean we're distributing the requests over many peers, or switching between protocols, or making trade-offs, maybe paying to get things faster. Plus, we have to recover from failures if peers turn out not to have content. Concretely, almost none of this exists for Filecoin content today; it's largely manual. We do have a lot of planning in IPFS, in Bitswap: that's what the Bitswap sessions largely do, and it's really great.
It just only works for Bitswap. Finally, we have what I've already been talking about, which is transport: how do you get things from one place to another? Here we have really great protocols, but there's one thing missing, and that's accessing all of the web 2 data. Right now, to use our protocols, you have to run an entire libp2p stack. What if we could bring some benefits, like content identifiers and incremental verifiability, to data that's available only on the traditional web?
So I want to talk to you briefly about how we're going to figure this out going forward. We know we want people to be able to download IPLD content from IPFS, Filecoin, and maybe even HTTP, as fast as possible, without having to think about where it's coming from or what protocols it's sent by. Wow, that's a tall order when you say it all together. Well, the good news is we've begun work on a prototype client that can retrieve data from IPFS and Filecoin, automatically choosing the best way to get it.
So that's really good news. So how are we going to do this? The first thing we're going to do is really cleanly separate these stages of data retrieval I've been talking about. All of these steps are asynchronous and distinct: we want to separate finding content, planning requests, and executing transfers. Now, it's not entirely that simple, because feedback goes both ways in a large transfer request.
But we still want to be able to swap in different solutions at each step. We're thinking about content routing now as a problem of finding content from multiple sources. Some of them are fast but have very little content, like a local data store or local peers, and some of them are very slow, like the IPFS DHT, but have tons of content.
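The fast-but-small versus slow-but-complete trade-off can be sketched like this (the router names and lookup functions are made up for illustration; a real client would race the sources asynchronously rather than try them one by one):

```python
def find_providers(cid, routers):
    """Multi-source content routing sketch: consult sources in order
    from fastest/smallest (local peers) to slowest/largest (a DHT), and
    stop at the first one that knows providers for the CID."""
    for name, lookup in routers:
        providers = lookup(cid)
        if providers:
            return name, providers
    return None, []

routers = [
    # Fast source that only knows about a little content.
    ("local-peers", lambda cid: {"cidX": ["peer1"]}.get(cid, [])),
    # Slow source that knows about (nearly) everything.
    ("dht",         lambda cid: ["peer7", "peer9"]),
]
assert find_providers("cidX", routers) == ("local-peers", ["peer1"])
assert find_providers("cidY", routers) == ("dht", ["peer7", "peer9"])
```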
We actually already do this, without thinking about it, in our IPFS stack, but we know there are going to be more sources over time, so we need to build a framework for fetching from multiple sources. And, honestly, we know other people might build better solutions for finding content, so soon you're going to see delegated content routing coming to IPFS, and in our new retrieval stack we want to allow people to plug in their own solutions at each level. The fastest way to get to an awesome retrieval client that downloads fast from anywhere is to provide a framework in which people can discover and implement best-in-class solutions for individual parts of the problem.
Solving planning is not obvious. The planning stage is going to be a complicated one: it's not easy to mix protocols and peers and payments and always deliver the optimal solution. We have some thoughts, but they're just preliminary. For Bitswap and GraphSync, we think we can mix them and get the best of both worlds. GraphSync helps us understand whole graphs and get them in a single round trip, but Bitswap is good at splitting requests. So what if we used GraphSync to get a description of the graph in terms of CIDs, and Bitswap to actually fetch it? This is just one of the proposals for how we could get the best strengths of each protocol, and we're just going to have to evolve solutions over time.
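The hybrid proposal above can be sketched in two phases (purely illustrative, not a shipped design; `describe_graph` is a hypothetical stand-in for a GraphSync-style metadata query):

```python
def plan_hybrid_fetch(describe_graph, root, peers):
    """Hybrid plan sketch: use a GraphSync-style query to learn the
    graph's shape as a flat list of CIDs in one round trip, then deal
    those CIDs out across peers so a Bitswap-style layer can fetch the
    blocks in parallel."""
    cids = describe_graph(root)           # phase 1: single round trip
    plan = {peer: [] for peer in peers}   # phase 2: split the fetch
    for i, cid in enumerate(cids):
        plan[peers[i % len(peers)]].append(cid)
    return plan

# Hypothetical graph description: the root plus three children.
describe = lambda root: [root, "c1", "c2", "c3"]
plan = plan_hybrid_fetch(describe, "root", ["peerA", "peerB"])
assert plan == {"peerA": ["root", "c2"], "peerB": ["c1", "c3"]}
```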
One other thing we are definitely considering is HTTP, and we've developed a proposal for incrementally verifiable transport over just HTTP 1.1. So when is all this coming? I don't know; 2022? This is not a product announcement, and this is not your fbm keynote, so we hope it's going to be very soon, but it'll be even faster if you help out. So, thank you very much, and beware of the Spiders from Mars. Thanks.