From YouTube: IPFS Recovery - Govind Mohan & Hlib Kanunnikov
Description
HackFS finalists Hlib Kanunnikov and Govind Mohan take us behind the scenes of the process and thinking that went into their project, IPFS Recovery, which creates a way for content to persist permanently, despite any damage to data and the network, by bringing data recovery algorithms into the IPFS protocol.
Discover IPFS Recovery at https://github.com/Wondertan/go-ipfs-recovery
For more information on IPFS
- visit the project website: https://ipfs.io
- or follow IPFS on Twitter: https://twitter.com/IPFS
Sign up to get IPFS news, including releases, ecosystem updates, and community announcements in your inbox, each Tuesday: http://eepurl.com/gL2Pi5
A: So, this is IPFS Recovery. We built this during HackFS 2020, which took place about two months ago and ran for 30 days. This is the story of what we did, sprinkled with ample technical detail. Coming into this hackathon I had a lot of experience with error correction in the context of distributed systems.
A: So I thought it would be a pretty cool idea to bring this over to IPFS. What problems does this solve? The first existing problem I wanted to tackle is data corruption, which can lead to the loss of vital information. In a distributed system you have data at rest and data in transit, and you want to ensure the integrity of both. Data at rest can be compromised by things like coffee poured on your laptop or a massive power grid failure, and if potentially vital content is being served from your devices to other devices on the network, you don't want that to happen.
A: You want to ensure there are different ways to access the same information in your distributed system. You also have the problem of node churn, where devices can just go offline for any reason whatsoever; I could simply decide to turn my computer off. There is also the issue of censorship: if there are people actively targeting certain types of data on the network, they could go after any link to make sure it's no longer there. And finally, transient connectivity: poor internet connectivity can happen for all kinds of reasons in a distributed system, so it's important to keep that in consideration.
I also want to add that cyber security has three components: confidentiality, integrity, and availability. Confidentiality is probably the one that gets the most attention, but integrity and availability are quite important too, and I think erasure coding is really what allows us to ensure proper security for distributed systems. So what is erasure coding? It's a method of data protection in which data is broken into fragments, expanded and encoded with redundant data pieces, and stored across a set of different locations or storage media.
Well, what this jargon actually means is that erasure-coded data becomes like a hydra: it has multiple heads, and if you chop some of them off it just keeps recovering, over and over again, until you chop off most if not all of the heads. So it's worth consuming some extra storage to obtain better data resiliency, and since the data can be spread across the network geographically, it even allows for better performance and delivery guarantees.
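The hydra intuition can be sketched with the simplest possible erasure code: a single XOR parity fragment over equally sized data fragments, which survives the loss of any one fragment. This is only a minimal illustration of the principle, not what the project uses (its Reed-Solomon codes tolerate more than one loss), and all names here are hypothetical:

```go
package main

import "fmt"

// xorParity returns one redundant fragment: the byte-wise XOR of all
// equally sized data fragments. Losing any ONE fragment (data or parity)
// is survivable, because XOR-ing the survivors reproduces it.
func xorParity(frags [][]byte) []byte {
	parity := make([]byte, len(frags[0]))
	for _, f := range frags {
		for i, b := range f {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverFragment rebuilds the fragment at index `lost` by XOR-ing every
// surviving fragment together with the parity.
func recoverFragment(frags [][]byte, parity []byte, lost int) []byte {
	out := append([]byte(nil), parity...)
	for i, f := range frags {
		if i == lost {
			continue
		}
		for j, b := range f {
			out[j] ^= b
		}
	}
	return out
}

func main() {
	frags := [][]byte{[]byte("ipfs"), []byte("reco"), []byte("very")}
	parity := xorParity(frags)
	rebuilt := recoverFragment(frags, parity, 1) // pretend fragment 1 was lost
	fmt.Printf("%s\n", rebuilt)                  // prints "reco"
}
```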
A: So why does this matter specifically for IPFS? I think the limitless distribution of data that is at the core of IPFS requires strong integrity guarantees, but we also want to make sure that data is available at all costs, no matter what kind of data it is. It doesn't matter who, or what circumstance, is trying to get rid of that data; we want to ensure it's still there. And it's a long-requested feature.
A: There are GitHub issues from as far back as 2016, still open, that raise exactly this problem, and no specific solution exists at the IPLD layer, which is where we actually built this out, as you'll see shortly. Finally, we kept this in the spirit of IPFS, which is to keep everything very modular and pluggable in many different ways, so we actually have a few different erasure codes built out. Let me talk a little bit about our hack and what we actually did.
B: There you go. Hey guys, I'll go over some more technical details of how it all works. My original goal was to create a module for recovery that follows most of the best practices I've seen in IPFS for how you do modular things and how you split everything up, so this module also introduces interfaces that try to be useful, abstract, and convenient to use.
B: Originally we concluded that IPFS needs erasure coding. We did some research and saw that there are actually a lot of issues about it, with ideas on how to implement it, but I decided to integrate it into IPLD. IPFS currently works with the first version of IPLD, so IPFS Recovery is integrated there. For the hackathon demonstration, so we had something to show at HackFS, we also added new commands to IPFS.
B: The main one is "encode", which lets you encode any DAG into a recoverable DAG: you put in one CID and you get another one back. The interesting part is that if you run "ipfs get" with the new hash, you will still see the original file. So you effectively exchange one hash for another, and you are still able to retrieve the original content through the newly generated hash, which additionally carries the recovery feature.
B: There are also ideas for other commands that could be added to the IPFS CLI to extend IPFS with new features and help manage recoveries in some way: you could run manual recoveries, or look at state inside the IPFS Recovery module that can be useful during debug sessions.
B: Our main implementation is Reed-Solomon, which is a very popular erasure code and simple to understand. I won't describe how it works in terms of the mathematical details, but put simply, it generates some additional blocks for your existing blocks.
B: For example, say you have seven blocks and you decide to generate three more, giving you ten blocks in total. If you then lose any three of them, it doesn't matter which, you can still get the original file: you can regenerate the lost blocks that represent the actual user data. Now, about the implementation.
B: For Reed-Solomon, and for erasure-coding integration in general, we needed to find a way to fit it into IPLD.
B: IPLD v0 has a Node interface that can be addressed by a CID, and that CID identifies the content, which is connected to the actual node implementation. We added a new custom CID codec to address the new recovery nodes, which wrap ProtoNodes, the nodes whose links point to blocks on the network. What a recovery node does is add additional redundancy links alongside the node's original links.
B: This allows recovery to work across the whole IPFS network. It means that if you have a recovery node but you cannot get a block you need, you can fall back to fetching the recovery nodes instead.
B: Fetching the recovery nodes helps you recover the block you originally needed, and the good thing is that this works not only on your local node: if you distribute your content and share it with others on IPFS, and it is in demand and flowing around the IPFS network, this really helps to recover it and to increase the availability of the data you store on IPFS.
Let's continue with the implementation. The custom CID codec is actually a hack, because from my understanding we really need a separate mechanism: some kind of plugin model where we could register additional representations or metadata expressing extra capabilities of the data being addressed. Recovery isn't actually a codec; you can't mark an arbitrary node as recoverable the same way you can say which encoding it uses.
B: Codecs are mostly used just to say: okay, I use JSON encoding, I use dag-cbor, I use some other codec, so that a node understands how to decode blocks into memory. Now, about the encoding implementation.
So, there is an important property of Reed-Solomon: it requires all blocks to have the same size. That was an issue for us to solve, because IPFS blocks do not always have the same size, but Reed-Solomon needs them to be the same.
B: So I decided to encode the sizes of the actual blocks inside the redundant nodes.
B: That way, when we try to recover some blocks, we copy the blocks we have into shards, a special data structure that manages whether we have the required number of blocks and helps us actually restore the missing blocks with their original sizes.
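A minimal sketch of that equal-size workaround, using hypothetical helper names: pad every block to the largest size before encoding, store the original lengths alongside the redundant data, and truncate back after recovery:

```go
package main

import (
	"bytes"
	"fmt"
)

// padBlocks zero-pads variable-size blocks to a common shard size,
// returning the padded shards plus the original sizes, which the talk
// describes storing inside the redundant nodes.
func padBlocks(blocks [][]byte) (shards [][]byte, sizes []int) {
	max := 0
	for _, b := range blocks {
		if len(b) > max {
			max = len(b)
		}
	}
	for _, b := range blocks {
		sizes = append(sizes, len(b))
		padded := make([]byte, max) // zero-filled up to the common size
		copy(padded, b)
		shards = append(shards, padded)
	}
	return shards, sizes
}

// unpadBlocks truncates each shard back to its recorded original size.
func unpadBlocks(shards [][]byte, sizes []int) [][]byte {
	blocks := make([][]byte, len(shards))
	for i, s := range shards {
		blocks[i] = s[:sizes[i]]
	}
	return blocks
}

func main() {
	blocks := [][]byte{[]byte("short"), []byte("a longer block")}
	shards, sizes := padBlocks(blocks)
	restored := unpadBlocks(shards, sizes)
	fmt.Println(bytes.Equal(restored[0], blocks[0]), sizes) // true [5 14]
}
```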
B: I hope I explained that understandably. The next thing I would like to tell you about is the custom DAG session.
B: For IPLD v0, IPFS uses a DAGService, which lets you get IPFS nodes by their CIDs, and there is a concept of sessions that scopes some of that logic.
B: A session groups the nodes you fetch from the network or get locally. Our custom DAG session manages pairs of the blocks you have already retrieved or recovered, and saves them as references for the next iteration of traversing the graph, so that when a node fails to be fetched from the network we can find its parents.
B: This is currently implemented as a kind of soft link, just in memory, recording who can recover the node I failed to get and need right now in the current session. So at the moment these are only saved in memory, but I think there should be a special component for this.
B: It would wrap the datastore and simply keep, locally, a small amount of metadata about which nodes can be recovered from which.
B: This wouldn't add a lot of additional storage, but it is still useful, and it works only locally: the node knows what it can use to recover the thing it needs, the thing the user asked for.
B: The next thing is the Recoverer. This is an interface, and it has a Reed-Solomon implementation.
B: It may happen that, at the same moment, there are several recovery processes running for the same data, which we would like to avoid. So there is a recovery singleton that manages the different recovery sessions, and what it allows is for any user of IPFS to point to...
B: Oh sorry, my headphones came off. It points you to the recovery sessions that may be going on at the same time. That's probably all about the Reed-Solomon implementation.
B: That's all I wanted to tell about this. The next thing is a draft of a novel alpha entanglement implementation.
B: Reed-Solomon is the industry standard for erasure coding, but there is a lot of ongoing research on codes, and there is an interesting paper on alpha entanglement codes. Govind even had the chance to contact the people who wrote the paper; they also did a hack for Ethereum where they integrated those alpha entanglements somehow. We decided to keep this out of scope, since Reed-Solomon is easier to understand and easier to implement, but we do have a draft implementation for entanglement.
A: Yeah, sorry, I'd just like to say a couple of things about that. First of all, I was stuck on mute for the longest time, so I couldn't say anything. So, alpha entanglement is a way of spreading data across the network where you have actual data uploaded onto the network, which could be multiple different files: you could upload a file, and someone else could upload a file.
A: Redundancies are formed for both of these files in such a way that your file could potentially help recover someone else's file, which might be damaged or degraded for the reasons I mentioned before. It's a very new form of erasure coding, and there is a lot of experimentation going on with it in other distributed systems, such as Ethereum Swarm, as well.
A: So I just wanted to bring that over to the IPFS world, and I think the high level of modularity that we have with IPFS Recovery also helps us plug in any erasure codes that, in the future, could be even more exotic.
A: Of course, the good folks over at IPFS created Testground, and we want to battle-test our implementations of recovery, both Reed-Solomon and alpha entanglement, to get an exact metric for how much degradation, of the data and of the nodes on the network, we are able to resist. That is going to be a key factor in determining how this can be integrated into the core and then, you know, potentially deployed on the mainnet.
B: So there are some concepts at our interface level that should be covered here. The first is recoverability. It is a parameter for the encoded data that defines what percentage of your data can be safely lost.
B: That is what recoverability means: if you lose the percentage you set as the recoverability of the data, you will still be able to retrieve it from the network. So let's say I have a DAG already in the network and I run encode on it with a recoverability of 25. What that does is generate 25 percent more nodes across the whole DAG, and if you then actually lose 25 percent of the DAG you originally had on the network...
B: ...you will still be able to read the whole data from it. You could object that this just adds 25 percent more data, but in terms of distributed systems I think this overhead is helpful, and it allows the content to be retrieved more reliably.
B: That is the general idea. For Reed-Solomon, though, recoverability works not as a percentage but as a parameter that defines the number of redundant nodes you want to generate for each layer of the DAG.
B: You can see this in the picture we have here. If we have recoverability 2, then for the root block, which has two child nodes, we generate two more, so the data survives in case nodes two and three are lost.
A: At each level, that is at each parent node, the number of redundancies specified by your recoverability is created. So at this level you can see that A and B are created and connected to node one in such a way that if you lose any two of these four nodes, you can use the rest to recover all of them. And the same applies to node two and all its children, node five and all its children, any parent node and all its children. Again, this is Reed-Solomon.
you
have
alpha
entanglements
in
our
in
our
github
in
one
of
the
issues,
you
can
actually
see
a
more
specified
diagram
for
that
kind
of
use
case.
So
there's
this
there.
It's
there's
different
ways
to
play
around
with
the
dag
structure.
Thanks
to
you
know
the
ipld
api,
which
is
allows
for
this
kind
of
thing
to
be
done
really
well
and
a
little
bit
about
strategies
as
well.
B: When you cannot get the data from IPFS and you fall back to recovery, this may leave additional data on your node that you never asked for. That is: I need some content, and to get that content I need some other content that is available on the network; I fetch it and it helps me recover the data I needed, but I never asked for that recovery data itself.
B: So strategies are the kind of thing that is better put into the IPFS config; they help you decide what to do with this additional data you needed for recovery.
B: The first is "all", which means you help the network and store everything. "Just data" means you store only the data, and a third option stores only what was requested. Consider a recovery where you want just part of a file: you don't need the whole file, but the part you need has some lost blocks, so what do you do?
B: You fall back to other data and blocks in order to recover the original blocks. If the user hasn't asked for those helper blocks, then under the third, requested-only option, they would not be stored. But if you choose "just data", then the data you haven't asked for, but...
B: ...still fetched from the network, or regenerated from the other blocks you asked for, is saved locally, since maybe you will need it in the future; with "all" you additionally help the network and actually provide it. And with the requested-only option you save only the things you need, not the things that helped you get what you need.
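The three storage strategies described above can be sketched as a config option. The names (All, JustData, Requested) paraphrase the talk and are assumptions, not necessarily the project's identifiers:

```go
package main

import "fmt"

// Strategy controls what happens to blocks that were fetched only to
// drive a recovery, as opposed to blocks the user explicitly requested.
type Strategy int

const (
	All       Strategy = iota // store everything and provide it to the network
	JustData                  // keep helper blocks locally, in case they are needed later
	Requested                 // keep only blocks the user explicitly asked for
)

// shouldStore decides whether a fetched block is kept, given the
// configured strategy and whether the user asked for this block or it
// was only pulled in as recovery input.
func shouldStore(s Strategy, userRequested bool) bool {
	switch s {
	case All, JustData:
		return true
	case Requested:
		return userRequested
	}
	return false
}

func main() {
	fmt.Println(shouldStore(Requested, false)) // false: helper block is dropped
	fmt.Println(shouldStore(JustData, false))  // true: helper block kept locally
}
```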
A: Yeah, in the interest of time, because I think we're running low: there are some use cases, maybe the more vital ones, where I think storing the data and the redundancies would be important even if it means bearing a little bit more cost, and then there are some cases, probably the more average ones, where you just want to get the data and keep only the data.
A: So we want to have these different kinds of strategies as well, and I think the future really calls for community discussion. We want to be able to talk with the community; I mean, we clearly have a lot of ideas, and we want to see what kind of ideas you all have and how we can implement this, both for...
B: ...Speaking of the second version of IPLD: I haven't had enough time to actually dive into the new specs that Protocol Labs is working on. I know there are a lot of cool new features that might be useful for recovery as well, though I'm not sure exactly which ones we could use. But as I already said about the CID part: recovery is a feature of the content, not an actual way to represent the content, so adding something extra to the CID that would also go into IPLD seems like a connected idea. The second point is defining recovery nodes over the DHT. Govind already talked about this in terms of alpha entanglements, but there is also another interesting idea.
B: I described the soft links that let you find locally which nodes you need in order to recover the data you asked for, and that you would ideally store in the datastore. But what if we put this on the DHT? That would allow us to create a kind of soft link on the network, which would be a last resort for you to recover the data.
B: Currently, when you try to get something from IPFS, it will just wait endlessly for data it cannot find. What if we could asynchronously look for other things on the network, these soft links, so you can establish that this node can recover that one, because someone connected them? And there is a motivation for users to interconnect recovery nodes.
B: For example, you might want your data to help another person's data recover, so you can connect them and create helper nodes that link parts of different DAGs; from these two DAGs you create some more redundant nodes, and that way different pieces of data can help each other...
B: ...to recover. There is an interesting example: when you want your content to be more widely distributed over the network, you can entangle it with in-demand content, so your content can help the in-demand content recover. That way your content becomes much more distributed to the peers who needed a recovery for that in-demand content. So there is a lot of future work and new things ahead.