Description
Paolo will walk us through Elastic IPFS.
First of all, let me briefly introduce myself: I am Paolo, a Staff Engineer at NearForm and a Node.js core member. You can find me on Twitter and GitHub at those handles. The tiny little blue dot on the right-hand side is where I come from within Italy. For the record, the rest of Italy does not acknowledge the existence of my region; they're gonna regret it somehow, but that's fine. Anyway, let's get started.
You all know about IPFS. What IPFS is about is a protocol, I mean a set of protocols, actually, designed to preserve and grow humanity's knowledge by making the web upgradable, resilient and more open. On top of IPFS, web3.storage was built, which is a service that makes storing files on IPFS very simple, even for non-technical people: with its magical web UI, you store a file and you retrieve it when you want. Pretty easy.
Now, since we are here talking about it, something was going wrong, right? Of course, there were challenges in the original architecture of web3.storage, and they had what I call the most wonderful problem that a company can experience: we cannot handle our own growth. The previous architecture was not able to handle the growth we were seeing; in this specific case, it could not handle the amount of new uploads per day. The biggest issue was that trying to add new nodes to the system was very expensive and not effective, because it took literally two days, I mean 48 hours, for a node to be added to the system, due to the DHT bootstrap phase. In short, that was the biggest challenge.
That was the problem. So the Protocol Labs people reached out to us with a very simple question: how can we use cloud services to make the service horizontally scalable with no limits? I mean, that's a pretty small requirement, not hard at all, you know, just the bare minimum, right? Unfortunately, yes, that was exactly what they asked us.
So these were the goals we immediately established. First and foremost, obviously, we needed to handle the growth: that's the first one. The second one was to be cloud-based, because of course, if you want to scale with no limits, the cloud is your only option; otherwise it's hard to dynamically add new nodes when there is a burst or whatever, so the solution has to scale pretty fast. And the last one, which is something we established in order to have a simple and lean architecture: we went stateless.
If you want to have a closer look, what is very interesting to know is that the architecture is divided into three different and completely independent subsystems, so they can work in isolation from each other. The architecture has been designed with replication in mind, so it can be replicated across many nodes, regions and so forth, and, as I said earlier, you can add and remove nodes at any time without any penalty.
Now, one thing that I want to clarify before going on is that what I'm going to tell you for the rest of my presentation is focused on AWS, but I want to make clear that Elastic IPFS is not an architecture based on AWS. It's a cloud-based architecture, which means that you can easily replicate it on, let's say, Google Cloud, Azure or even on-premises. In short, what you need is a computing system like Lambda (so serverless computing), a shared database, an object storage system and a queue system. That's all we use.
AWS is just a reference implementation for now, like, let's say, Kubo was for IPFS at the beginning: there were go-ipfs and js-ipfs, and they were just reference implementations. We are not locked into AWS; we could technically redeploy somewhere else. Of course we would have to adapt part of the code, but it is a very small part. So in theory it is a cloud-agnostic architecture, and we have a reference implementation on AWS.
We are not saying you basically have to pay money to Amazon; whether it is Google or another provider is up to you. That's the idea. The reason why we initially chose AWS is that some parts of Protocol Labs were already on AWS, so we chose to stay on the same cloud for now. It was just a convenience choice.
Now, first of all, I would say that we are using a shared database and, of course, we put data inside it. I take it you all know what a CAR file is, a Content Addressable aRchive: basically a stream-optimized file format for storing blocks in IPFS.
Then, of course, we store information about the blocks. As you can see, in this case we don't really care about the block contents, because we don't process them at all; we just store the information. So we basically store the multihash, the creation date (that is, when we first indexed the block) and the type. That's it, pretty simple.
So, basically, each block has a one-to-many relationship, because it might be included in many CAR files. For each of them we store the block multihash, the CAR file path, the length and the offset. Remember these last two pieces of information, because they will come back later when we talk about how we serve this data back to the IPFS network.
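To make that concrete, here is a minimal sketch of the two item shapes just described. The interface and attribute names are my own illustration, not the actual E-IPFS schema; what matters is the split between per-block metadata and the one-to-many block-to-CAR locations.

```typescript
// Hypothetical item shapes; real table and attribute names may differ.
interface BlockItem {
  multihash: string;  // key: the block multihash
  createdAt: string;  // creation date, i.e. when the block was first indexed
  type: string;       // the block type
}

interface BlockToCarItem {
  multihash: string;  // partition key: the block multihash
  carPath: string;    // sort key: path of the CAR file containing the block
  offset: number;     // byte offset of the block inside the CAR file
  length: number;     // byte length of the block
}
```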
So, let's analyze the very first subsystem in Elastic IPFS, which is the indexing subsystem. Once again, this is the diagram overview, but we don't really care about it.
This is the flow, and that's what we care about. It's pretty simple: at the end of the day, summarizing a lot, the indexing flow is just about opening an S3 file, reading it sequentially and, for each block, writing the block information into DynamoDB and also enqueueing the block multihash in SQS.
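As a rough sketch of that flow, assuming the AWS SDK v3 and the @ipld/car reader, an indexing function could look like the code below. The table name, queue URL and message format are placeholder assumptions, and the real indexer also records each block's offset and length inside the CAR file, which I omit here for brevity.

```typescript
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";
import { DynamoDBClient, PutItemCommand } from "@aws-sdk/client-dynamodb";
import { SQSClient, SendMessageCommand } from "@aws-sdk/client-sqs";
import { CarBlockIterator } from "@ipld/car";

const s3 = new S3Client({});
const dynamo = new DynamoDBClient({});
const sqs = new SQSClient({});

// Hypothetical names: table and queue are assumptions, not the real ones.
const BLOCKS_TABLE = "blocks";
const MULTIHASHES_QUEUE_URL = process.env.MULTIHASHES_QUEUE_URL!;

export async function indexCar(bucket: string, key: string): Promise<void> {
  // Open the CAR file on S3 and read it sequentially, block by block.
  const { Body } = await s3.send(new GetObjectCommand({ Bucket: bucket, Key: key }));
  const blocks = await CarBlockIterator.fromIterable(Body as unknown as AsyncIterable<Uint8Array>);

  for await (const { cid } of blocks) {
    const multihash = Buffer.from(cid.multihash.bytes).toString("base64");

    // Store the block information; the block bytes are never processed.
    await dynamo.send(new PutItemCommand({
      TableName: BLOCKS_TABLE,
      Item: {
        multihash: { S: multihash },
        createdAt: { S: new Date().toISOString() },
        type: { N: String(cid.code) },
      },
    }));

    // Enqueue the multihash for the publishing subsystem.
    await sqs.send(new SendMessageCommand({
      QueueUrl: MULTIHASHES_QUEUE_URL,
      MessageBody: multihash,
    }));
  }
}
```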
Initially, the indexing Lambda was idempotent, which means that if you execute it several times with the same input, you will get the same output. We initially designed it like that because we were terrified of CAR files with millions of blocks inside, and we said: if something goes wrong, we don't want to start from the beginning, especially because Lambdas have a maximum execution time of 15 minutes, so if the file is too big we might not finish fast enough.
So we said: okay, let's make it idempotent, and then let's see what happens. What happened is that, in order to be idempotent, you have to continuously read and write state to DynamoDB to track the progress of the Lambda, and this was absolutely killing performance. So we had to give up on idempotency and remove it in order to handle the upload rate. Fun fact: we realized that idempotency was completely useless, because the CAR file format is so well designed that we hardly have any failures.
We are close to a zero failure rate, so we can go straight through, publish, and that's it; we don't have to do anything else. So idempotency was completely useless, we dropped it, and now the indexing Lambda is a write-only Lambda, which is able to execute and index very fast, especially because, after dropping the reads and writes of progress state, we also embraced DynamoDB batching techniques. We could go 25 times faster, which was a massive performance gain.
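For illustration, this is roughly what the batching looks like with the AWS SDK v3. BatchWriteItem accepts up to 25 put requests per call, which is where a 25x gain over one PutItem per block comes from; the table name and the retry loop are simplified assumptions.

```typescript
import {
  DynamoDBClient,
  BatchWriteItemCommand,
  AttributeValue,
  WriteRequest,
} from "@aws-sdk/client-dynamodb";

const dynamo = new DynamoDBClient({});

export async function putItemsBatched(
  table: string,
  items: Record<string, AttributeValue>[]
): Promise<void> {
  // DynamoDB allows at most 25 put/delete requests per BatchWriteItem call.
  for (let i = 0; i < items.length; i += 25) {
    let requests: WriteRequest[] = items
      .slice(i, i + 25)
      .map((Item) => ({ PutRequest: { Item } }));

    // Keep retrying whatever DynamoDB reports back as unprocessed (throttling, etc.).
    while (requests.length > 0) {
      const res = await dynamo.send(
        new BatchWriteItemCommand({ RequestItems: { [table]: requests } })
      );
      requests = res.UnprocessedItems?.[table] ?? [];
    }
  }
}
```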
Now, one very crucial component of IPFS is the Kademlia DHT, which is used to keep shared information about content discoverability and peer discoverability. Unfortunately, the DHT, the way it is designed, is a huge pain for Elastic IPFS, for two reasons. First of all, Elastic IPFS, despite being a cluster within AWS or any cloud provider for what matters, claims to the outside world to be a single node, so we only have one peer ID for the entire cluster.
Second, since we are on the cloud, we cannot maintain long-living connections: when we can, they're expensive, and when we cannot, we simply cannot, by definition. So it's impossible to participate in the gossipsub used by the Kademlia network. The consequence is that, on the DHT side, the Elastic IPFS system is not self-sufficient in providing the entire experience, and we have to rely on other technologies created by Protocol Labs people, like the network indexer, which I will call the indexer node for the rest of the talk, and Hydra nodes, created by Protocol Labs for other purposes. So, let's talk about Hydra nodes.
Actually, these Hydra nodes, when I realized how they work, turned out to be a very nice piece of architecture, because they cleverly exploit the characteristics of the DHT. They are basically nodes that are put there only to make sure that, whenever you make a search on the Kademlia network, very quickly you run into one of the Hydra nodes.
That's why the name: it is a single entity with several heads, basically, that's the idea. So when you start searching, you hit one of the Hydra nodes. All the Hydra nodes share the same database, basically a shared storage, so they can cache all the searches made on the DHT, and eventually they can also access third-party systems, like the indexer node, to fetch information for content they have not seen so far, without relying on the DHT itself.
Now, one thing that I want to clarify is that when I say that Hydra nodes are everywhere in the network, I'm not saying that they are physically present everywhere, as in there always being a Hydra node physically close to you. You have to remember that I'm talking about a neighborhood concept within the Kademlia DHT, which in other words means that all the nodes in the Kademlia network are assigned to a bucket.
All the nodes in a given bucket are considered to be neighbors, so Hydra nodes are created in order to be present in each possible bucket of the Kademlia network, and that's how the magic happens. I was astonished when I understood how it worked; it was simply amazing. And then we have the last piece which is missing for E-IPFS to work properly, which is the indexer node, or the network indexer: I think it actually got renamed, but I will use the old name.
Basically, it's a system that is capable of deterministically and immediately mapping a CID to content providers. You can talk to it via libp2p or via a plain old HTTP API. Now, guess which one we are using: for the reasons I gave earlier, we were forced to use the HTTP one because, as I already said, E-IPFS is on the cloud, so libp2p topics and gossipsub are not available to us.
Also, all of our nodes expose the same peer ID, and if it happens that multiple sources connect to the same destination, then, because of an optimization in libp2p, the destination will assume that one of the connections is stale, because that pattern usually means the source reconnected from, let's say, another Wi-Fi network or whatever, and it will drop one of the connections. So this makes communication impossible when it comes from different sources. That's why we could not use libp2p and we chose the HTTP API.
Another interesting part about the indexer node is that it reverses the control of downloading the data. Basically, you don't directly upload data to the indexer node; rather, you upload the new advertisement and entry data somewhere, and they must stay available forever over HTTP at that destination.
You have to make sure that these advertisements are strictly ordered, in a blockchain-like manner. So you update the head of the chain, and you link each advertisement to the previous one, back to the tail of the chain, which is the very first advertisement. Then, when you're done and you're pretty confident that the indexer node can download the new data, you make a simple HTTP PUT request to its /ingest/announce endpoint, and that's it. At some point, the indexer node, according to its own schedule, will download the new data, which, remember, is served over plain HTTP.

In our case we leverage this: we don't have a specific HTTP server to serve these advertisements to the indexer node; rather, we basically use a public S3 bucket and we let Amazon do the job of HTTP hosting for free. So we don't even have to manage that part, and we could basically remove one component.
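Here is a very rough sketch of that flow under the assumptions just described: the advertisement chain and its head pointer live as plain objects in a public S3 bucket, and the indexer node is then notified over HTTP. The bucket name, indexer URL, field names and announce payload are all illustrative assumptions; the real advertisement and announce formats are defined by the indexer node, not by this sketch.

```typescript
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";

const s3 = new S3Client({});
const BUCKET = "advertisements-bucket";                              // hypothetical public bucket
const INDEXER_ANNOUNCE = "https://indexer.example/ingest/announce";  // hypothetical URL

interface Advertisement {
  previous: string | null; // link to the previous advertisement (null for the tail)
  provider: string;        // multiaddress of the content provider
  entries: string;         // link to the file listing the advertised multihashes
}

export async function publishAdvertisement(ad: Advertisement, adKey: string): Promise<void> {
  // Upload the advertisement, then move the head pointer to it. Both stay
  // available over plain HTTP because the bucket is public.
  await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: adKey, Body: JSON.stringify(ad) }));
  await s3.send(new PutObjectCommand({ Bucket: BUCKET, Key: "head", Body: JSON.stringify({ head: adKey }) }));

  // Tell the indexer that new data is available; it will crawl from the head
  // back to the last advertisement it has already processed, on its own schedule.
  await fetch(INDEXER_ANNOUNCE, {
    method: "PUT",
    body: JSON.stringify({ head: `https://${BUCKET}.s3.amazonaws.com/head` }),
  });
}
```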
So, as I said, when the indexer node is ready to download the new data, it will connect to this HTTP server, start by downloading the head of the chain, and then iteratively fetch the entries and the previous advertisement, up to the point where it either reaches an advertisement it has already processed or the tail of the chain, which is, once again, the very first advertisement. And that brings us to some sample files, which I'm not sure you can actually read... yeah, you can kind of read them.
This sample is the head of the chain. If you look at line three, there is a link to the current advertisement; bear with me, that's what is written there. I will share the slides online so you can double-check, but that's what it says. Then, in the advertisement file, you can see on line two the multiaddress of the content provider, and on line seven there is the link to the entries file. Remember that when I say link, I literally mean a path on an HTTP server.
But actually, when we designed this system, we ran into another problem, which is the concurrency problem, and also the data volume problem, because IPFS and web3.storage see a steady upload rate per day in the order of millions. Now, if we make a rough estimate of a thousand blocks per CAR file, that leads us to billions of blocks uploaded per day, which is horrible and not really sustainable if you go one by one.
Moreover, since you have to provide these files in a strictly ordered fashion, if we go one by one we cannot really make a billion updates per day on any computer; this would kill us. Finally, if we introduce concurrency without carefully thinking about the implications, we might lose an entire branch of updates if two concurrent Lambdas update the head at the same time, because there is a race condition on the head, and we cannot allow that. Now, what was the solution for that?
Well, we did what we do at NearForm, which is not to panic. We sat down and we said: okay, there must be a solution, right? There was a solution. The solution was the usual divide et impera approach; in other words, we tried to reduce the size of the problem. What was the solution, put very simply? We chose to use two Lambdas in a multi-stage process.
The important part is that, while the grouping Lambda has no concurrency limit, the publishing Lambda has a strict concurrency limit of one, so only one instance can execute at a time. But when it executes, given the way we have grouped the data, we are actually publishing 10,000 advertisements at the same time. That's the trick.
Now, if you do some rough calculations, you quickly get to the point that one billion blocks, grouped by 10,000, make a hundred thousand advertisements per day to publish. If you divide that by the number of seconds in a day, which is 86,400, it means you just have to make about one call to the indexer node per second. That's it. We could probably scale this even further, because we can go much faster than that if we max out the performance of the Lambdas and so forth, but as it is we can easily publish a billion blocks per day with no problem at all.
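The back-of-the-envelope math from the slide can be double-checked in a few lines:

```typescript
// Rough check of the publishing rate described above.
const blocksPerDay = 1_000_000_000; // roughly one billion blocks uploaded per day
const groupingFactor = 10_000;      // blocks grouped together before publishing
const secondsPerDay = 86_400;

const advertisementsPerDay = blocksPerDay / groupingFactor;      // 100,000
const announcesPerSecond = advertisementsPerDay / secondsPerDay; // ~1.16

console.log(advertisementsPerDay, announcesPerSecond.toFixed(2));
```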
Last one now, bear with me: we indexed the data, we published it to the IPFS network, and now we actually have to serve the data, right? That brings us to the last subsystem, which is the peer subsystem. Once again, this is the overview; let's just skip it. What are the characteristics of the peer subsystem, which is delegated to be contacted by a lot of people from all around the world?
Well, the trick was to have a fully automatic EKS cluster, fronted by an Elastic Load Balancer, which basically does the balancing for the WebSocket connections. That's it, that was the trick.
The second part of the trick is that our Bitswap is a read-only system and therefore stateless, which means you can scale nodes up and down as you want, without any penalty. Another thing is that the nodes are very lean, because when you ask us for a block, we check on DynamoDB whether we have it. If we don't have it, we directly respond that we don't have it; we don't first try to fetch it from the external network.
If we have it, we immediately serve it by contacting S3 over HTTP; I mean, we're not using the AWS SDK, we're actually hitting the bucket via plain HTTP. We leverage HTTP byte ranges, and that's why we care about the offset and the length: we can immediately pick the exact bytes in the file that we need and serve them from memory to the client, without even storing them on the node's file system.
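A minimal sketch of that serving path: a plain HTTP Range request against the CAR file's public URL, using the offset and length stored in the block-to-CAR table. The URL layout is an assumption; the byte-range logic is the part described in the talk.

```typescript
// Fetch a single block straight out of a CAR file over plain HTTP.
export async function fetchBlockBytes(
  carUrl: string,  // public HTTPS URL of the CAR file on S3
  offset: number,  // byte offset of the block inside the CAR file
  length: number   // byte length of the block
): Promise<Uint8Array> {
  const res = await fetch(carUrl, {
    // Range is inclusive of both ends, hence the -1.
    headers: { Range: `bytes=${offset}-${offset + length - 1}` },
  });
  if (res.status !== 206) {
    throw new Error(`Expected a partial content response, got ${res.status}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```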
Moreover, the second trick was that the peer subsystem speaks Bitswap, but a simplified version of it. Once again, since we are read-only and we don't fetch external data, we could remove most of the more complex parts of the Bitswap protocol, like the ledgers, the want-list tracking and so forth. We receive a want-list, we immediately process it, we respond, and that's it; we don't do anything else. Very simple, and that's why we are able to scale up and down very quickly.
Now, you all have this question: how does E-IPFS perform? Well, my good friend Alan already showed you this slide, which talks about 200 terabytes of data. I challenge you to spot when E-IPFS was actually deployed in that graph; you can easily see it, right? We went from close to zero to close to a 100% hit rate for the system in Bitswap. But there is also a second slide, this one, which is the average indexing time.
On the right-hand side of the graph, it's not that there is no sample data to show anymore; it's just that the left-hand side is so big that the right-hand side, where the performance has improved, pretty much disappears from the graph, because we went from an average indexing time of three minutes to an average indexing time of a few milliseconds.
And finally, if you want the raw numbers: after six months in production, we have close to 100 million CAR files, 16 billion blocks and 25 billion block-to-CAR links, and we are serving them with the performance we have just seen.
Now, what can we learn from this journey? I told you the title was related to love, and that's because of one of the nicest acronyms in the Unix world: the KISS rule. For those who don't know it, that acronym stands for Keep It Simple, Stupid, which means that while there are a lot of very nice and complex protocols out there, if you try to completely embrace, implement and replicate all of them, you will just go crazy. There's no way you can do that.
Second thing: HTTP is not dead and probably never will be. I mean, I'm biased, because I maintain the HTTP stack in Node.js, so I'm in Node core, so I'm biased. But the thing is that new protocols are fashionable and they're very nice, and Bitswap works awesomely fine, but when you want to stay simple, fast and performant, HTTP is often your choice. It still works fine after, I don't know, 20 or 30 years now, right? I mean, I'm getting old, but anyway.
Finally, democracy is good. As you remember, I told you that this is a cloud-based architecture which just happens to be on AWS at the moment, temporarily, because we are going to be cloud agnostic, as Alan said. For us it was also good to know that in the future we might be on another cloud, or even on a mixed-cloud approach, because that's perfectly legit. And why not?
One last thing, which is also something for people like me who like to over-engineer stuff: remember to keep your eyes on the stars, so keep dreaming, but keep your feet on the ground, so don't lose the practical approach, otherwise you will soon get lost. This one is from Theodore Roosevelt, so we are talking about 80 years ago, but it's still up to date: you can still apply it every day, and it will still apply in the future.
Very quickly, I want to thank NearForm for sending me here. We are a global consulting company, deeply into JavaScript, so if you happen to run npm, 88 percent of the time you are using our stuff; you can't escape, I'm sorry. And even though we are more than three hundred people now, we are always looking for new talent. So if you want to engage, you can reach out to me, or to Cody and Matt, who are outside. And that's it. Do you have any questions for me?