From YouTube: 5 Billion Blocks - Alan Shaw
Description
Alan's talk will give a high-level overview of the infrastructure the DAG House team has built for serving massive amounts of IPFS content to thousands of users around the world.
Let's talk about IPFS at scale. This talk is called "5 Billion Blocks", for most definitions of billion, because that's approximately how many blocks we store right now for our users. That's roughly a petabyte of data on IPFS and Filecoin, across nearly a hundred thousand user registrations. So that's quite rad. And those users have made nearly 100... I've rounded these all up to make it fun.
We built nft.storage in two weeks for this hackathon called NFTHack. I think it was that hackathon, I'm not sure, but anyway, ETHGlobal was putting it on, and we built this thing for it. At the time the idea was just to create the easiest way for developers to onboard data onto Filecoin, and that turned out to be pretty popular.
IPFS is a really good place for that, for NFTs, for reasons we can talk about after this talk if you like. So anyway, we started with IPFS Cluster. Cluster got us a long, long way: it stored over 25 million pins for us, and it still stores pins for us at the moment. And at the time, what else were we going to do? This was like two years ago: whatever IPFS implementation do you reach for?
A
Do
you
reach
for
and
what
we
wanted
was
like
a
multi-tenant
system
with
like
redundancy
putting
data
onto
multiple
nodes
so
that
they
weren't
like
we
didn't
lose
it
because
we
didn't
want
to
do
that.
That
was
a
big
thing
about
nfts
was
that
people
were
putting
stuff
on
on
ipfs,
assuming
it
was
like
forever
storage,
not
realizing
exactly
that.
A
If
you
don't
keep
your
node
running
or
put
it
on
somewhere
and
on
a
node
that
does
keep
running,
then
it's
not
going
to
continue
to
be
on
the
internet
anyway,
ibfs
cluster,
an
amazing
product,
and
we
made
good
big
use
of
it
and
it
got
us
so
so
far,
but
it
wasn't
easy.
We
learned
the
hard
way
how
to
make
ipfs
cluster
scale
massively
and
like
small
things
like
a
data
store
choice
and
we
started
with
badgerdb,
because
I
think
I
think
that's
the
default.
That's why we started with it, anyway, and that was fine for a while, quite a long while, but then eventually we had to switch to flatfs (I don't know, there wasn't an emoji for flatfs). So anyway, that gave us way better performance characteristics, so that was fun. We also started with a cluster of just a few nodes with really big disks, and then we had to switch to a cluster of many nodes with small disks.
A
Basically,
if
you
let
ibfs
nodes
get
too
big,
then
you
find
that
performance
sort
of
tends
to
tail
off,
but
that
came
with
it
with
its
own
kind
of
challenges
like
just
managing
them.
All.
Like
upgrades
upgrades
for
ipfs
for
cluster
I,
remember,
we
had
like
a
block
store
upgrade
quite
recently
and
it
was
going
to
be
super
invasive
to
our
users,
because
we
can't
we
have
upload
okay.
So we have about five to ten uploads a second, so we can't really stop the cluster and do a datastore migration.
So actually it was way easier for us to just create a new cluster and then copy the stuff across, over the course of months and months and months, but we eventually got there, yeah, because we can't stop the world for that sort of thing. And we were also relying quite heavily on the PL netops team, because we built it in two weeks with a team of one.
A
Can
you
spin
us
up
a
class
they're,
like
yeah
sure,
no
problem,
and
then
you
know
that
cluster
turned
into
a
you
know
a
cluster
of?
Maybe
three
nodes
I
think
to
a
cluster
of
about
50
nodes
and
then
netops
were
on
pager
Duty
and
we
were
like
sorry.
This
has
taken
up
some
of
your
time,
so
well
got
a
bit
guilty
for
that.
But
anyway,
let's
talk
about
garbage
collection,
yeah!
No,
you
can't
garbage
collect.
A
We
can't
really
unpin
everything,
because,
as
a
multi-tenant
system,
we
can't
really
guarantee
that
someone
else
isn't
also
uploading
the
same
data
and
if
we
could
unpin,
then
it
would
just
take
forever
to
garbage
collect.
These
are
Big
notes,
lots
of
nodes
with
with
lots
of
data
on
them
and
it
and
we
could
take
them
out
of
rotation
but
and
then
garbage
collects
and
then
put
them
back
in.
But that's a really manual process to be performing on your live production infrastructure, and it's kind of, yeah, not great: a bit error-prone, a bit scary. So our solution to that is to not garbage collect, and then it's just busy.
It's busy, busy, busy all the time. We've got CAR file uploads and pin requests coming in the whole time: write, write, write, write, write. And then for each one of those writes it's provide, provide, provide to the DHT, and then periodically we have reprovides, which try to re-provide the whole of all of the blocks that this particular node is storing.
A
And
then
the
cue
of
that
is
so
long
that
that
you
know
the
the
the
the
provider
records
expire
before
they
even
get
onto
the
DHT
and
that's
fun,
and
then
obviously,
then
there's
reading
via
bit
Swap
and
and
that's
not
just
external
traffic.
That's
like
cluster
cluster,
actually
bit
squats
between
its
peers,
because
when
you
upload
stuff
to
it,
it
goes
on
to
one
of
them
and
then
to
get
that
replication.
A
It
bits
what's
between
them,
so
yeah
so
busy
working
busy
and
that's
why
it
gets
hot
and
tired
and
then
yeah,
especially
if
you
have
popular
content.
If
someone
uploads
something
that's
super
popular,
then
that
node
is
busy
forever
we're
observing
that
content,
yeah,
so
busy
busy
notes,
and
so
we
built
elastic
ipfs
to
help
sort
of
alleviate
some
of
these
issues.
A
I'm
not
going
to
talk
loads
in
depth
about
how
it
works,
because
there
is
actually
another
talk
a
little
bit
later.
That
goes
into
how
I've
elastic
ibfs
Works
in
depth,
but
I'm
going
to
Breeze
through
it
super
quick,
just
just
because
I
think
it's
really
interesting,
but
the
essentially
elastic
the
the
computers
that
are
like
accepting
data.
The
rights
are
not
the
same
computers
that
are
actually
reading
data
as
well.
We've
separated
those
two,
those
two
pipelines
and
elastic
IVF-
is
free
and
open
source
on
the
internet.
You should go to GitHub and check it out; if you search for "elastic ipfs" you will likely find it. But how does it work? Well, we accept CAR files, which are serialized DAGs. Both NFT.Storage and web3.storage accept CAR uploads. We also accept files, but we like to encourage the web3 way of doing things, so people know the CIDs before they send them to us. And anyway, currently they go into workers in the cloud.
A
That's
the
little
cogs
they
are
like
like
like
lambdas,
but
that
scales
up
really
nicely
so
that
they
can
accept
uploads.
Really,
it's
a
good
big
good
concurrency
and
the
workers
put
car
files
in
a
simple
storage
bucket,
so
we
put
car
files
in
the
bucket
and
we're
actually
moving
towards
a
system
where
the
workers
just
give
the
user
a
signed
URL
and
they
upload
directly
to
the
bucket.
So
we
don't
even
need
to
go
through
the
workers
anymore.
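To make that concrete, here is a minimal sketch (not the team's actual code) of handing a client a pre-signed upload URL, assuming an S3-compatible bucket and the AWS SDK v3; the bucket name and key layout are made up:

```ts
// Hypothetical sketch: issue a pre-signed PUT URL so the client can upload
// a CAR file straight to the bucket, without proxying bytes through a worker.
import { S3Client, PutObjectCommand } from "@aws-sdk/client-s3";
import { getSignedUrl } from "@aws-sdk/s3-request-presigner";

const s3 = new S3Client({ region: "us-east-1" });

// carCid is assumed to be computed client-side (the "web3 way"): the client
// already knows the CID of the CAR before asking where to upload it.
export async function createUploadUrl(carCid: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: "example-car-bucket",     // hypothetical bucket name
    Key: `${carCid}/${carCid}.car`,   // hypothetical key layout
  });
  // The URL is valid for 15 minutes; the client PUTs the CAR bytes to it.
  return getSignedUrl(s3, command, { expiresIn: 900 });
}
```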
A
We
don't
avoid
that
that
problem
of,
like
proxying
the
content
and
the
cost
associated
with
that,
and
so
this
is
where
elastic
ibfs
comes
in,
because
it
gets
informed
that
there's
a
new
car
in
the
bucket
and
it
can
be
any
bucket-
doesn't
have
to
be
that
one
and
just
as
long
as
elastic
ivfs
can
actually
read
that
that
car
file,
then
it's
all
good,
so
it
gets
told
of
it
and
then
elastic
ipfs
indexes
the
blocks
in
it.
So we store the block CIDs, the byte offset within the CAR file, and also the CAR file that they're actually in, so we know where to look for them. And Elastic IPFS has these specialized IPFS nodes that run in there, called Bitswap peers, because that's all they do: they just do Bitswap. It's made up of an auto-scaling, load-balanced Kubernetes cluster, and nodes in the IPFS network connect to Elastic IPFS and send a Bitswap want list.
The Bitswap peers consult the index, they find out where those blocks are, like which CAR file and what offset, and then they send a message back with the blocks that were requested. And it does that by making range requests to the CAR files directly: it reads directly from the CAR files in the bucket, making range requests to serve the blocks. And that is it. But not quite.
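As a rough illustration of that read path (the index record shape and the fetchBlockIndex helper are assumptions for this sketch, not Elastic IPFS's real schema), a Bitswap peer could resolve a wanted CID to a CAR location and pull just those bytes with an HTTP range request:

```ts
// Hypothetical sketch: look a CID up in an index of { carKey, offset, length }
// records, then read only those bytes out of the CAR object in the bucket.
interface BlockLocation {
  carKey: string;  // which CAR object in the bucket holds the block
  offset: number;  // byte offset of the block within that CAR
  length: number;  // block length in bytes
}

// Assumed helper: however the index is actually stored, it answers this question.
declare function fetchBlockIndex(cid: string): Promise<BlockLocation | undefined>;

const BUCKET_URL = "https://example-car-bucket.example.com"; // hypothetical

export async function getBlock(cid: string): Promise<Uint8Array | undefined> {
  const loc = await fetchBlockIndex(cid);
  if (!loc) return undefined; // not indexed here, so nothing to send back

  const res = await fetch(`${BUCKET_URL}/${loc.carKey}`, {
    headers: { Range: `bytes=${loc.offset}-${loc.offset + loc.length - 1}` },
  });
  if (res.status !== 206) throw new Error(`range request failed: ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}
```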
It's also worth mentioning that when we index the CAR files, we send that information to indexer nodes, and this is how all five billion blocks are discoverable on the DHT. So yeah, we use indexer nodes, and indexer nodes are purpose-built to map CIDs to content providers, and that's built for the scale of the Filecoin network.
A
Our
tiny
amount
of
data
in
comparison
to
the
capacity
of
the
Falcon
network
network
is
should
hopefully
they
should
be
able
to
handle
that
so
far
they
have
and
so
yeah.
Essentially,
you
can
ask
an
indexer
node,
who
has
a
CID
and
it
will
tell
you,
provided
someone
else
has
already
told
it
who
has
it
before
before
you
ask
them,
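For instance, asking "who has this CID?" could look something like the sketch below, assuming an IPNI-style indexer that exposes a GET /cid/<cid> find endpoint (cid.contact is one public indexer that speaks this protocol); this is illustrative, not necessarily the exact API the team calls:

```ts
// Hypothetical sketch: query a network indexer for provider records for a CID.
const INDEXER_URL = "https://cid.contact"; // one public IPNI indexer

export async function whoHas(cid: string): Promise<unknown | null> {
  const res = await fetch(`${INDEXER_URL}/cid/${cid}`);
  if (res.status === 404) return null; // nobody has announced this CID yet
  if (!res.ok) throw new Error(`indexer query failed: ${res.status}`);
  // The JSON body lists provider records, i.e. peers that previously told
  // the indexer they can serve this content.
  return res.json();
}
```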
And so this graph, this is my second favorite graph, is the reason why we need indexer nodes. This is one of our nodes.
So we've got this tool called Checkup, which basically takes a CID sample, a CID that we know is stored on a particular node, and then asks the DHT who has it. And if it finds that that node is in a provider record in the DHT, then the chart is good, and if it doesn't, then the chart is bad, like this. The chart only goes up to 55%.
55% of the time is the maximum amount of time, and this is way below that for most of the time. So: bad, Bad News Bears.
So this is my favorite graph, and this is Elastic IPFS when the indexer nodes were turned on. You can see we had basically nothing, because Elastic IPFS doesn't do any providing to the DHT. This is annoying, and so any spikes here are literally just because someone else happens to have that content on the internet, I think... no, actually it's not.
Oh, I don't really know, anyway, it doesn't matter. So this is when it got turned on: we went from zero to 100% and it has basically stayed at 100% ever since, so really good.
And, comparatively, this is one node in a cluster of 50; this is one-fiftieth of our data, and this is all of our data. So super cool, that's Elastic IPFS. Oh yeah, and this is a new thing we released about a week or two ago, so I wanted to quickly talk about it if I've got any time left. I don't even know... 12 minutes? Okay, I've got some time, this isn't long. So, this is Freeway.
We can talk afterwards about the name, why we named it that, but it's a new thing that we have called Freeway, and it's an IPFS gateway that's backed by CAR files, which is why I put tons of car emojis on there, and it's why I'm concerned about the emoji rendering on my slides. So yeah, it's an IPFS gateway that's backed by CARs, and they're the same CAR files that our users upload to our service. That's right!
So let's recap on gateways, using the best graphic for gateways I know of, from the js-ipfs website. A gateway is an IPFS node, and it provides access to the wider IPFS network from a centralized point; it's essentially an HTTP interface to IPFS. So HTTP requests come in asking for the file data for a particular CID, and IPFS goes and finds the data and then exports a regular file from the UnixFS blocks that it finds in the IPFS network. And so, in the case of Elastic IPFS...
It kind of looks something like this. This is like the ipfs.io gateway: you ask it, the HTTP request comes in, "give me the data for this CID". It does some bitswapping with Elastic IPFS; meanwhile, Elastic IPFS is doing its read-the-blocks-from-the-bucket thing, and then it all comes back, all the way through. And so what does Freeway do?
A
Is
it
cuts
out
that
middleman
and
we're
so
we're
still
serving
ipfs
content
addressed
data,
but
we
don't
go
over
lib,
P2P
or
bit
swap
to
retrieve
it,
and
this
kind
of
thing
is
is
is
possible
because
we
actually
run
our
own
gateways.
We
run
nft,
storage.link
and
w3s
Dot
link
and
these
gateways
are
actually
special
gateways
and
they
they
race,
multiple
other
gateways
and
give
the
the
response
back
so
freeway
doesn't
need
the
discovery
element
that
ipfs
has.
A
If
there
are
other
gateways
in
the
race
that
do
have
that
Discovery
element,
and
so
freeway
can
literally
just
return
404
for
things
that
it
doesn't
have
yeah
and
but
but
the
the
fun
part
is
it's
quite
likely
to
have
the
things
that
people
are
requesting
through
those
gateways
because
they're
probably
uploaded
it
to
our
service,
so
they're
going
to
use
our
gateways
and
then
so
yeah
so
quite
often
serves
the
data.
A
That
is,
that
needs
to
be
served
and
it's
blazingly
vast
and
and
it's
great
and
yeah.
So
we
have
data
in
in
yeah,
so
there
we
go.
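A minimal sketch of that racing idea (the Freeway URL below is a placeholder; ipfs.io and dweb.link are just examples of public gateways that do discovery) might look like this:

```ts
// Hypothetical sketch: race several gateways for the same CID and take the
// first successful response. Freeway can safely 404 for content it doesn't
// hold, because another gateway in the race may still find it.
const GATEWAYS = [
  "https://freeway.example.com", // placeholder for a Freeway deployment
  "https://ipfs.io",
  "https://dweb.link",
];

export async function raceGateways(cid: string): Promise<Response> {
  const attempts = GATEWAYS.map(async (gw) => {
    const res = await fetch(`${gw}/ipfs/${cid}`);
    if (!res.ok) throw new Error(`${gw} returned ${res.status}`);
    return res;
  });
  // Promise.any resolves with the first fulfilled attempt and only rejects
  // if every gateway fails.
  return Promise.any(attempts);
}
```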
But how does it work? Okay, how does it work, real quick, real quick, I'm over time... am I? So we have a bucket full of CARs. How does Freeway know which CAR file to export from? Because there's blocks in the CARs, and we don't know which CAR the block is in.
A
How
do
we
know
Dude
Where's,
My
Car,
the
dudeware
is
a
link
index
and
it
links
root
data
cids
to
the
car
files
where
that
dag
can
be
found.
So
we
consult
the
the
we
consult,
dudeware
and
dudeware
says
it's
in
these
cars
and
then
we
also
store
a
index
in
the
bucket,
and
this
tells
us
the
byte
offsets
of
the
blocks
that
are
in
that
particular
car
and
we
store
a
car
V2
index.
If you know what a CARv2 index is: it's the multihash sorted index. I don't know why I put that in the slides; if you know, you know. Anyway, it's a CARv2 index, but it's not a CARv2: we don't actually change our CARv1s into CARv2s, we just store side indexes, so actually we could inflate them into a CARv2 if we wanted to.
A
If
we
want
to
just
read
the
first
one
read
the
second,
but
but
we
don't,
because
it's
actually
he's
better
to
believe
me,
it's
better
to
store
them
side
by
side
anyway,
so
we
have
indexes
next
to
the
class
in
the
bucket
and
yeah.
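To picture those two lookups, they could be shaped roughly like this (illustrative TypeScript types only; the real key layout and field names may differ):

```ts
// Hypothetical shapes for Freeway's two lookups.
// 1. DUDEWHERE: root (data) CID -> the CAR files that contain its DAG.
interface DudeWhereEntry {
  rootCid: string;   // the CID a user requests from the gateway
  carCids: string[]; // CAR files in the bucket that hold that DAG
}

// 2. A CARv2-style side index stored next to each CAR: block -> byte offset.
interface CarSideIndexEntry {
  multihash: string; // multihash of a block inside the CAR
  offset: number;    // byte offset of that block within the CAR file
}
```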
So, all together, the flow looks like this: a request comes in for a particular CID. We consult Dude Where's My Car, so we know the CARs that it's in.
Then we read the CARv2 index, so we know where all the blocks are, and at that point we have all the information we need, and we can literally just do a UnixFS export directly from the bucket, using byte range requests to extract the blocks that we need to serve, in the order that we need them. And we do some clever things like batching range requests: when the blocks are close together, we'll make fewer requests by reading multiple blocks with one request. There you go. Okay, rad, so that's very cool.
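As a sketch of that batching step (the gap threshold and record shape are assumptions, not the real implementation), coalescing nearby blocks into a single range read could look like this:

```ts
// Hypothetical sketch: group blocks whose bytes sit close together in the CAR
// so each group can be served with one HTTP Range request.
interface BlockRange { cid: string; offset: number; length: number }

const MAX_GAP = 1024 * 1024; // assumption: merge reads separated by < 1 MiB

export function batchRanges(blocks: BlockRange[]): BlockRange[][] {
  const sorted = [...blocks].sort((a, b) => a.offset - b.offset);
  const batches: BlockRange[][] = [];
  for (const b of sorted) {
    const current = batches[batches.length - 1];
    const last = current?.[current.length - 1];
    if (last && b.offset - (last.offset + last.length) <= MAX_GAP) {
      current.push(b); // close enough: read with the same Range request
    } else {
      batches.push([b]); // too far away: start a new ranged read
    }
  }
  return batches;
}
// Each batch becomes one ranged read of the CAR object, spanning from the
// first block's offset to the end of the last block; the individual blocks
// are then sliced out of that single response before the UnixFS export.
```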
So who are we? We are the House of DAGs, where we keep the DAGs; that's where they live, in the house. And our alter ego is DAG House, which is our German rave night on Saturdays, and this is our DAG house in the Merkle Forest. This is where we live. So thanks very much.