From YouTube: Web Scale IPFS - @mikeal - IPFS Implementations
Description
Web Scale IPFS - presented by @mikeal at IPFS Thing 2022 - IPFS Implementations - https://2022.ipfs-thing.io
Hey everybody, so I'm going to talk a little bit about the read and write pipelines in web3.storage.
web3.storage is the platform that nft.storage kind of rests on top of, and, you know, I think we built nft.storage in a couple of weeks, so it was not built on the most sustainable architecture. As we've continued to scale, we've found that we can't really buy the scale that we wanted, so we had to go and build it. And now web3.storage is really set up to be not just the provider for nft.storage, but for many nft.storage-sized customers, so I'm here to talk about what we've had to do at scale.
So, a little bit about DAG House: we're a nucleating entity inside of Protocol Labs. That means we're in Protocol Labs and we're kind of on the rails to become an independent entity. This is a little bit weird to talk about in an audience full of protocol experts, but in the broader ecosystem of service providers it's pretty rare to have this much protocol expertise inside of one team building things, and that's really shaped what we've been able to do as a team and, soon, as an independent company.
We also have a lot of real live users, a lot of whom just learned how to program in a boot camp, and they will tell you when things break and they will tell you when things are too slow, so that's kept us really honest and really focused on user needs as well. The way that we tend to look at things is: what is the user need that we can uniquely solve as a service provider with protocol expertise?
And man, it is awesome to build distributed cloud systems when your users hand you decentralized protocols. Nothing is in the way of you scaling that out and building it properly. We've had a lot of iterations now in cloud systems, and there's a lot of amazing cloud infrastructure that you can pull off the shelf, but you're often limited by what your users want to do requiring some level of centralization, right? Like, no number of features in Dynamo will help you if what your users want to do is a SQL query.
So let's talk first about the write pipeline. When we're looking at writes, we need to think about the writes being in three states. The first is at rest, on a user's device. The second is: we have the data, we've taken it into our system in some way and ingested it from the user. And the third is that the data is actually available in the IPFS network, and that can be a little tricky to talk about and guarantee, but I'll get into that.
So, data at rest: it's usually not already in a CAR file, but we would very much love it to be in a CAR file. If it is already in a CAR file and broken into a DAG, then we don't have to do that work in our back end, but even more importantly, the cryptographic guarantees that they want in order to do the next part of their workflow in the user's application are already available to them.
Before we've made the data available, what we're looking at when we take in IPFS data is that we're usually part of some other user's transaction, right? They're putting data into our system so they can get a CID, so they can put it into a blockchain transaction or some other system, and giving them that CID early means they can start to put that all together concurrent to us taking in the data, which really helps them out and provides a much better user story where they get immediate feedback.
But one of the disadvantages here is that there's just not a UnixFS encoder for every language and every system; if you're on Python right now, you kind of can't do this. So for this we've been building two-stage infrastructure, where we take data in various formats and then turn it into a CAR file.
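As a rough illustration of that second stage, here is a minimal sketch of turning a value into a CAR file with the public @ipld/car and multiformats libraries. The real pipeline builds a full UnixFS DAG out of the user's files first, so this only shows the shape of the step, not the production code.

```ts
// Minimal sketch: encode one dag-cbor block and wrap it in a CAR file.
// Illustrates the "turn data into a CAR" step, not the web3.storage client.
import { CarWriter } from '@ipld/car'
import * as dagCbor from '@ipld/dag-cbor'
import { CID } from 'multiformats/cid'
import { sha256 } from 'multiformats/hashes/sha2'

async function toCar (value: unknown): Promise<Uint8Array> {
  // Encode the value as a dag-cbor block and derive its CID.
  const bytes = dagCbor.encode(value)
  const hash = await sha256.digest(bytes)
  const root = CID.create(1, dagCbor.code, hash)

  // Stream the block into a CAR with that CID as the root.
  const { writer, out } = await CarWriter.create([root])
  const chunks: Uint8Array[] = []
  const collected = (async () => {
    for await (const chunk of out) chunks.push(chunk)
  })()
  await writer.put({ cid: root, bytes })
  await writer.close()
  await collected

  // Concatenate the streamed chunks into one CAR payload.
  const size = chunks.reduce((n, c) => n + c.length, 0)
  const car = new Uint8Array(size)
  let offset = 0
  for (const c of chunks) { car.set(c, offset); offset += c.length }
  return car
}
```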
The CAR file then becomes what we take in. Even our new pinning API that's getting built out is just taking pin requests, turning them into a CAR file, and writing them into the system. We'll take that for regular user data, and we'll even take, like, tarballs for, you know, directory structures and stuff like that.
We're currently testing this internally right now, so it isn't available to users yet, but this is a peek at our new data receive pipeline.
We really wanted to build a system in which users could delegate that to other users and to devices down a chain, and UCANs really gave us that. So the new system takes this UCAN request that says, hey, I want to upload this CAR CID. A CAR CID is a CID that's the hash of the entire CAR file that's coming in. It's not the root node in the CAR file; that's nothing, don't worry about that.
It's the hash of the entire CAR file, and we actually enforce SHA-256 for this, for a reason that I'll get into in a minute. Then what we return to the user is actually a signed URL to S3. This allows our customers' customers' devices to directly upload into S3 without any proxying layer in between, and then we receive that into our system and we can do stuff with it.
We key that, and this is brilliant, with the CID, as cid/cid.car. The way that S3 implementations work (every cloud provider has an S3 implementation now, because it's so popular) is that they have various load-balancing and scaling algorithms, but they tend to partition by prefix, where you need some data locality.
If you look at AWS's documentation, for instance, for what the read and write throughput limits are in S3, they won't tell you that there's any limit on a bucket; the limit is always per prefix. So if you prefix by something that's a hash, you've now distributed all of the names across the entire keyspace evenly, so all of their scaling algorithms are going to work kind of perfectly and you hit no limitation, right?
The other amazing thing is that in these signed S3 URLs you can tell it to validate the SHA, the checksum of the actual input data. So we only give you URLs that S3 will validate with that SHA-256 hash, so if a thousand users try to upload the same thing at the same time, we can give them all a signed URL into the same bucket and none of them can overwrite each other. We basically have a lock-free upload infrastructure into S3, into our distributed system.
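A minimal sketch of that write path using the AWS SDK v3 presigner: the bucket name and expiry are placeholders and this isn't the actual web3.storage API code, but it shows the two ideas from the talk, the CID-derived key and the SHA-256 checksum that S3 enforces on the upload. The uploader then has to send the matching x-amz-checksum-sha256 header with its PUT.

```ts
// Sketch: hand back a presigned S3 PUT URL keyed by the CAR CID, with the
// CAR's SHA-256 baked in so S3 rejects any upload whose bytes don't match.
// Bucket name, expiry, and key layout here are illustrative assumptions.
import { S3Client, PutObjectCommand } from '@aws-sdk/client-s3'
import { getSignedUrl } from '@aws-sdk/s3-request-presigner'

const s3 = new S3Client({ region: 'us-east-1' })

async function signCarUpload (carCid: string, carSha256Base64: string): Promise<string> {
  const command = new PutObjectCommand({
    Bucket: 'car-uploads',               // placeholder bucket name
    Key: `${carCid}/${carCid}.car`,      // hash-prefixed key: spreads load evenly across S3 partitions
    ChecksumSHA256: carSha256Base64      // S3 validates the uploaded bytes against this digest
  })
  // Anyone holding this URL can PUT exactly these bytes for the next hour;
  // identical uploads land on the same key, so concurrent writers can't clobber each other.
  return getSignedUrl(s3, command, { expiresIn: 3600 })
}
```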
So yeah, we're incredibly excited about this, and it really sets us up to not just support nft.storage, but to support thousands of customers the size of nft.storage. Once we have the data, we need to make it available, and this is why we built Elastic IPFS. The way that Elastic IPFS works is that you give it a URL to a CAR file as input; you don't sort of write data into the system.
Our original design actually was to write it into a bucket and get the bucket notification, and then we realized it's so much nicer if you just decouple those things, because if it's just a URL you can put data in different buckets for whatever reason you want, and you can take data from remote systems and other customers that already put it up at HTTP URLs; it doesn't really matter to us. This also allowed us to onboard a lot of data that we already had
before we even had the piece that I just showed you ready. So we actually have this running in production now; it's the main storage provider for nft.storage and web3.storage. We were able to onboard our entire backup bucket of CAR file data and then basically use the bucket that we were sending backups into as our main storage infrastructure, without swapping out any infrastructure, and that was all able to come up in parallel to the existing systems.
Then, once we have a URL to a CAR file, that CAR file gets indexed by a Lambda, and for every block in it we'll pull the multihash out. The multihash gets written into DynamoDB, and we've just sort of revved the Dynamo schema there.
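Roughly what that indexing step looks like, as a sketch rather than the production Lambda, using the public @ipld/car CarIndexer: the table name and attribute layout are assumptions, and it assumes Node 18+ for the built-in fetch.

```ts
// Sketch of the indexing step: fetch the CAR by URL, walk its blocks, and
// record each block's multihash -> (carUrl, offset, length) in DynamoDB.
// Table and attribute names are illustrative, not the real schema.
import { CarIndexer } from '@ipld/car'
import { DynamoDBClient, PutItemCommand } from '@aws-sdk/client-dynamodb'
import { base58btc } from 'multiformats/bases/base58'

const dynamo = new DynamoDBClient({})

async function indexCar (carUrl: string): Promise<void> {
  const res = await fetch(carUrl)
  if (!res.ok || !res.body) throw new Error(`failed to fetch ${carUrl}`)

  // CarIndexer yields { cid, blockOffset, blockLength } for every block
  // without buffering the whole CAR in memory.
  const indexer = await CarIndexer.fromIterable(res.body as any)
  for await (const block of indexer) {
    await dynamo.send(new PutItemCommand({
      TableName: 'blocks',   // placeholder table
      Item: {
        multihash: { S: base58btc.encode(block.cid.multihash.bytes) },
        carUrl: { S: carUrl },
        offset: { N: String(block.blockOffset) },
        length: { N: String(block.blockLength) }
      }
    }))
  }
}
```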
That works a lot faster now too. Then, once those records are in Dynamo, there's actually a pool of Node.js processes managed by Kubernetes, so depending on the amount of load in the system it'll spin up more or scale down, and those are handling the Bitswap requests that all come in over WebSockets. As far as how we distribute and manage the WebSocket connections, we're just leveraging the regular AWS WebSocket infrastructure there; they do all of that load balancing for us. In fact, we now get to operate as one peer ID for the entire system, and we can even run in multiple regions and have that also managed by AWS at the connection layer.
This is really nice, actually, because when you're a really large provider you want to get added to every gateway's peer list, so that you're just really fast without having to do DHT lookups, and if you're constantly adding new nodes into your cluster you have to constantly be messaging all of those providers like, hey, add these new peer IDs. So having one peer ID has been really phenomenal here.
All right, let's look at reads. Most reads come from HTTP gateways; that's just kind of the reality right now. So let's look at how we handle some of that. We're kind of in love with Cloudflare for our read architecture, and really for a lot of our HTTP architecture.
They have mostly free egress, which is really, really nice. Being a multi-tenant IPFS provider is a little bit difficult in that you don't know which customer to charge for read throughput, because multiple customers can upload the same data and you're just getting the content addresses, so it becomes very, very difficult to actually charge for read bandwidth. So it's nice to have some revenue alignment between what we're being charged for and what we can actually charge our users for. We built a gateway CDN.
I think this has been mentioned a few times, but it's not technically an IPFS gateway, even though it has the whole IPFS gateway API there. It's really a CDN in front of gateways. The way that operates is: there's the regular HTTP cache in Cloudflare, which is there for anything you do, including Workers, and about 40% of our requests just hit that regular HTTP cache and they're fine, and obviously IPFS data is immutable.
So all of those cache headers say, you know, never take this out of cache if you don't need to. Then there's a secondary cache that we have in Workers KV, and we also have a product called SuperHot, where you can take any gateway URL and we'll just cache the whole rendered state of that gateway URL in Cloudflare forever, so we can make that super fast as well. Those caches actually catch more than 60% of the 60% of requests that get past the first cache.
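Here's a rough sketch of how that layering could look in a Cloudflare Worker. It's an illustration under assumptions (the KV binding name, the TTL, and the fallback fetch are made up), not the actual gateway worker.

```ts
// Sketch of the cache layering in a Cloudflare Worker: edge HTTP cache first,
// then a KV namespace, then an origin fetch; responses are marked immutable
// because content-addressed data for a given CID can never change.
// The KV binding name and the fallback are illustrative assumptions.
export interface Env { GATEWAY_CACHE: KVNamespace }

export default {
  async fetch (request: Request, env: Env, ctx: ExecutionContext): Promise<Response> {
    // Layer 1: Cloudflare's regular HTTP cache.
    const cache = caches.default
    const cached = await cache.match(request)
    if (cached) return cached

    // Layer 2: secondary cache in Workers KV.
    const key = new URL(request.url).pathname
    const fromKv = await env.GATEWAY_CACHE.get(key, 'arrayBuffer')
    if (fromKv) {
      return new Response(fromKv, {
        headers: { 'cache-control': 'public, max-age=31536000, immutable' }
      })
    }

    // Layer 3: fall through to the origin (in production, the gateway race described below).
    const response = await fetch(request)
    const immutable = new Response(response.body, response)
    immutable.headers.set('cache-control', 'public, max-age=31536000, immutable')
    // Populate the edge cache in the background without blocking the response.
    ctx.waitUntil(cache.put(request, immutable.clone()))
    return immutable
  }
}
```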
So those hit rates are great, and if it's not in one of these caches, then we race a bunch of gateways to see who's the fastest. Cloudflare was kind enough to actually run a Cloudflare gateway in our zone, private just to us, for this infrastructure, so we hit that one, we hit ipfs.io, and we hit Pinata, all in parallel. So that's great; we're really happy with the performance of our gateway right now, and our customers are really happy.
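The race itself can be sketched with Promise.any plus an AbortController per gateway. The gateway list follows the talk, but the URLs and the helper are illustrative, not the production code.

```ts
// Sketch: race several gateways for the same IPFS path, take the first
// successful response, and cancel the losers. Error handling is simplified.
const GATEWAYS = [
  'https://cloudflare-ipfs.com',   // stand-in for the private Cloudflare gateway in our zone
  'https://ipfs.io',
  'https://gateway.pinata.cloud'
]

async function raceGateways (ipfsPath: string): Promise<Response> {
  const controllers = GATEWAYS.map(() => new AbortController())

  const attempts = GATEWAYS.map(async (base, i) => {
    const res = await fetch(`${base}${ipfsPath}`, { signal: controllers[i].signal })
    if (!res.ok) throw new Error(`${base} responded ${res.status}`)
    return { res, winner: i }
  })

  // Promise.any resolves with the first fulfilled attempt and only rejects
  // if every gateway fails.
  const { res, winner } = await Promise.any(attempts)

  // Abort the requests that lost the race so we stop pulling their bytes.
  controllers.forEach((c, i) => { if (i !== winner) c.abort() })
  return res
}

// Usage: const res = await raceGateways('/ipfs/<cid>/path/to/file')
```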
And having immutable data means you can return it with cache headers that say never let this fall out of cache, and you're very safe. Now we're starting to look at what a Bitswap CDN would look like, because we're very happy about all of this and we're really kind of unhappy with AWS's bandwidth charges, so we're looking at Cloudflare Workers. Alan and Vasco actually got Bitswap running in a Cloudflare Worker just between the last time I did this talk and now; they're amazing. We were running the math on this, and it's actually cheaper for us to copy the data out of S3 into R2 and then serve it once than it would have been to serve it twice from AWS.
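To make that cost comparison concrete, here is the back-of-the-envelope version under assumed list prices; the talk doesn't give exact figures, so roughly $0.09/GB for S3 internet egress and free R2 egress are the assumptions.

```ts
// Back-of-the-envelope sketch with assumed prices (not figures from the talk):
// S3 internet egress ~ $0.09/GB, R2 egress ~ $0/GB. Copying S3 -> R2 pays the
// S3 egress once; every read after that is free, while serving directly from
// S3 pays egress on every read.
const S3_EGRESS_PER_GB = 0.09   // assumed AWS list price
const R2_EGRESS_PER_GB = 0.00   // Cloudflare R2 does not charge egress

function costPerGB (reads: number, viaR2: boolean): number {
  return viaR2
    ? S3_EGRESS_PER_GB + reads * R2_EGRESS_PER_GB  // one copy out of S3, then free serving
    : reads * S3_EGRESS_PER_GB                     // every read egresses from S3
}

// costPerGB(2, true) === 0.09 < costPerGB(2, false) === 0.18
// i.e. the copy pays for itself by the second read, matching the claim above.
```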
So even having two copies of the data in two systems is not really a problem for us, and we have a lot of processing workloads that we actually have to run in AWS, because Cloudflare has like two percent of the features of AWS. So yeah, we're looking into this right now, and I think that we're probably going to end up going that way. And that's my talk, thank you.
And I should say: there are a bunch of follow-up talks about Elastic IPFS in a few different tracks, including the Connecting IPFS track that I'll be running tomorrow.