Description
In our next How We Built This, @brian-hoffman-ob1 will talk through how his team stored package manager registries on Filecoin, covering the nuances of storing versioned data (without automatic deduping) + small files on Filecoin and building a UI to explore this data meaningfully!
Keep up with events for the Filecoin community by heading over to the Filecoin project on GitHub:
https://github.com/filecoin-project
Check out the Filecoin community resources:
https://github.com/filecoin-project/community
And stay connected on Filecoin Slack:
https://app.slack.com/client/TEHTVS1L6
All right, hi everybody. Hopefully there are some people on the live stream, and there will be more people viewing this after I get it recorded. My name is Brian Hoffman, and I'm from OB1.
As you can see on this slide, today I just wanted to talk a little bit about a project we've been working on for the last few weeks called 5mb. I know this is the How We Built This series, so hopefully you'll get some of that and find it interesting to learn about our journey of using Filecoin and developing on top of it.
I don't expect this to be a super long talk, but hopefully it's still comprehensive and interesting. On the agenda today: we'll talk quickly about who I am and what OB1 is, so you have some context. Then I'd like to talk a little bit about the problem we were really trying to solve with 5mb and what it is, and then, most importantly, how we built it. And then, as with any new and growing project, you want to talk a little bit about the challenges that you either had to overcome or are still experiencing, so that we have an idea of where we want to go from here and how things can get even better; and then just briefly, where we want to go next, or where it could go next. Like I said, my name is Brian Hoffman. I'm currently the CEO of OB1, a venture-backed startup that's been around since 2015.
My background is technology, so I'm a tech CEO. Prior to this work, I spent about a decade working in government and corporate consulting on cybersecurity, which is what led to my interest in Bitcoin, decentralized networks, and crypto. So that's a little bit about me.
OB1, the company, sprang out of a project called OpenBazaar, which you can probably see above me if you can see my screen. It's an open-source, decentralized marketplace, and we built it on top of IPFS, so we're pretty intimately familiar with Protocol Labs and the technology that's been built there over the last few years.
One of the cool things about these two, and why it's worth mentioning here, is that we've integrated Filecoin into the app. It hasn't launched yet because we're going to launch it with mainnet, so we're looking forward to that coming soon. Then you'll be able to send and receive Filecoin and buy and sell things with it on our apps, which is really cool.
But what we're here to talk about today is a problem that came up when we were discussing how OB1 could help with some of the Filecoin work going on. The key problem is that there are a bunch of software package repositories out there. If you want to download a package for Arch Linux or CentOS or one of these operating systems, you usually go to mirrored FTP sites or websites to download those packages.
You pull the packages down from there, or you use something like npm to get software. One of the issues these repositories have is that they have to ask all these people to mirror the content. There's a lot of content they put out there so people can get it, but much of it isn't used very often.
It could be an obscure package that maybe only a few people need every once in a while, but that data has to stay online somewhere, and that presents a challenge. So we think a hybrid online data storage system, something we could use Filecoin for, would be needed to ensure that data is readily available when necessary, while the less-used data gets archived: it's there, but not necessarily always immediately accessible. It just has to be retrievable, and it has to be cost-efficient. That's the general problem we're looking at, and I know a lot of the projects in the Filecoin space are solving similar things and tackling them in different ways, but this is what we came up with. I also wanted to briefly mention why we even find Filecoin interesting.
Like I mentioned earlier, we've been building on top of IPFS for years now. OpenBazaar is built on top of it, and the entire network, the whole marketplace, is accessed by peer-to-peer communication using IPFS. All the data is stored by those nodes in kind of an altruistic way (not necessarily fully altruistic, but in a way where they all contribute to the network and nobody is paying other people to store the data).
So we feel like Filecoin could be a way to move towards a more sustainable data ecosystem where you're not just asking people to store data. Sometimes people can't store data: if they're on our Haven wallet, that's a mobile device, and they're typically not running the app all the time; on iOS it would get backgrounded and killed. So there still needs to be a more reliable and sustainable way to store that data, for other apps as well.
There still need to be more and better tools for managing and using the network, and for providing consumer-facing applications that let people get to the data they want, such as the source code repositories and the package repositories. And most importantly, in my opinion, the Protocol Labs team and the contributors involved in building things like IPFS and libp2p, and all the other underlying technologies that have gone into Filecoin, have really shown that this can work. It's strong and it's growing in the right direction compared to a lot of the competition out there, so we consider them a very trusted brand and group of developers.
So what is 5mb? You might not recognize what's in the photo: it's a five-megabyte hard drive from 1956.
It's one of the very first hard drives considered portable, or at least somewhat movable, so since we're talking about storage, we decided to call the project 5mb in honor of it; we really didn't have a much better reason than that. It's an experimental project that we've built to make software package repositories available via the web, and it uses IPFS and Filecoin.
These are the tech components that primarily go into building this. We have a Golang-based repository processor service that we've tentatively called Amazon, because it involves packages and delivering packages. We might get sued for that, so it'll probably have to change at some point, but that's the working title for the moment. We also have a Golang-based HTTP server that serves up the front end.
We use Powergate, in the middle of the diagram, which helps us facilitate the communication between IPFS and Filecoin. Behind that we have Lotus, and we use MongoDB as the back-end data store for our application.
We use IPFS, obviously, and we also have some rsync scripts. If you're not familiar with rsync, it allows you to synchronize data sets across servers. In this case, as the icons show, you have CentOS, Arch Linux, Debian, and things like that: we're pulling terabytes of data down from those repositories and putting it onto our server so we can move it where appropriate within the IPFS and Filecoin ecosystem.
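For context, a minimal sketch of what one of those mirroring steps might look like, driven from Go; the mirror URL and local path are hypothetical examples, not the actual endpoints 5mb uses:

```go
package main

import (
	"fmt"
	"os/exec"
)

// mirrorArgs builds the rsync arguments for mirroring a package
// repository into a local directory. --archive preserves file
// metadata, --delete drops files removed upstream, and --partial
// lets interrupted transfers of large packages resume.
func mirrorArgs(remote, local string) []string {
	return []string{"--archive", "--delete", "--partial", remote, local}
}

func main() {
	// Hypothetical mirror endpoint; a real deployment would pick an
	// official rsync mirror for each distribution.
	args := mirrorArgs("rsync://mirror.example.org/centos/", "/data/repos/centos/")
	cmd := exec.Command("rsync", args...)
	fmt.Println(cmd.String()) // show the command without running it
}
```

Wrapping the invocation this way makes it easy for a scheduler to re-run the same mirror step per repository.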
Stage one we call ingestion, and it's pretty straightforward. Essentially, OB1 uses rsync and scripting to ingest the package repositories into an Amazon EBS volume; we're running it all with containers and data storage on Amazon at the moment. That data is then partitioned into different logical data sets so it can be processed further on, because we have a bunch of data sets of different sizes with different types of data.
That's part of the first step. Going forward, you have to do periodic synchronization of these data sets, because each of these repositories updates its content on a different schedule. Some of them update every hour, some every 24 hours, some less often than that, and so there are different rule sets for the different package repositories.
At this point we've got the Golang processor, the Amazon service I told you about, which works to stage the data into IPFS; that's the first step. We push all that data into IPFS, and then we're able to look at the data structure, and we have to do a couple of things here, because the data sets are different and the way we handle their push into Filecoin depends on the structure of that data.
If we're pushing, say, 200 terabytes, we treat that differently than if we're pushing one or two megabytes of data; it's a completely different kind of beast, so we have some logic there to handle that. Then we basically break these objects up into what we call buckets, and I know that's kind of an overloaded term; I'm sure the Textile guys are probably saying "we have buckets," and essentially these are like those. We take the IPFS objects, rearrange them into buckets, and keep track of them in our data store, MongoDB, so we know where each piece is. The reason we do that is so we can sort them into comparably sized buckets when we push them into Filecoin, because we can't push each file separately.
Take one repository, for instance: I think CentOS is something like 600,000 individual files of varying sizes. You can't really push them all in separately, and it doesn't make sense to push them all as one big file either, because if you want to retrieve one specific small file, you don't want to pull down, say, two terabytes from Filecoin when you only need a small set. So there's an algorithm there.
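To make the bucketing idea concrete, here's a minimal sketch of one way to pack files into comparably sized buckets with a single greedy pass; the size cap, file sizes, and CID labels are illustrative assumptions, and the real service also records the resulting layout in MongoDB:

```go
package main

import "fmt"

// File is one object staged in IPFS, identified by its CID.
type File struct {
	CID  string
	Size int64 // bytes
}

// Bucket groups files so each Filecoin deal covers a comparably
// sized chunk of a repository, rather than one tiny file or one
// multi-terabyte blob.
type Bucket struct {
	Files []File
	Size  int64
}

// packBuckets greedily fills buckets up to maxBucket bytes; a file
// larger than the cap ends up in a bucket of its own.
func packBuckets(files []File, maxBucket int64) []Bucket {
	var buckets []Bucket
	cur := Bucket{}
	for _, f := range files {
		if cur.Size > 0 && cur.Size+f.Size > maxBucket {
			buckets = append(buckets, cur)
			cur = Bucket{}
		}
		cur.Files = append(cur.Files, f)
		cur.Size += f.Size
	}
	if cur.Size > 0 {
		buckets = append(buckets, cur)
	}
	return buckets
}

func main() {
	files := []File{
		{"cid-a", 400}, {"cid-b", 700}, {"cid-c", 300}, {"cid-d", 900},
	}
	for i, b := range packBuckets(files, 1000) {
		fmt.Printf("bucket %d: %d files, %d bytes\n", i, len(b.Files), b.Size)
	}
}
```

Sorting files by size before packing would tighten the fit further; the trade-off the talk describes is choosing maxBucket so deals are big enough to be economical but small enough that retrieving one file stays cheap.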
What's the best trade-off you can make so it's not ridiculous but still feasible? That's the processing phase, and the bulk of the time we've spent building this has really been in this stage, because you want to do it properly: it has to be functional and it has to be usable.
Then we move on to stage three, which is called archiving. At this point the data buckets are getting pushed into Filecoin, and we use Powergate to do that. This is another very challenging area.
I think a lot of people probably have these challenges: so much can go right or wrong when you're doing these storage deals, and now we're talking about hundreds and hundreds of deals of different sizes with different miners. You'll definitely come across a lot of interesting problems there to work through. It doesn't always work as planned, but this is the part where we handle those kinds of things.
So we have code that monitors and handles the deal errors, and we track that; that information is also part of the MongoDB database we've built. One to-do we'd like to get to is exposing that information through the web. I know other people work on similar tools, and we didn't go down that road too far, because we know that stuff is being worked on and we want to make sure we're not duplicating efforts here.
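A minimal sketch of the kind of deal bookkeeping this implies; the status values, miner IDs, and retry policy here are assumptions for illustration, not Powergate's actual API:

```go
package main

import "fmt"

// DealStatus is the lifecycle state tracked per storage deal.
type DealStatus int

const (
	DealPending DealStatus = iota
	DealActive
	DealFailed
)

// DealRecord is the per-bucket, per-miner record persisted in the
// data store (MongoDB in 5mb's case).
type DealRecord struct {
	BucketID string
	Miner    string
	Status   DealStatus
	Retries  int
}

// retryable collects failed deals still under the retry budget, so
// a background worker can re-propose them, possibly to other miners.
func retryable(deals []DealRecord, maxRetries int) []DealRecord {
	var out []DealRecord
	for _, d := range deals {
		if d.Status == DealFailed && d.Retries < maxRetries {
			out = append(out, d)
		}
	}
	return out
}

func main() {
	deals := []DealRecord{
		{"bucket-1", "f01234", DealActive, 0},
		{"bucket-2", "f05678", DealFailed, 1},
		{"bucket-3", "f05678", DealFailed, 5}, // over budget: needs a human
	}
	for _, d := range retryable(deals, 3) {
		fmt.Println("retrying", d.BucketID, "with", d.Miner)
	}
}
```

Deals that exhaust the retry budget are exactly the ones worth surfacing in the web view the talk mentions as a to-do.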
For the front end we went with the existing IPFS UI, which we thought would be the most expeditious way of getting this out the door so people could use it. What you can see here in the slide is a screenshot of the CentOS repository that we've pushed into IPFS and Filecoin. Essentially, the way it works is this: if the data is hot and available via IPFS, which in many cases it is, you can retrieve it very quickly through the web. You can traverse all the files and folders, find the file you want, and retrieve it immediately. If it's not, we've added logic to fall back to Filecoin and have Powergate retrieve that content from Filecoin if necessary. What happens then is that the UI serves up a notice that says, hey, this content isn't immediately available.
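That hot/cold fallback can be sketched as a two-tier lookup; the interfaces below are an abstraction for illustration, standing in for the IPFS node and the Powergate client rather than their real APIs:

```go
package main

import (
	"errors"
	"fmt"
)

// Store is satisfied by both the hot tier (IPFS) and the cold tier
// (Filecoin via Powergate) in this sketch.
type Store interface {
	Get(cid string) ([]byte, error)
}

var ErrNotHot = errors.New("content not available in hot storage")

// fetch tries the hot tier first and falls back to the cold tier.
// delayed tells the UI whether to show the "not immediately
// available" notice while the slow retrieval runs.
func fetch(hot, cold Store, cid string) (data []byte, delayed bool, err error) {
	if data, err = hot.Get(cid); err == nil {
		return data, false, nil
	}
	data, err = cold.Get(cid) // slow path: retrieve from a Filecoin miner
	return data, true, err
}

// mapStore is a stub tier used only for demonstration.
type mapStore map[string][]byte

func (m mapStore) Get(cid string) ([]byte, error) {
	if b, ok := m[cid]; ok {
		return b, nil
	}
	return nil, ErrNotHot
}

func main() {
	hot := mapStore{"cid-hot": []byte("pkg")}
	cold := mapStore{"cid-hot": []byte("pkg"), "cid-cold": []byte("old-pkg")}
	_, delayed, _ := fetch(hot, cold, "cid-cold")
	fmt.Println("delayed retrieval:", delayed)
}
```

In the real pipeline the slow path is asynchronous: the cold retrieval is kicked off, the notice is shown, and the content is hot on the next visit.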
It will let you know when the content comes back, and you can revisit that location and hopefully it comes through. That challenge in itself is potentially quite a user experience problem, if you can't get access to the data. But the idea behind this whole thing is that the content that really needs to be accessible all the time will be there and be quickly retrieved, while in the case that some random or more obscure file is being requested, that's when people would have to wait. In this case we've also abstracted away the payment piece: we don't require people to pay the Filecoin to go get the data, we handle that behind the scenes for now, so really it's just a time delay on their side.
As for challenges: things are breaking and getting fixed, breaking and getting fixed, and people are adding things and changing things, and that goes across the whole stack, because we're working not only with Lotus but also with Powergate. I know the Textile team is working really hard to keep up with that and improve it, and they've been really great to work with; everybody has been really great, which is awesome and very helpful.
Secondly, as I've mentioned several times in this discussion, user experience is super important. As a company we've spent the last five years building software focused on the user experience, and in crypto and decentralized applications, user experience is sometimes a second priority to just making it work. We really wanted to find a way to make this usable and practical, not just an exercise so we could say we did it. And finally, there's such a steep learning curve for these technologies. It's just an immense amount of code and logic, and the verbiage is like a completely different vernacular from a lot of what people work on, which you have to get a hold of in order to really understand how to handle things when they come up; we're still grappling with that. But that's one of the exciting things about it: obviously you're learning, and you're doing something that no one else really is doing, and that always comes with a steeper learning curve than normal. So, where to next?
These are just a few of the ideas I have, at least for this project, so we're focusing on those. I think there's a lot of room for growth on the UI side, to really manage the entire pipeline all the way from ingestion to presentation to the user, and a lot on the management side of that.
One of the biggest things about dealing with Filecoin is that there's just an immense number of different actors and processes going on at the same time, and there's also the extra challenge of time: when you're building online applications, not many things present such a challenging time constraint on top of everything else, so that's pretty unique.
We're also already in the process of expanding the data sets that are available online. At the moment I think we're targeting about 11 terabytes of data, and we're going to try to get that all up and running.
I think that as we expand we're going to start running into some scalability difficulties, which will be interesting, and we're going to figure out how to handle that. And finally, there's the process of pushing that much content into Filecoin for repositories that change pretty quickly.
If a repository is changing hourly and it's two terabytes, what's the thinking around whether or not it even makes sense to try to push that kind of data into Filecoin? Is that really what we're trying to solve, or how do we handle it?
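As a rough back-of-the-envelope on why this question matters (the churn rate below is a made-up assumption purely to show the scale, not a measured number):

```go
package main

import "fmt"

// dailyChurnGB estimates how much data would need re-archiving per
// day for a repository of repoGB gigabytes where changedFrac of the
// content changes every syncEveryHours hours.
func dailyChurnGB(repoGB, changedFrac, syncEveryHours float64) float64 {
	syncsPerDay := 24.0 / syncEveryHours
	return repoGB * changedFrac * syncsPerDay
}

func main() {
	// Assume (illustratively) 0.1% of a 2 TB repository changes
	// each hour: that is already ~48 GB of fresh deal data per day.
	fmt.Printf("%.0f GB/day\n", dailyChurnGB(2000, 0.001, 1))
}
```

Even a small hourly change fraction compounds into a steady stream of new deals, which is why batching updates or archiving only stable snapshots may make more sense than mirroring every change.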
So that's another kind of mental challenge we have to get over as we go forward, and that's what we're thinking through. That's basically my last slide, but I did want to briefly mention that I think the repositories for this code are still private right now; we're going to be opening them up soon and putting them out there.
That way people can take a look at it, help out, expand it if they want, or use it for whatever they want on their side, which should be useful. And if you're trying to get in touch with us or want to find out more about what we're working on, you can follow us on Twitter at @ob1company, and we also have our OpenBazaar handle.
Okay, great. It sounds like we're not going to do a Q&A for this, but thanks for tuning in anyway. Thank you.