Description
Ipfs-Embed is a small embeddable version of a subset of IPFS, written in Rust. It is used in production in the German manufacturing industry.
This talk presents a novel API for pinning, caching and garbage collection, designed for the use case of highly interactive and dynamic applications with soft real-time requirements.
Afterwards there will be a discussion of the advantages and disadvantages of this approach, and how it compares to the Kubo API.
ipfs-embed is an implementation of IPFS in Rust which we have developed. It was started by David Craven in 2020, and it was built by David and me specifically for the needs of the company Actyx. It is open source, and it has some Kubo interop, but that was not a primary design goal.
So if you connect to Kubo, it might work, it might not. It has worked at some times for some cases, but interop wasn't really our goal, basically. It is intended to be used in small private swarms.
So it's not meant to be used on the global IPFS network, but in a small swarm of, let's say, up to 100 devices, and there it is used in production in factories in Germany.
Due to that, there's a lot of bulletproofing in terms of peer handling and all kinds of things that you can only figure out when you are in production and things start breaking at 2 am in the morning.
Okay, so Actyx is basically a private IPFS swarm in a factory, which connects humans with a tablet in hand, machines, and nodes in the data center. Each of them builds an event log, and you want to get the events from each device onto every other device.
Okay, so features: it has low and, most notably, bounded resource usage, in particular memory. This is running on small devices; the smallest we have is a Raspberry Pi with 500 MB of RAM. You cannot exceed the memory, or the thing stops and people will be very angry, so that's just not good. And the API is guided by the principles of local-first development.
So basically you work from local data when offline. You have to clearly know, when you do an operation, whether it will be local or whether it will connect to the network, and you never rely on the network being available. Obviously you have to rely on the network sometimes to get some data, but the application should always be in a usable state. You should at least be able to tell the user "sorry, your data is not there", but you should never get a spinner. And it's soft real time.
What that means is that it should not block for a long time, because there is stuff depending on it, but we are not using it to actually control machines which could kill somebody. So it's not used in a safety-critical way, but it is used in a way where you really, really don't want pauses of more than half a second or so. Therefore the API might not be appropriate for cloud usage.
It's just very specialized. Actyx is published for basically all platforms: Linux, macOS, Windows and Android, and Linux on basically every ARM architecture there is, because there's quite a bit of variety in these machines. Okay, the API.
So this talk is not about the internals; that would take too long. I just want to talk about the API, because we did a few things differently than Kubo.
It is based on rust-libp2p and SQLite, and other than that, let's just see what the API looks like.
So this is probably the most controversial thing; I'll start with that. We have local I/O: some operations which are always strictly local. You know, when you call these operations, that you will never touch the network, so they are strictly separate from the network operations, and they are blocking. Boo. People really look down on having blocking calls these days.
It's seen as 20th-century tech, but I find it very practical to have blocking calls occasionally. Async is not without performance cost and not without mental overhead, and being sync, with no async calls, simplifies writing complex logic on top of these things. I have another talk tomorrow about one of the things we built on top. And that means existing local data should be consistently fast.
If you want to get a block from local data, you will get the block or you will not get the block, but you'll have an answer in less than a millisecond, typically a microsecond. So there's just no point making this async; it's not like you're going to wait for a second or whatever. My opinion is that this is the only sensible thing to do in an embedded use case.
People might disagree, but anyway. It might be different if you have a cloud deployment, like we've seen with other use cases, where you are deployed in the cloud and your storage is not local to your peer-to-peer node; then this obviously doesn't apply. But this is for this use case.
So this is our local API: you can get, you can insert, you can check whether a block is there, and you can list all the blocks that you have. That's it.
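The shape of such a strictly local, blocking API can be sketched like this. This is a toy model, not the actual ipfs-embed signatures: strings stand in for CIDs, and the method names just mirror the four operations described above.

```rust
use std::collections::HashMap;

/// Toy sketch of a strictly local, synchronous block store.
/// Every call answers immediately from local state and never
/// touches the network.
struct LocalStore {
    blocks: HashMap<String, Vec<u8>>, // key stands in for a CID
}

impl LocalStore {
    fn new() -> Self {
        Self { blocks: HashMap::new() }
    }
    /// Insert a block into local storage.
    fn insert(&mut self, cid: &str, data: Vec<u8>) {
        self.blocks.insert(cid.to_string(), data);
    }
    /// Get a block if it is present locally; a miss answers instantly.
    fn get(&self, cid: &str) -> Option<&Vec<u8>> {
        self.blocks.get(cid)
    }
    /// Check whether a block is present.
    fn contains(&self, cid: &str) -> bool {
        self.blocks.contains_key(cid)
    }
    /// List all locally known CIDs.
    fn list(&self) -> Vec<String> {
        self.blocks.keys().cloned().collect()
    }
}

fn main() {
    let mut store = LocalStore::new();
    store.insert("bafy-a", b"hello".to_vec());
    assert!(store.contains("bafy-a"));
    assert!(store.get("bafy-b").is_none()); // miss: an answer, not a spinner
    println!("{} block(s) stored", store.list().len());
}
```

Because all four operations are in-memory or on-disk lookups, making them synchronous costs nothing and keeps the calling code simple.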
And this is the mental cost of async: a method in Rust with lots and lots of lifetime parameters. I mean, I've been doing Rust professionally since 2018, but even I have a hard time understanding things like this.
So it's not to be ignored. And the performance cost is that whenever you have async in Rust and you have abstraction, you often have to box, and that means you have an allocation. Even if you have a very, very cheap operation, you need to hit the allocator, which is not good. In Rust they have this catchphrase, abstraction without cost, and I say in async Rust, abstraction is no longer without cost, and that's bad. But surely not everything can be sync.
So there's an embedded database called sled, and I think they do it right. They have cheap local interactions which are synchronous, and then they have a call called flush, which basically writes the state of the database to disk, and that is async. ipfs-embed follows this: we have a bunch of operations which are sync, which you've just seen, and we have sync and fetch, which are async. So those are the things that take a long time and involve a lot of machinery, I think.
Okay. There are some very funny rants by spacejam, the developer of sled, about Rust async, and also by tomaka, the main developer of rust-libp2p, about Rust async. If you want to have some fun, you can read them; they're quite hilarious, and I agree with almost everything there. Okay, so that was the controversial part out of the way.
Now let's take a look at the rest of our API: pinning. Pinning is, in our case, completely independent of whether you have a block or whether you don't have it. Pinning just expresses something like: if I had this data, I would want to keep it. But it doesn't mean that you have to have the data before you can pin it. You can say "I want to pin this hash" even though you don't have the data for the hash.
I mean, why wouldn't you? You're just expressing: if I had Wikipedia, I would be cool with that. And that means, for example, you can pin Wikipedia, and then you look at some pages, and everything you ever looked at in Wikipedia will stay on your disk, because you expressed the desire to keep it by pinning the root of Wikipedia before you even got it. And all pins are recursive.
Now, we have two different kinds of pins to think about. If you build a DAG locally, you always build the DAG from the bottom up. That's the only way it works, because you cannot have the hash of the root before you have built the stuff below it. And if you sync something from somewhere else, you always build the DAG from the top down, because you sync the root first and then you pull the blocks all the way down. For these two use cases,
we have two different mechanisms. Temp pins are what you use while you yourself are building a DAG. They are incredibly cheap; you can consider them basically free. They use a pattern known from Rust and C++ called RAII, meaning that you create a temp pin, and as soon as it goes out of scope, the temp pin gets deleted and the CIDs are free to be collected again. The intention is to say: GC, please leave me alone while I build this thing.
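The RAII idea can be sketched with a toy guard type. This is not the real ipfs-embed `TempPin`; it just shows the mechanism: dropping the guard automatically releases the protected CIDs.

```rust
use std::cell::RefCell;
use std::collections::HashSet;
use std::rc::Rc;

/// The shared set of CIDs the GC must currently leave alone.
type Pinned = Rc<RefCell<HashSet<String>>>;

/// Toy RAII temp-pin guard: protects CIDs while alive,
/// releases them automatically on drop.
struct TempPin {
    pinned: Pinned,
    cids: Vec<String>,
}

impl TempPin {
    fn new(pinned: Pinned) -> Self {
        Self { pinned, cids: Vec::new() }
    }
    /// Adding a CID to the temp pin is cheap.
    fn pin(&mut self, cid: &str) {
        self.pinned.borrow_mut().insert(cid.to_string());
        self.cids.push(cid.to_string());
    }
}

impl Drop for TempPin {
    /// Going out of scope releases every CID this pin protected.
    fn drop(&mut self) {
        let mut set = self.pinned.borrow_mut();
        for cid in &self.cids {
            set.remove(cid);
        }
    }
}

fn main() {
    let live: Pinned = Rc::new(RefCell::new(HashSet::new()));
    {
        let mut tmp = TempPin::new(live.clone());
        tmp.pin("bafy-node");
        assert!(live.borrow().contains("bafy-node")); // GC leaves it alone
    } // tmp dropped here
    assert!(live.borrow().is_empty()); // free to be collected again
}
```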
So in our case the GC is constantly running in the background. So if you start building something and it takes, let's say, five seconds, there is a decent chance that while you build it, it would be collected. To prevent that, you have this temp pin mechanism. And they are ephemeral: when you restart your node, your temp pins are gone. But if you are interrupted in the middle of building a DAG, you probably want to start from scratch anyway.
Okay, and this is quite easy to implement in our case, because the IPFS node lives in the same process as the application, so if the application dies, IPFS also dies. If you have two different processes, you have this whole microservices dilemma and it's much more complex.
So with this API, you can create a temp pin, and then you can add a CID to the temp pin, and you can add as many CIDs to the temp pin as you want, and it is very cheap. And then the other thing we have is named pins; they're called aliases, and you usually have only a few. You might have just one named pin per application. And what's the name of a named pin, anyway? It is a blob; I'm not a believer in restricting things like that to UTF-8.
And this is what the API looks like: you say alias, and then you have a blob and an optional CID. If you want to pin something, you set it to Some of your CID, and if you want to clear it, you set it to None. And this is an example: you build something, you create a temp pin, then you build your stuff, then once you're done building, you set an alias, and then you delete the temp pin, which you don't even have to do; it happens automatically once it goes out of scope. So this is a clean way to build a big thing locally.
Now, until now we've only talked about local operations; everything I've talked about so far is just local. You know, local first, in the talk as well. Now we're talking about the network. So what do we have in terms of network operations? We have fetch, which is exactly like get in Kubo: it gets from local storage or from the network, whichever comes first. Basically, it races fetching from the store and getting from the network. And then we have something else, which is called sync.
So once you have finished your sync, you can then use only the local API. Also, we've got gossipsub, publish and subscribe, and we've got broadcast. Broadcast is something to just send a message to all your current peers, which in a local scenario is quite helpful, because those are the ones which you can easily reach without multiple hops.
Okay, this is the block API. You've got fetch, which is async, and sync, which returns an object which is both a future and a stream. So you can either just await it, which means you're waiting without progress, or you can treat it as a stream, and then you get some nice progress updates and you can show that it's syncing. And now, this is an example of how you would sync.
One important thing here: when you sync, you have to make sure that your data is safe from GC while you're syncing, because otherwise the GC might collect it while you sync it. So what you do is you create a temp pin, then you sync, and then once you're done with syncing, you switch it over to an alias and delete the temp pin. So even for data which is currently being synced, you have to protect it from GC.
So, okay. This is the messaging API. It's nothing special: you can publish on pubsub, you can listen on pubsub, and you can broadcast; broadcast just sends to all the neighbors, basically. And now we get to the store. The most interesting thing that the store does is garbage collection. It's based on SQLite. I was going to use sled, but the author of sled advised against it. He said: if reliability is your primary constraint, use SQLite. So, well.
I did that, and I'm currently working on the future store, which is a radix tree with a custom storage backend, where you might use a storage backend from spacejam, the author of sled, or a very simple storage backend which would work in wasm, using IndexedDB or whatever. But wasm is never going to be the primary target, because it's really hard to have something which works well both in wasm and natively. I tried hard, but I think this is a good compromise anyway.
So, you see, it uses a recursive SQLite query to find the live set, the set of all things which are currently pinned. It uses SQLite's advanced WITH RECURSIVE feature, and then it basically drops all the blocks that it has determined to be orphaned. And the notable thing is that it does this incrementally, so your GC pauses are not that long.
This is the SQL, by the way. It's quite complex, but the great thing about it is that, while it is not that fast, it is bulletproof, because it runs inside a single SQLite transaction, and it will just work. Don't worry about it; it's one less thing to worry about.
Okay, so you can set multiple limits on the store. You can set GC time limits: a target GC time, which is the maximum, or rather the target, time that the GC is allowed to run. So if you set that to one second, it means the GC will try to run for only one second at a time.
Of course, the GC might take longer than one second in total, but then it will have to split the work up into multiple runs, and then you can get a problem where you are not really making progress, and you need to make sure that that doesn't happen.
You can set a minimum number of blocks to be collected per run, and if you don't reach this minimum amount, the GC will exceed its time budget. That's to make sure that you are always collecting enough, basically. Okay, that's that.
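The interplay of the target GC time and the minimum blocks per run might look roughly like this. A toy model: the parameter names are made up, and actual deletion (which in the real store happens inside SQLite transactions) is reduced to popping from a list.

```rust
use std::time::{Duration, Instant};

/// One incremental GC slice: collect orphans until the time budget
/// is used up, but never stop before `min_blocks` have been
/// collected, so every run is guaranteed to make progress.
fn gc_run(orphans: &mut Vec<String>, target: Duration, min_blocks: usize) -> usize {
    let start = Instant::now();
    let mut collected = 0;
    while let Some(_cid) = orphans.pop() {
        collected += 1; // stand-in for actually deleting the block
        if start.elapsed() >= target && collected >= min_blocks {
            break; // budget spent and enough progress made
        }
    }
    collected
}

fn main() {
    let mut orphans: Vec<String> = (0..10).map(|i| format!("cid-{i}")).collect();
    // Even with a zero time budget, at least `min_blocks` get collected.
    let collected = gc_run(&mut orphans, Duration::ZERO, 3);
    assert_eq!(collected, 3);
    assert_eq!(orphans.len(), 7);
}
```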
And then you've got another thing, which is the limits on the store size. We've got limits both on the size and on the number of blocks.
The reason for that is: if you have lots of small blocks, it doesn't matter how small they are, they will cause trouble. If you have Kubo configured with 10 gigs and I just put in 10 gigs of small directories, I promise you it will fall over, because it is just too much stuff. So you have to have limits both on the total size and on the number of blocks. It's like in the Unix file system, where you have an inode limit or something like that.
These limits do not apply to pinned data; they're only about the stuff which is cached. And the config is here; you can see you set these two values. And now, caching. Caching concerns itself with the part that is not pinned. So if you have 10 gigs in total and you have one gig pinned, then caching is about how to allocate the remaining nine gigs. And there you can have a custom caching strategy. There's one which is built in, which is an in-memory LRU.
Basically, whatever is there that has been accessed last gets kept in case of a GC. So for example, you have a gateway which has a lot of files, and the files which have been accessed most recently are kept. Then the next one is a persistent LRU, and in that case the access information is in a separate database, because this database is not as important as the main database.
If it breaks, no big deal; you just lose the information about who has accessed your stuff, and who cares. And then you can even have custom strategies: if you have UnixFS, for example, you might preferably keep directories, because they are more important for finding the structure, and so on. Depending on your application, you could come up with a different strategy for what to keep in the cache.
Is there some more here? Yeah. So this is the caching trait, basically. It's just a bunch of methods which get called on every access, which allow you to keep track of what has happened, so you can figure out what to keep. There are two default implementations, and you can also roll your own.
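The idea of a pluggable cache tracker, methods called on every access plus a hook that picks eviction candidates, can be sketched as a trait. The trait and method names here are illustrative, not the real ipfs-embed trait.

```rust
use std::collections::HashMap;

/// Toy cache-tracker trait: gets told about every access and,
/// when the cache is over its limits, picks what to evict first.
trait CacheTracker {
    /// Called whenever a block is accessed.
    fn on_access(&mut self, cid: &str);
    /// Given the cached CIDs, return up to `n` eviction candidates.
    fn evict_candidates(&self, cached: &[String], n: usize) -> Vec<String>;
}

/// A simple LRU strategy: evict whatever was accessed least recently.
struct LruTracker {
    clock: u64,
    last_access: HashMap<String, u64>,
}

impl LruTracker {
    fn new() -> Self {
        Self { clock: 0, last_access: HashMap::new() }
    }
}

impl CacheTracker for LruTracker {
    fn on_access(&mut self, cid: &str) {
        self.clock += 1;
        self.last_access.insert(cid.to_string(), self.clock);
    }
    fn evict_candidates(&self, cached: &[String], n: usize) -> Vec<String> {
        let mut by_age: Vec<String> = cached.to_vec();
        // Least recently (or never) accessed first.
        by_age.sort_by_key(|cid| self.last_access.get(cid).copied().unwrap_or(0));
        by_age.truncate(n);
        by_age
    }
}

fn main() {
    let mut lru = LruTracker::new();
    let cached: Vec<String> = ["a", "b", "c"].iter().map(|s| s.to_string()).collect();
    lru.on_access("a");
    lru.on_access("c"); // "b" was never touched
    assert_eq!(lru.evict_candidates(&cached, 1), vec!["b".to_string()]);
}
```

A UnixFS-aware strategy would be another implementation of the same trait that, say, sorts directories after leaf blocks so the structure survives longest.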
So, peers: we do a bunch of things about peers. If you run on restricted hardware, you really cannot afford to have 600 peers; that's just a no-go.
So you have to be very careful about which peers you keep. If you exceed the limit of peers that you want to have, you need to figure out which peers to throw out and which peers to keep. So what you do is you have a kind of hierarchy of peer value. For example, your bootstrap peers you want to keep forever, in our case, because that is probably a box somewhere in a cabinet which you know is always going to be there.
If it's not there for an hour, it doesn't matter; once you get back onto the wireless LAN, it will be there. So you always keep the bootstrap peers. And then we've got stuff like manually connected peers: you want to keep them longer than others, because somebody made an intentional choice and said, I want to connect to this peer.
So please keep this peer, because I told you to, you know. And then the next thing is mDNS peers. Remember, local first: mDNS peers come from your network environment, and so they are probably more valuable to you than something far away, because it's more likely that you will encounter them again at some point. And then, I mean, the exact details don't matter. The main thing is that you have to have some kind of logic about which peers to keep. You cannot keep them all.
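The hierarchy of peer value described above might be sketched like this: rank peers by how they were obtained and evict the lowest-value ones when over the limit. The variant names are taken from the talk; everything else is a toy.

```rust
/// Peer value hierarchy from the talk: bootstrap peers are kept
/// forever, manually connected peers outlive mDNS-discovered ones,
/// which in turn outlive peers learned from far away.
#[derive(Clone, Copy, PartialEq, Eq, PartialOrd, Ord, Debug)]
enum PeerValue {
    Other = 0,     // e.g. learned from far away
    Mdns = 1,      // discovered in the local network environment
    Manual = 2,    // somebody intentionally connected to this peer
    Bootstrap = 3, // the box in the cabinet: always keep
}

/// Keep the most valuable peers, evicting the rest down to `max_peers`.
fn prune(mut peers: Vec<(String, PeerValue)>, max_peers: usize) -> Vec<(String, PeerValue)> {
    peers.sort_by(|a, b| b.1.cmp(&a.1)); // most valuable first
    peers.truncate(max_peers);
    peers
}

fn main() {
    let peers = vec![
        ("random-dht-peer".to_string(), PeerValue::Other),
        ("box-in-the-cabinet".to_string(), PeerValue::Bootstrap),
        ("neighbor-tablet".to_string(), PeerValue::Mdns),
    ];
    let kept = prune(peers, 2);
    assert_eq!(kept[0].1, PeerValue::Bootstrap);
    assert_eq!(kept[1].1, PeerValue::Mdns);
}
```

A real implementation would break ties within a tier, for example by most recent contact, but the tiered ordering is the essential part.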
So you have to do something. Okay, so we've got some open API questions, and a bunch of things internal to the Rust code. We have implemented this, and as soon as we had something that worked, we had to move on to the next thing, so we couldn't spend a lot of time designing the perfect API.
So we have a bunch of questions left, but I think we will be able to come up with something better at Number Zero, now that we actually have time to focus on this. And one of the biggest questions I have is: how do you do an incomplete sync of a graph? How do you say, I want to sync, but I don't want to sync the entire DAG, because it's giant; I want to sync only a certain subset?
And there's a bunch of heuristics you could use: limit the depth, or use GraphSync, or send a predicate over the wire, or anything, basically. But we don't know how to do that yet. Okay, and that's basically it, I think.