Description
How to be an index provider - presented by @masih at IPFS Thing 2022 - Content Routing 1: Performance - https://2022.ipfs-thing.io
A: Right, hello everyone. I'm Masih, from the Bedrock team, the team responsible for building and expanding the network indexer. I'm here to talk to you about how to become an index provider. Sorry about the typo on this slide.
What is an index provider? This concept has been mentioned a few times, so what is it really? An index provider is nothing but a content provider that does two things. One, it keeps an index of all the multihashes it contains and shares that list of multihashes with the network, so it tells everybody what multihashes it has. Two, it tells everyone how the content can be retrieved. That is the key difference between a content provider and an index provider: an index provider is a content provider that tells everybody about its multihashes and teaches them how to retrieve them.
So what is the general overall process of providing indices? First, we have some content. That content goes into a process of generating advertisements, which I'll get to in a minute. The advertisements that are generated are stored on a local file system. Advertisements themselves are nothing but content: they're all addressable. Then the content provider makes an announcement to the network saying, hey, I've got this stuff. From now on I'm going to dive deeper into each of these components and talk about how they work.
What are they made of? The first thing I'm going to talk about is what an advertisement is. An advertisement is basically a piece of information that contains: a link to the previous advertisement (I'll come back to that in a minute); the identity of the provider, in terms of peer ID; and addresses, i.e. where you can contact this provider, so you can see how you could construct something like an AddrInfo from the provider addresses. It also has a signature, which verifies that it's actually the provider that produced the record.
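The fields just described can be pictured as a small struct. This is a sketch, not the wire format: the field names are illustrative, and the authoritative shape is the IPLD schema linked on the slide.

```go
package main

import "fmt"

// Illustrative sketch of the advertisement fields described above.
// Names are ours; the IPLD schema is the source of truth.
type Advertisement struct {
	PreviousID *string  // link (CID) to the previous advertisement; nil at the chain's start
	Provider   string   // peer ID of the provider
	Addresses  []string // multiaddrs where the provider can be contacted
	Signature  []byte   // signs the record so consumers can verify its origin
}

func main() {
	ad := Advertisement{
		Provider:  "12D3KooWexample", // hypothetical peer ID
		Addresses: []string{"/ip4/203.0.113.7/tcp/24001"},
	}
	fmt.Println(ad.PreviousID == nil) // no previous link yet: prints true
}
```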
We try to use IPLD wherever we can, and everything we talk about has IPLD schemas. You'll find links at the bottom of the screen that point you to the IPLD schemas, so you can have a look at those and understand how they work.
So how do these advertisements connect to each other? Here we have a picture of an advertisement chain, which I think Andrew had earlier in his slides. As you can see, you have an advertisement with the list of fields that I mentioned earlier. It has a link to the previous advertisement; that link may or may not be present, and its absence means we have reached the end of the chain of advertisements. It also has a link to entries. Entries are the actual thing that contains the list of multihashes we're trying to advertise as a content provider, and entries themselves can form a chain. I'll go deeper into entries in a minute and explain the different types of entries.
When it comes to entries, we have two kinds. One is what we call the entry chunk, which is basically an array of multihash bytes; remember, multihashes are nothing but bytes, just a multicodec code and a digest. And it has a link to the next chunk.
In the entry chunk type you can see how multihashes are contained in a single message. The point of having the next link is that at some point you will hit the limits of the message size, for example. So providing a next link is a way for us to chunk these: a way to basically impose a pagination mechanism on top of IPLD data.
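The chunk-plus-next-link pagination described above can be sketched in a few lines. This is an in-memory illustration only; in the real structure the next link is a CID, not a pointer, and the names are ours.

```go
package main

import "fmt"

// Illustrative sketch of the EntryChunk idea: a chunk holds raw
// multihash bytes plus an optional link to the next chunk.
type EntryChunk struct {
	Entries [][]byte    // each entry is a multihash: multicodec code + digest bytes
	Next    *EntryChunk // nil when this is the last chunk
}

// chunkEntries paginates a flat multihash list into linked chunks of at
// most size entries, mimicking how large lists stay under message limits.
func chunkEntries(mhs [][]byte, size int) *EntryChunk {
	if len(mhs) == 0 {
		return nil
	}
	end := size
	if end > len(mhs) {
		end = len(mhs)
	}
	return &EntryChunk{Entries: mhs[:end], Next: chunkEntries(mhs[end:], size)}
}

func main() {
	mhs := [][]byte{{0x12, 0x01}, {0x12, 0x02}, {0x12, 0x03}}
	head := chunkEntries(mhs, 2)
	fmt.Println(len(head.Entries), head.Next != nil, head.Next.Next == nil) // prints: 2 true true
}
```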
The other type of entries we could have is HAMT, and HAMT is an advanced data layout (ADL) in IPLD. It's effectively a prefix trie: a way for you to define a map. This is a very recent addition; you can find the specification of HAMT at the link below. So now I'm going to talk about the difference: why would you choose entry chunk versus HAMT?
When we started building the index provider, we started with entry chunks. It's nice and simple: you have just a list of multihashes, all chained together. Great, right? It is really easy to understand, but it comes with a problem: for example, when you want to divide up multihashes across multiple shards or multiple nodes. This connects to what Will mentioned earlier in terms of next steps, decentralizing the network indexer; for a distributed network indexer, you really can't use entry chunks.
So we wanted a simpler way to slice and dice multihashes, and a way to quickly find out which multihashes are under a specific link that I, as an indexer, am responsible for. That's where HAMT comes in. The way HAMT is used in the index provider is actually as a set. HAMT itself is a map, so we use it as a set where the key is the multihash and the value is always set to something simple like true, and the entire set of multihashes is basically sorted by prefix.
A
You
have
a
prefix
tree.
You
have
a
really
nice
and
efficient
way
of
finding
out
which
multi-ashes
exist
in
a
in
a
link,
but
it
does
have
these
disadvantages.
It's
a
bit
more
complicated
to
work
with
it's
all
built
on
very,
very
new
and
fresh
implementations
of
ipld
hands,
which
are
still
under
development,
but
it
has
a
huge
potential
and
opens
up
a
huge
array
of
design
decisions
for
us
to
make
in
the
future.
If
we
have
a
humped
based
entry
structure,.
I touched on metadata, so what is metadata in an advertisement? If you remember, I started by saying an index provider is a content provider that tells everybody about the multihashes it stores and also tells them how the content can be retrieved. Metadata is the thing that captures how the content can be retrieved. The metadata itself is again designed to be extremely extensible.
The only structure it has is that it starts with a protocol ID, followed by optional bytes after the protocol ID, which can define whatever protocol you would like. So today you could invent your own special way of fetching information and have your own metadata; there's nothing stopping you from that. You can define it. There are two specific metadata types that are defined today.
One is transport over Bitswap; you can see links to the multicodec CSV table there, and you can find these codes inside the CSV table. The second one is graphsync for Filecoin v1 data. The first protocol ID, Bitswap, really doesn't have any bytes after it, because as soon as you know an endpoint supports Bitswap, the rest is simple: you just ask for CIDs and you get blocks back. It's extremely simple.
Here I'm showing you the IPLD schema for the structure of the bytes inside the metadata for advertisements that support graphsync Filecoin v1 retrieval. As you can see, it has things like the piece CID, whether it's a verified deal or not, and whether there is fast retrieval or not. This metadata then makes sense to something like Lotus and Boost, and that would enable you to retrieve the content.
There are links at the bottom that point to the IPLD schema and the Go package documentation; you can have a look at those. So we've talked about advertising content. How do we tell people, hey, I no longer have this? Or what happens if my address changes? Because I've told people, hey, I have this multihash and you can get it from this address.
What should I do if my address changes? The structure of the advertisement supports two specific fields that allow you to modify advertised content. One is the context ID; the other is a Boolean field called IsRm ("is removed"). The context ID is basically a unique identifier that, together with the provider identity, identifies the metadata; you can think of it as a way of grouping multihashes together. I think Andrew touched on this earlier.
Imagine you want to remove a whole bunch of multihashes that you've advertised. You really don't want to advertise those multihashes again just to say, remove these multihashes. Instead, you can tag them, if you like, with a context ID, and then you just say, hey, remove context ID X, and all the multihashes associated with that context ID are going to be removed. IsRm, like I mentioned, is a field inside the advertisement, and whether it's set to true defines whether the content is being added or removed.
In a case where you would like to change the address and metadata associated with content without removing it: again, we don't want to advertise all the multihashes again just to do that. So there is a specific no-content CID that you can use as the link to entries, and then you simply include the context ID with the new address and new metadata. As soon as the advertisement is published, the indexer node will change the information associated with that context ID.
So far we've talked about generating advertisements: the actual fundamental data structure we need to produce as an index provider. Now we're going to talk about how you tell the network about it, so that an indexer node can then come and ingest the information, make it available to the world, and make it actually findable. The point where we tell the network is called an announcement. An announcement basically includes two things. One is the head of the advertisement chain.
If you remember, I showed you the advertisement chain, which is interlinked; the head of the chain means the latest advertisement that exists in the chain. The second thing is the publisher address, as in: where can I retrieve these advertisements from? The announcements themselves can be made over two different channels. You can either use gossipsub to publish this information, or you can use explicit HTTP PUT requests to the indexer via its announce URL.
Examples of providers that use the different avenues here: Lotus and Boost are using the gossipsub network to disseminate information about these announcements, whereas something like NFT.Storage makes explicit PUTs to the indexer to notify it: hey, I've got a new advertisement, come get it.
There's one subtle point here I wanted to call out. There's this concept of a publisher, and the publisher can be different from the content provider, or the index provider if you like. That basically means you can delegate publication of advertisements to a separate process, which can be a powerful, yet potentially complicated, concept that allows you to scale your system.
If the thing that is providing content doesn't have the capacity, or wants to isolate the task of publishing advertisements into a separate process, you can do that. But it comes with a specific clause that hopefully we will make less important later on: for now, the IDs need to be explicitly whitelisted.
If the publisher and the index provider use different identities, then because advertisements carry a signature, it means a totally new chain: you're creating a different chain of advertisements, which is something to consider. In a typical case you can imagine identities being shared, and that way the chain of advertisements produced by the publisher and by the actual provider would be identical.
So what are the interfaces for an index provider, i.e. how does a network indexer connect to the index provider to ingest information? There are actually two different interfaces. One is HTTP, which is extremely simple; the other is graphsync. On the HTTP side, you simply have an endpoint that exposes the head advertisement, and an endpoint that, given a CID, returns the block associated with that CID in the form of JSON.
On the graphsync side, you can have a combination of gossipsub or HTTP for publication of the announcements that I talked about, and a good old-fashioned graphsync server that allows you to simply fetch the blocks associated with CIDs. The links you see on the right-hand side are the libraries that implement the graphsync interface that you can use; again, Andrew kindly pointed at these in his presentation earlier.
On to implementations. There is currently one implementation of index provider; it's written in Go. It is written to be two things at once, and there are historical reasons for that. One is a standalone index provider: you can use it to expose an endpoint and give it CAR files, and it will then publish to the network and say, hey, I have the content in this CAR file. We initially built that to test the network indexer, but now it's available for anybody who would like to use it.
The other side is an SDK that allows you to embed an index provider inside your Go application and basically build your own thing. This library is used by Filecoin in Boost and Lotus, and there's a whole bunch of other clients using it. The URL at the bottom is where you can find more information. I also wanted to quickly mention a Rust implementation of the HTTP interface, written by Marco sitting over there, which is excellent.
Anything works as long as it follows the network indexer protocol. The things you can do with the provider CLI: you can list advertisements from a provider given its multiaddr, you can verify ingestion, and you can use it to run an index provider in a standalone mode, called daemon, which takes CAR files and publishes them to the network.
I wanted to dive a little bit deeper into this tool. Having written an index provider, no matter in what language, whether with the index-provider library or not, how would you verify that it actually worked? You can use the provider CLI to verify that it is working: you can list the advertisements that exist on a provider.
Here you see an example of the command: ls ad lists advertisements from a provider multiaddr, and the output shows the CID of the advertisement, the CID of the previous advertisement, the ID of the provider, the addresses that are included in the advertisement, and whether it is removed or not. In the background it runs a process that goes and actually fetches all the entries in the advertisement.
How would you verify that a published advertisement is actually ingested by an indexer? Again, the provider CLI gives you a tool that allows you to verify ingestion. The command you see here is verifying the advertised multihashes from a provider against cid.contact, which so happens to be the endpoint of a network indexer. Today it only recurses once.
You can recursively walk the chain of advertisements to find out which multihashes exist. The sampling flag that you see there randomly samples 10 percent of the multihashes: you might not want to verify the ingestion of every single multihash, so it supports random sampling. The peer ID at the end tells the CLI what the expected client peer ID is for any index records it finds in the indexer. And then, as you can see, the output shows you how many multihashes were verified.
It also shows you how many were not indexed. The output is actually quite long: it gives you things like a numerical distribution over the number of multihashes and things like that if you run it recursively, but I haven't included it here; please go and have a play with it. As for sources, you can actually get multihashes from different places. If you think about it, the advertisement chain is just one source of a list of multihashes; a list of multihashes could also come from a CAR file, or from just a detached CAR index. All three of those are supported by the provider CLI: you can just point it at a source of multihashes and it goes, extracts the multihashes, and verifies ingestion. Now, a little bit of stats on cid.contact, because this is a question that came up earlier when Andrew was giving his talk.
Right now we have 172 providers on cid.contact, about 26 of them are Filecoin providers, and in the last seven days we have ingested about 5 billion multihashes.
You can see the list of all the providers that exist on cid.contact using that URL, and what you'll see is things like the CID of the latest advertisement that was processed, when it was processed, what the peer ID is, and so on. It shows you the entire list of all the providers that exist today.
So what are the next steps for index providers? Like I mentioned, the HAMT work is just at its infancy. It is implemented on the storetheindex side, so the indexer understands entries links that point to a HAMT, and it is also implemented on the index provider side, in that you can produce advertisements with a HAMT as the entries.
This also connects to the workstream around decentralizing the network indexer. As part of verifying the HAMT work, we're developing an indexer mirror, if you like. You can imagine that, because the chain of advertisements from a provider is completely open and public, I can technically download that chain, reshape and change it, and then republish it myself, or republish it identically, a bit like a CDN. So we're actually working on a tool in index-provider that allows you to mirror providers and potentially remap the advertisement chain into a HAMT, for example.
There's a whole bunch of open questions; I've trimmed them down. Will touched on a few, Andrew touched on a few, so there's a whole rich wealth of open and difficult questions, both on the network indexer and on the index provider. On the index provider specifically, you have things like: how long should a published advertisement remain servable by the index provider, and how long should it be discoverable, having been ingested by an indexer?
The other thing we want to talk about is what the limits should be on advertisements published by an index provider. Where should we say, hey, you are big enough to be two index providers, please, because it would be easier to fit you into a distributed, sharded network? You could have different ways of doing that, say by the depth of the advertisement chain, or by the amount of multihashes that are published. These are all techniques that we're thinking about, all relating to what I mentioned earlier.
Last but not least, everything you see here is the work of a team, more specifically Will and Andrew on the Bedrock team, and everyone willing to integrate all of this into their applications, as well as the rest of the PL network. So thank you all, and I'll take any questions that you might have.
B: [inaudible question]

A: Everybody stores their multihashes and lists of files differently, so how can we build it so that it is agnostic of, say, Lotus or Boost, agnostic of IPFS itself, and it just works? I would recommend looking at that example. If it's not a Go client, I'll point you to the documentation on storetheindex, which talks about what the protocols are in terms of providing indices. There are two: one is the IPLD schema, and you can find the link in my slides, and the other...
B: Marco already has providers publishing HAMTs, yeah.
C: I think the real question is, there are two options: there's native integration of this sort of protocol, or waiting for IPFS Reframe, and then we'll likely have a sidecar type thing where I could just publish to something that goes along with this through that Reframe portal, and then it ends up...
B
It
might
be
really
useful
to
just
get
get
into
the
habit
of
getting
other
people
to
write
other
tools
that
ingest
other
content
and
publish
it
through,
because
otherwise
people
will
learn
that
they
can
do
that
and
they'll
think
that
everything
is
really
cool
and
so
like,
even
probably
even
after
returning
lessons.
I
work
so
for
groups
running
large
scale
systems
separated.