A: Hello, I'm Andrew Gillis, more commonly known as gammazero on the various forums and on GitHub, and today I'm going to be talking about the network indexer, which is an important component of a content routing system. So let me share my screen and we'll get started.
So, the network indexer. What is a network indexer? Well, a network indexer is a node that stores mappings of CIDs to provider data records.

A: This is what allows us to find where content can actually be retrieved from. Think of an indexer as a very specialized key-value store. It has two primary groups of users: storage providers and retrieval clients.
A: Storage providers want to advertise their content by storing data in the indexer; the indexer handles this with its ingest logic. Retrieval clients want to query the indexer to find which storage providers have content and how to retrieve that content from those storage providers. That's part of the indexer's find logic.
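The two roles just described can be pictured as a minimal key-value sketch. This is purely illustrative (the real indexer is written in Go and uses a specialized store); the record fields here are made up for the example:

```python
# Minimal illustration of the indexer's key-value model:
# multihash -> list of provider records that tell a client
# who has the content and how to retrieve it.

index = {}  # multihash -> [provider records]

def ingest(multihash, provider_record):
    """Ingest path: a storage provider advertises content."""
    index.setdefault(multihash, []).append(provider_record)

def find(multihash):
    """Find path: a retrieval client asks who provides a CID/multihash."""
    return index.get(multihash, [])

# A provider advertises, then a client looks the content up.
ingest("mh-1234", {"provider": "12D3KooW-example", "protocol": "graphsync"})
assert find("mh-1234")[0]["protocol"] == "graphsync"
assert find("mh-unknown") == []
```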
A: So, let's just start with the basics. A storage deal is created by a storage client, so data is stored on a storage provider.

A: When a storage provider has that data, it's going to announce that it has new content. It does that by publishing the CID of a special record called an advertisement, and that lets the indexers know that it has this new content to be indexed.
A: That's usually published through the mainnet nodes, but it can also be published to indexers directly via HTTP. So the storage provider announces, that gets to the indexers, and then the indexers want to sync that new content. The sync portion of ingest means that we're going to go ahead and read all of the latest advertisement records from a storage provider and get the information that we want to index. That includes a context ID, metadata, and all the multihashes which map to that data.
We'll talk a bit more about what the ingest process involves in a bit. So once the indexer nodes ingest the data, they've actually indexed all of this content for the storage provider.

A: A client can then query the indexer to find where that content is and how to get it. A client is going to issue a query for a CID or a multihash, and the indexer is going to look up the provider information for that CID. It's going to respond to the client with one or more provider records, if it has any, saying: here are all the providers that provide this content, and information about how to go retrieve that content from each of those.
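A find-side lookup like the one just described can be sketched as follows. The endpoint path and JSON field names below are assumptions modeled loosely on public indexer deployments, not an authoritative wire format, and the response is a made-up sample:

```python
import json

# Hypothetical find query against a network indexer. The URL scheme and
# response shape are illustrative assumptions, not a spec.

INDEXER_URL = "https://cid.contact"  # a public network indexer deployment

def find_url(cid):
    # Build the lookup URL for a single CID.
    return f"{INDEXER_URL}/cid/{cid}"

# A made-up response illustrating what the talk describes: one or more
# provider records, each with addresses and opaque retrieval metadata.
sample_response = json.loads("""
{
  "MultihashResults": [
    {
      "Multihash": "mh-abc",
      "ProviderResults": [
        {
          "Provider": {"ID": "12D3KooW-example",
                       "Addrs": ["/dns4/sp.example/tcp/1234"]},
          "Metadata": "opaque-bytes-for-the-provider"
        }
      ]
    }
  ]
}
""")

def providers_for(response):
    """Extract the provider peer IDs a client could retrieve from."""
    out = []
    for result in response.get("MultihashResults", []):
        for pr in result.get("ProviderResults", []):
            out.append(pr["Provider"]["ID"])
    return out

assert find_url("bafy-example").endswith("/cid/bafy-example")
assert providers_for(sample_response) == ["12D3KooW-example"]
```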
A: Part of the information in the record that the client received from the indexer is information about what protocol to use, so graphsync or bitswap. Then the client is going to send that provider record that it got from the indexer back to the storage provider, which allows the storage provider to look at whatever content is in that record and use it to find the data that's being requested. That's maybe something like a deal ID, or maybe some internal record keys, or whatever it may be.
A: So here it is all together in one picture, just to see all the different interactions that are happening with indexer nodes. All right, so let's talk a little bit more about the ingest.
A: Ingest really consists of two parts: the publish, which is the announcing of the availability of more content to index, and then the sync, which is where the indexer is actually pulling in that content and creating the index for that content.
A: So the first part is the publish; a little more detail on that. The announce message is what gets broadcast out from the publisher to the indexer.

A: It's usually sent over gossip pubsub, but it can also be sent via HTTP. This is already built into Lotus clients, which can send it over gossip pubsub to the mainnet nodes, which then relay that publication to the indexers.
A: The indexers then get this announce message, which contains the CID of the advertisement that's being announced, along with the publisher's address, which is where to retrieve the advertisement record from; that allows them to go get that information. Indexers can also ignore publications: if they already happen to have the advertisement, they may have synced it from another publication, or from a direct announcement, or from any number of different ways they may have been notified, so additional announcements don't cause additional work.
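The announce handling just described can be sketched as below. The field names are illustrative, not the actual wire format:

```python
from dataclasses import dataclass, field

# Sketch of announce handling: an announce carries the advertisement CID
# plus where to fetch it, and duplicate announces are ignored.

@dataclass
class Announce:
    ad_cid: str        # CID of the advertisement being announced
    publisher_addrs: list  # where to fetch the advertisement from

@dataclass
class Indexer:
    seen: set = field(default_factory=set)

    def handle_announce(self, ann):
        """Return True if this announce triggers a sync, False if ignored."""
        if ann.ad_cid in self.seen:
            return False  # already have it: repeat announces cause no work
        self.seen.add(ann.ad_cid)
        # ...a real indexer would now fetch the advertisement from
        # ann.publisher_addrs and run its ingest logic...
        return True

idx = Indexer()
ann = Announce("ad-cid-1", ["/dns4/publisher.example/tcp/3104"])
assert idx.handle_announce(ann) is True    # first announce causes a sync
assert idx.handle_announce(ann) is False   # duplicates are ignored
```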
A: A note about what the publisher is. We say "provider" and "publisher", but when talking about indexing it's important to realize that the publisher can be different from the storage provider. Specifically, the publisher is the entity that publishes the advertisement records, in other words the content that is being indexed. Generally it's the same as the storage provider, but it does not have to be.
A: Other entities can publish on behalf of one or more storage providers, and there are policies to control which publishers are allowed. Publishers may create advertisements on behalf of the storage provider, and a publisher can sign those advertisements.
A: We'll talk about advertisement signing a bit more later. Anyway, the sync process works like this: we have a chain of advertisements that also have entries associated with them. This forms an IPLD graph, so the ingestion reads the chain from the latest, most recently announced advertisement all the way back to either the end of the chain, or at least until whatever advertisement the indexer has already ingested.
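The sync walk just described can be sketched with a toy chain. The "prev" links stand in for the IPLD links between advertisements:

```python
# Advertisements form a linked chain (each ad points at the previous
# one); the indexer walks from the newest announced ad back until it
# reaches an ad it has already ingested, or the end of the chain.

ads = {
    "ad-3": {"prev": "ad-2", "entries": ["mh-5", "mh-6"]},
    "ad-2": {"prev": "ad-1", "entries": ["mh-3", "mh-4"]},
    "ad-1": {"prev": None,   "entries": ["mh-1", "mh-2"]},
}

def sync(head, already_ingested):
    """Return the ads to ingest, oldest first, stopping at known ads."""
    to_ingest = []
    cur = head
    while cur is not None and cur not in already_ingested:
        to_ingest.append(cur)
        cur = ads[cur]["prev"]
    return list(reversed(to_ingest))  # ingest in chain order

# An indexer that already has ad-1 only pulls in ad-2 and ad-3.
assert sync("ad-3", already_ingested={"ad-1"}) == ["ad-2", "ad-3"]
# A fresh indexer walks all the way to the end of the chain.
assert sync("ad-3", already_ingested=set()) == ["ad-1", "ad-2", "ad-3"]
```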
A: So, advertisements are signed, including the links. What this does is create a blockchain-like structure: all of the advertisements and all of their associated content hashes become immutable, and they're all signed as well. So we have a chain that we can verify every portion of, and since we have immutable data we can see that the proper signatures are applied.
A: I wanted to talk a little bit about the context ID and metadata, which were mentioned earlier. A context ID is what uniquely identifies metadata in the provider record. The metadata is the portion that says how to get the content: some opaque data which is sent back to the storage provider, telling the storage provider where to look it up, like a deal ID, internal record key, etc.
A: The context ID is what a provider uses to be able to update its metadata or delete its records. Once an advertisement has been published with a context ID, we can publish a subsequent advertisement where that context ID can be used to add multihashes, to update the metadata, or to remove all of the data associated with the context ID.
A: So let's talk a little more about how an indexer stores data. What does it mean to create an index? It means taking the input of a provider record and all the associated multihashes that are part of that data, and inverting this so that the multihashes map back to that provider record. In other words, we can look up the provider record by using a multihash.
A: Different providers can provide the same data, or maybe a provider has the same data in different deals, etc., so it may be available from multiple places. When we do need to update the metadata, how does the indexer refer to it? We don't want to have to do an update for every single multihash; that would potentially be millions of multihashes.
A: This is where the context ID comes in: the context ID is then used to look up, update, or delete that provider record. So what does that actual mapping look like, if we need to refer to provider records both by multihash and by provider and context ID? Basically, an indexer stores a two-level mapping: each multihash maps to a list of provider keys, and then each provider key maps to the individual provider record.
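The two-level mapping can be sketched as below. The key structure (provider ID plus context ID) and field names are illustrative assumptions:

```python
# Sketch of the two-level mapping:
#   multihash -> set of provider keys
#   provider key (provider ID, context ID) -> provider record
# Updating or removing by context ID then touches one record instead of
# millions of multihash entries.

multihash_index = {}    # multihash -> {provider keys}
provider_records = {}   # (provider, context_id) -> record

def put(provider, context_id, metadata, multihashes):
    key = (provider, context_id)
    provider_records[key] = {"provider": provider, "metadata": metadata}
    for mh in multihashes:
        multihash_index.setdefault(mh, set()).add(key)

def lookup(mh):
    # Resolve a multihash through both levels to live provider records.
    return [provider_records[k] for k in multihash_index.get(mh, set())
            if k in provider_records]

def update_metadata(provider, context_id, metadata):
    # One write updates the metadata seen via every associated multihash.
    provider_records[(provider, context_id)]["metadata"] = metadata

def remove(provider, context_id):
    # Dropping the record removes it from every multihash's results.
    provider_records.pop((provider, context_id), None)

put("spA", "ctx-1", "deal-42", ["mh-1", "mh-2"])
assert lookup("mh-1")[0]["metadata"] == "deal-42"
update_metadata("spA", "ctx-1", "deal-43")
assert lookup("mh-2")[0]["metadata"] == "deal-43"  # both multihashes see it
remove("spA", "ctx-1")
assert lookup("mh-1") == []
```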
A: By doing this we can now index provider records by multihash or by context ID. Next, let's talk about indexer data sharing. Indexers are able to share data with each other: they can share their discovered providers and publishers, or they can discover providers and publishers from other indexers.
A: So if I look at an indexer's set of providers, I can configure any other indexer to go and retrieve the providers from it, so it can learn about them and then be able to exchange indexing information with those providers and publishers.
A: So what are the next steps in indexing, now that we've talked about how it works? What are we going to be providing soon? We're going to be doing advertisement chain snapshots, which is basically a mechanism for compressing advertisement chains.

A: This means that publishers can replace their chains with a very compressed form, and eventually truncate those chains so that they don't have to keep the data all the way back to the beginning of their existence.
A: When I say multiple nodes, generally this is going to be within a single deployment, because we'll want to have more indexer deployments for more localized content routing. But we still want to be able to scale each of those deployments, and we want to scale them by spreading the work across multiple nodes. There are two different ways we divide up the work: by ingestion and by storage.
A: The portion that actually stores the indexed content is implemented by go-storethehash, which is being used because it's highly efficient storage and gives us a lot of advantages for storing huge amounts of data very compactly. Although, we can replace the value store with any other store that implements the relatively simple indexer core interface. We've actually used Pebble, and I think we've done some experiments with Postgres, but we can hook it up to anything.
A: If there's any other storage that you want, you just have to implement the interface. Another repo that's worth mentioning is called go-legs; it's what provides the synchronization logic to keep indexers in sync with publishers. And that's it for "what does an indexer do, and how does it work" at somewhat of a summary level. I'd like to ask if there are any questions or things that we want to discuss further at this point.
B: One question: can you talk a little bit about the scale that the indexer is dealing with right now, like how many records, what fraction of the network?
A: So we have about 26% of the providers that we're currently indexing. We actually have stats that we're keeping on our dashboard here, so we can point to some of these: it's about 26% of the providers and about 23% of the deals. I think that's going to be discussed in other presentations in more detail.
A
So,
yes,
we
have
a
number
we'd
like
to
keep
adding
a
lot
more
providers
and
we'll
be
growing
the
number
of
providers
and
also
the
amount
of
data
that
each
provider
will
be
indexing.
Of
course,
so
we
are
still
ramping
up
and
you
can
actually
look
at
the
the
size
of
the
the
index.
This
is
this.
Is
the
current
size
we're
about
three
and
a
half
terabytes?
Let
me
expand
the
time
on
this
a
little
bit,
so
you
can
see
a
bit
more
linear
graph.
A: This sawtooth is just the garbage collection, but you can see how it's been growing. Over the week we've gone from around 3.2 to around 3.5 terabytes, and a third to half of a terabyte per week is a pretty substantial growth rate. But we're actually still running this on a single node and not having any problems with either the storage or the speed of the responses.
B: How can IPFS, large pinning services, and so on connect to the indexer?

A: …

B: …

C: Will you always run the indexers, or do you foresee maybe application developers who run the indexers for their own application? For example, if I'm doing something big, like Telegram or some service, I might run my own indexer, and so there would be a hierarchy of connections or something. Do you have plans or something like that?
A: Yes, there are business models where a particular entity will want to do their own indexing. It may be for internal consumption, or maybe they have a specialized client to look up data within their services, and so they'll index all of their content; or maybe they'll index the content of certain providers that they're partnering with. So if, for example, there's a service that provides some sort of data storage and retrieval, maybe they partner with certain Filecoin storage providers and index the content that they're serving out, and that may be something that they keep internally. But as far as a hierarchy of indexers, that actually gets into a lot more difficult models, like: how do you trust them?
A: How do you determine which one has which portion of the key space? So really, at this point we're looking at having general indexers and then specific indexers that are for specific purposes, and the clients that use those specific indexers will be explicitly configured to use them by whoever's service is providing those indexers.
A: Yes, so the hashes are stored efficiently by storing only the portion of each hash beyond the longest prefix it shares with any other hash. So if you think of all of this hash data, we don't actually store any of the hash data other than the minimum necessary to differentiate it from any other hash.
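The space-saving idea just described can be sketched as follows. go-storethehash works at the bit level inside its index; this is a simplified character-level version of the same minimal-distinguishing-prefix idea:

```python
# Instead of the full hash, keep only the shortest prefix that
# distinguishes each hash from every other stored hash.

def minimal_prefixes(hashes):
    out = {}
    for h in hashes:
        n = 1
        # Grow the prefix until no other hash shares it.
        while any(o != h and o.startswith(h[:n]) for o in hashes):
            n += 1
        out[h] = h[:n]
    return out

hashes = ["aabbcc", "aabbdd", "ffeedd"]
prefixes = minimal_prefixes(hashes)
# The two hashes sharing a long prefix need five characters to tell
# apart; the third is distinguishable by its first character alone.
assert prefixes["aabbcc"] == "aabbc"
assert prefixes["aabbdd"] == "aabbd"
assert prefixes["ffeedd"] == "f"
```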
A: That's the main part of the solution, and then how we index those records on disk and so on allows for fairly efficient find and lookup of those. That's the basic idea: not actually storing the whole hash.