Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
Today, we're going to be discussing some of what the team has been doing. Our team is codenamed ProbeLab, because we're doing lots of probing in the IPFS and libp2p networks, and some of the results that we have are what we're going to be presenting today.
So, let's get started. What we want to do is data-driven protocol design and optimization. There is a lot that someone can learn by looking into the actual data from the network: how the protocols actually perform compared to their specification, and compared to what we think at programming time will happen in the real world.
Our kind of motto is that you can't really improve what you don't measure, because you don't know where the bottlenecks are, and you should measure what you think you have just improved. If you apply an optimization to a protocol and push it out to the network, and the network starts transferring bytes around, then unless you have the right measurement and monitoring infrastructure, you can't really say whether the optimization that you had planned and were expecting is actually the improvement that is seen as performance in the network.
So that's what we're trying to do now. The end goal of doing measurements in the network is not really the measurements just for their own sake; they are not an end in themselves. The end goal is basically to identify the problems, quantify how much room we have for improvement and, finally, design protocol optimizations. And of course, as with every other protocol in the IPFS stack and the IPFS ecosystem, which is open source, we want our results to be open source as well.
We get the results, and then we have to do some analysis to see if they are what we expected them to be, if intuitively they make sense. If they don't, of course, we need to revisit step two or five; if they do show what we were expecting, then all good, we move on to the next study. So, the methodologies that we have used so far.
There are three complementary methodologies: one is through crawling, via several crawlers that exist in the IPFS network, which we've written or enhanced from existing ones; another is through probes, which are basically controlled nodes in the network; and the third is through logs.
For continuous network monitoring we have been using the mighty Nebula crawler, whose author is right here and will be talking later today. It's a very useful crawler that we kind of adapted, building on the features it inherited from previous crawlers, in order to get some more information from the network that was important to us. The crawler has been running since last summer; it's now perhaps above 10,000 crawls, and if not, it's close to that, and we did some analyses.
Some of the results I'm going to present are from October 2021, which is the blue stripe down there. Among other things, we have found that we've seen more than 200,000 peers, more than two million addresses and about half a million IP addresses. We've seen that the network is handling more than one billion requests per week, as it should say there, and we've seen that IPFS nodes exist in more than 150 countries and 2,700 autonomous systems, which kind of verifies the fact that IPFS is a distributed network.
What the crawler also does, though, is monitor uptime, and that was important for us because we didn't just want to see how many peers are in the network, but how much churn there is in the network. So, churn is when a node joins the network and then leaves at some point: what's the time between those two events, and of course, after leaving, when does it come back again? These are important questions for P2P networks.
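To make that concrete, here is a minimal sketch in Go (not the Nebula crawler's actual code) of how session lengths could be computed from observed join and leave timestamps; the `Session` type and the sample data below are assumptions made purely for illustration.

```go
package main

import (
	"fmt"
	"sort"
	"time"
)

// Session is a hypothetical record of one observed peer session:
// when the crawler first saw the peer online and when it noticed it gone.
type Session struct {
	PeerID string
	Join   time.Time
	Leave  time.Time
}

// medianUptime returns the session length below which half of all observed
// sessions fall, i.e. the time after which 50% of peers have left.
func medianUptime(sessions []Session) time.Duration {
	durations := make([]time.Duration, 0, len(sessions))
	for _, s := range sessions {
		durations = append(durations, s.Leave.Sub(s.Join))
	}
	sort.Slice(durations, func(i, j int) bool { return durations[i] < durations[j] })
	return durations[len(durations)/2]
}

func main() {
	now := time.Now()
	// Toy data standing in for what a crawler would collect over many crawls.
	sessions := []Session{
		{"peerA", now.Add(-90 * time.Minute), now.Add(-20 * time.Minute)},
		{"peerB", now.Add(-3 * time.Hour), now.Add(-2 * time.Hour)},
		{"peerC", now.Add(-45 * time.Minute), now},
	}
	fmt.Println("median session length:", medianUptime(sessions))
}
```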
So, some quiz time: who knows what the peer churn in the IPFS network is? Who can fill in this gap: 50 percent of peers leave the IPFS network after how many hours?
Two hours? Okay, any other takers? Twelve? Six? Not twelve, definitely not twelve, unfortunately. [laughs] Half an hour? Okay, you're stretching it, but you're not far away. Fortunately or unfortunately, it's about one hour, so those two are the closest. Of course, this varies between different implementations of IPFS and also different versions of IPFS, but yeah, there is lots of churn in the network, at least from what we have observed.
So, the cloud dependency: IPFS is a decentralized network, but how many of the nodes run on centralized cloud infrastructure?
Twenty? Okay, getting closer, but be ready to be surprised. It's about three percent of nodes that run on centralized cloud infrastructure, from our measurements, which we double-checked and triple-checked because, yeah, it was a little bit surprising to us as well. But it's definitely good news if you ask me, because it means that the community is putting up its own infrastructure to host IPFS nodes, which is great. One final one to do.
Now, this is a representation of the DHT routing tables in the IPFS DHT. There are somewhat fancier representations, but basically, yeah, it's dots, each connected to others, although the connections are not shown here; it's the routing table representation of nodes in the network, which looks pretty cool. I mean, there are some things that we also cannot really understand and we need help with, but that's why we're here.
Cool, okay. So I'm going to do a brief overview of how the IPFS system works, at least through the lens of the IPFS DHT, because this is going to influence a little bit what we're going to talk about next. What happens in IPFS when you have a document and you want to share it with others through IPFS is that you take the document and you hash it.
As you know, that produces the CID, and then what you do is create a small file that includes the CID and, if you're the provider, your own address information: the peer ID and the IP address. You put that into a small file and you send it to the IPFS DHT, and then the IPFS DHT does its magic, going and finding one node in the network, which, let's assume, is that one.
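To illustrate those two steps, here is a rough Go sketch, assuming go-cid, go-multihash and the go-libp2p Kademlia DHT: hash the content into a CID, then announce a provider record for it. Host and DHT setup, bootstrapping and error handling are omitted; this is not kubo's actual publish path, just the shape of it.

```go
package sketch

import (
	"context"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
	mh "github.com/multiformats/go-multihash"
)

// cidForBytes hashes raw content and wraps the digest in a CIDv1; this is
// the identifier that ends up inside the provider record.
func cidForBytes(data []byte) (cid.Cid, error) {
	digest, err := mh.Sum(data, mh.SHA2_256, -1)
	if err != nil {
		return cid.Undef, err
	}
	return cid.NewCidV1(cid.Raw, digest), nil
}

// provide announces to the DHT that this node holds the content behind c.
// The DHT stores the provider record (CID plus our peer ID and addresses)
// on the peers closest to the CID; the content itself never leaves us.
func provide(ctx context.Context, d *dht.IpfsDHT, c cid.Cid) error {
	return d.Provide(ctx, c, true)
}
```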
Next, what you do is send that CID to your friends that you want to access the content. So they have the CID, and what they do is use a protocol called Bitswap, basically asking their immediately connected peers whether they happen to have that CID. If the answer is positive, all good: they get the file and call it a day.
If not, the request goes to the DHT, and what the yellow node does is send what is called the provider record, that small file that you stored there, to your friend. At that point, your friend has got the CID, of course, which they already knew, but also your address information: the peer ID and the multiaddress, basically the IP address. So they establish a connection and they get the file. That's what is happening at a high level. Now, there are two points worth highlighting here.
One is that you don't upload the file itself to the DHT; you just upload the provider record, which then points to your machine, and others can come and take the file from you, at least in the base version, unless we add other bits and pieces on top. The second one is that, as you know, once you get the file you can hash it again and verify that it is the content you asked for.
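A minimal sketch of the DHT side of that retrieval path, assuming the same go-libp2p stack as above: look up provider records for a CID and dial the first provider found. The Bitswap broadcast to already connected peers is a separate subsystem and is not shown, and this is illustrative rather than kubo's retrieval code.

```go
package sketch

import (
	"context"
	"fmt"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
	"github.com/libp2p/go-libp2p/core/host"
)

// findAndConnect walks the DHT for provider records of c and connects to the
// first provider it hears about; fetching the blocks would then happen over
// Bitswap against that peer.
func findAndConnect(ctx context.Context, h host.Host, d *dht.IpfsDHT, c cid.Cid) error {
	for prov := range d.FindProvidersAsync(ctx, c, 1) {
		// prov is a peer.AddrInfo: the peer ID plus the multiaddresses taken
		// straight from the provider record stored on the DHT.
		if err := h.Connect(ctx, prov); err != nil {
			return err
		}
		fmt.Println("connected to provider", prov.ID)
		return nil
	}
	return fmt.Errorf("no providers found for %s", c)
}
```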
So that's it at the high level. Now, there are some opportunities that we've seen while playing around and understanding all the different steps that are involved. We figured out that the provide process, so the very first step there, when you basically want to put the provider record to the DHT, is very slow. It takes tens of seconds, in some cases more than 100 seconds. So the hypothesis there was that there is some bottleneck in the IPFS DHT provide process, which we wanted to prove.
Through measurements, we have found that, although the overall process takes tens or hundreds of seconds, the nodes that we are finding (that red node there and all the rest; we're not finding one, we're finding much more than one, we're finding 20) are all being found mostly within less than half a second.
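As a sketch of how anyone could observe that gap on their own node, simply putting a stopwatch around the standard Provide call shows the end-to-end publish time, which can then be compared against how quickly the closest peers were discovered; this is just illustrative timing code, not the instrumentation used in the study.

```go
package sketch

import (
	"context"
	"log"
	"time"

	"github.com/ipfs/go-cid"
	dht "github.com/libp2p/go-libp2p-kad-dht"
)

// timedProvide wraps the standard DHT Provide call with a stopwatch so the
// total publish duration can be logged and compared with lookup latency.
func timedProvide(ctx context.Context, d *dht.IpfsDHT, c cid.Cid) error {
	start := time.Now()
	err := d.Provide(ctx, c, true)
	log.Printf("provide for %s took %s (err: %v)", c, time.Since(start), err)
	return err
}
```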
So this leads us to think that something must be going wrong there, because it takes an order of magnitude longer to complete the process when we could be as fast as doing it in less than one second. So we're working on this; there is a lot of documentation and there are presentations on that, and it's ongoing work on how to improve it.
What I want to say by that is that, as you start playing with the network and through measurements, you can find out important details and important optimizations about the project, about IPFS or any of the projects that you're working on. Now, the second one is the lookup latency, in particular the DHT lookup latency.
The hypothesis there is that, if we break down the content routing process, which is composed of many, many steps as we've seen in the previous slide, then we'll identify every bottleneck that exists, and that's very good, because by applying optimizations there we can then improve the performance. So what we did is a controlled experiment.
We spun up several different nodes that are controlled by us, published a unique CID from one of them, and then went and requested it from all of the rest. That went around, because other nodes then published CIDs and the rest requested them, and we repeated that several times. In the end, there were more than 3,000 CIDs published and almost 15,000 CIDs retrieved; that's the kind of sample size that we are talking about.
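Conceptually, the experiment loop looks like the sketch below; the `Node` interface with `PublishUniqueCID` and `TimeRetrieval` is a hypothetical stand-in for our actual tooling (which is linked from the reports), and only the overall shape of the experiment is intended to be accurate.

```go
package sketch

import (
	"context"
	"time"

	"github.com/ipfs/go-cid"
)

// Node is a hypothetical handle on one of the controlled IPFS nodes.
type Node interface {
	PublishUniqueCID(ctx context.Context) cid.Cid               // publish fresh, never-seen content
	TimeRetrieval(ctx context.Context, c cid.Cid) time.Duration // resolve and fetch, return elapsed time
}

// Measurement is one latency sample: which node published, which requested.
type Measurement struct {
	Round, Publisher, Requester int
	Duration                    time.Duration
}

// runExperiment has every node take a turn publishing a fresh CID while all
// other nodes time how long it takes them to resolve and retrieve it.
func runExperiment(ctx context.Context, nodes []Node, rounds int) []Measurement {
	var results []Measurement
	for r := 0; r < rounds; r++ {
		for i, publisher := range nodes {
			c := publisher.PublishUniqueCID(ctx)
			for j, requester := range nodes {
				if i == j {
					continue // a node does not request its own content
				}
				d := requester.TimeRetrieval(ctx, c)
				results = append(results, Measurement{Round: r, Publisher: i, Requester: j, Duration: d})
			}
		}
	}
	return results
}
```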
What we found out is that around 80 percent of requests from an EU-based node have been resolved, at least through the DHT part of the procedure that I described, in less than 500 milliseconds, and 50 percent of all the requests got through the DHT part of the resolution process in less than one second. So what does this tell us? Where is the opportunity here?
If you look at the middle picture, that is just the DHT walk duration, the DHT part of the process, and this is what I was talking about above. But if you look at the one on the left as you're looking at it, you see that everything is basically shifted by about one second. So there is one second there, which is something like 100 percent on top of the overall time; it should be brought down, because it decreases performance significantly.
Now, this is the Bitswap step, so that's the opportunity here. The Bitswap process that we mentioned in the beginning, where you go and ask all your immediately connected peers, might take a lot longer, and unless it is very successful, it's going to delay everything by 100 percent of the time; it's going to double and triple the time that you need as a normal user, as a normal client, to resolve content from the DHT.
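The control flow behind that extra second is roughly "broadcast first, walk the DHT only afterwards". The sketch below illustrates why an unsuccessful broadcast adds its whole timeout on top of the DHT latency; `askConnectedPeers` and `walkDHT` are hypothetical stand-ins for the Bitswap broadcast and the DHT lookup, passed in as functions just to keep the sketch self-contained.

```go
package sketch

import (
	"context"
	"time"

	"github.com/ipfs/go-cid"
	"github.com/libp2p/go-libp2p/core/peer"
)

// resolve illustrates the broadcast-then-DHT pattern: if none of the already
// connected peers answer within bitswapTimeout, that whole wait is paid on
// top of the DHT walk for every piece of content they cannot serve.
func resolve(
	ctx context.Context,
	c cid.Cid,
	bitswapTimeout time.Duration,
	askConnectedPeers func(context.Context, cid.Cid) (peer.AddrInfo, bool), // hypothetical Bitswap broadcast
	walkDHT func(context.Context, cid.Cid) (peer.AddrInfo, error), // hypothetical DHT provider lookup
) (peer.AddrInfo, error) {
	bctx, cancel := context.WithTimeout(ctx, bitswapTimeout)
	defer cancel()
	if prov, ok := askConnectedPeers(bctx, c); ok {
		return prov, nil // a connected peer had it: fast path, no DHT needed
	}
	// The broadcast failed, so only now does the DHT walk start; the caller
	// pays bitswapTimeout plus the DHT latency instead of the DHT latency alone.
	return walkDHT(ctx, c)
}
```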
So, if our hypothesis is correct, it means that we can see a great improvement in the resolution process. This is an ongoing study that we have: we're trying to figure out how successful Bitswap is, whether it succeeds a lot or not, and what that means for the average user. So yeah, these are two of the studies that we did based on measurements. Again, what I want to highlight is that, as you dig more, you find more optimizations.
How well instrumented is every routing table, so that it can point to the right other nodes if you ask them for something, a CID or a peer ID, in the network? Great study, very detailed report; there you'll understand how the DHT works in great detail, which is not very easy. Then, provider record liveness: again, we're going to have a talk in a little bit about that.
The research hypothesis there was that, if provider records do not stay alive, then the content that is published in the network is not reachable, which of course is terrible. If you have a storage and retrieval network like IPFS and you publish content and then suddenly you cannot find it, then it's not great news. So, great results there, very encouraging. Did I have a third one? I think I had a third one... yeah, I skipped it.
Okay, so that was the third one. That pretty much concludes what I wanted to say, as in, roughly, what we have been doing. There are many more studies, so you can see some of the results that I mentioned, but also many more, at this URL on ipfs.network. There are weekly reports there, very detailed and very interesting; they talk about the geolocation of users, rotating peer IDs and the churn of the network, of course, and a lot more, so go check it out.
You can read pretty much all of what I said in this recent paper that we have. You can find it online; of course, I'm going to share the slides, and this is the CID, which is on the IPFS network, so you can find it through that. So, you can get involved. You can find lots of what we're doing on our Notion page, which is also linked from here. We have funding available through the radius.space platform.
You can go there and apply. You can follow most of what we're doing in the GitHub repository network-measurements, which is where we put our reports and where we put the requests for measurements, as we call them, and there are many, many that are open. Of course, you can go and work on some of them, or you can add your own ideas. So we are looking for things like: why are these black lines there? Okay, let's figure that out.
You can also find us on the IPFS Discord under #probe-lab; that's where the team is mostly chatting, so yeah, join that. On the ProbeLab Notion page you can also see the board of the projects that we have, with our current projects that are in progress, those that are done, the next ones that we're going to work on, and so on.
You can use some of the tools we're developing to do your own research. One of the teams that is present here right now developed this telemetry tool that you can put on your IPFS node (if you're running one, you should), and then you can start getting statistics out of this node, which is very useful. I think there is documentation on how to set up flashy Grafana dashboards as well, but that's the easiest part; how to do the telemetry is the most important one.
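The telemetry tool itself is linked from our resources; purely as an illustration of the idea (this is not that tool's code), exposing a node-side metric for Prometheus and Grafana to pick up looks roughly like this in Go, where the metric name and port are arbitrary choices made for the sketch.

```go
package main

import (
	"log"
	"net/http"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// lookupLatency is an example histogram a node could feed with the duration
// of every DHT lookup it performs; Grafana then plots it over time.
var lookupLatency = promauto.NewHistogram(prometheus.HistogramOpts{
	Name:    "ipfs_dht_lookup_seconds",
	Help:    "Duration of DHT lookups performed by this node.",
	Buckets: prometheus.DefBuckets,
})

func main() {
	// In a real node this would be called after each lookup completes.
	lookupLatency.Observe(0.42)

	// Expose the /metrics endpoint that a Prometheus server scrapes.
	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":2112", nil))
}
```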
What else? We'll also ask you to get involved in another study that is going to be primarily presented on Sunday, on libp2p day. The libp2p team, as you might know, has developed a NAT hole punching approach; NAT hole punching is one big problem that has not been solved in peer-to-peer networks, and the libp2p team now does have a solution. We're going to be running a study where we're going to ask users, anyone in the community, to download a binary and run it, and we're not going to get any of your personal information, of course.
What this is going to do is instrument that node and run measurements between your node and some of the nodes that libp2p is running, to see whether NAT hole punching can work through your home network, which is great; it's really going to improve performance a lot if we manage to get that right. So we're doing the measurement study for the libp2p team, and the experiment is going to run later on, in December.
Of course, we're going to make the results publicly available afterwards, so you're going to be able to use them, yeah, analyze them, publish your own papers or publish your own blog posts and reports.
Sorry, it's too far away. Okay, I'll share the slides in the Slack channel of this particular track, so that you can get them from there. Cool. So that's it from my side. I'm very excited about the rest of the program; we're going to talk about some things I touched upon, but also some others that I have not talked about. So yeah, let's welcome our next speakers.