From YouTube: Measuring the Web3.0 Stack
Description
Permissionless networks are challenging to design and operate. Developing methodologies to get insights into the performance of nodes in the network is an essential operational procedure. In this talk we are going to go through the steps we have taken so far to measure IPFS, and the Web 3.0 stack more generally, and the results we have gathered. We are also going to point to directions we plan to pursue in the near future and invite the community to get involved!
Hi everyone, very nice to be here, and thanks for the great intro, Bailey. Indeed, my last name is a little bit unpronounceable; my name is Yiannis Psaras, I'm a research scientist at Protocol Labs, and today I'm going to be talking to you about measuring the Web3 stack.
A
So
when
it
all
started
was
a
great
workshop
that
we
organized
back
in
june,
which
was
called
the
idf
or
diff
more
simply,
and
it
focused
on
decentralizing
the
internet
with
ipfs
and
filecoin.
We
had
great
great
sessions
with
top
researchers
and
scientists
from
around
the
world.
You can find the link to the GitHub repository out there. That was a crawling tool: we started doing something that was long overdue, crawling the network more systematically in order to find out what's actually going on inside the nuts and bolts of IPFS. The crawling tool runs roughly every 30 minutes.
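As a rough illustration (not the actual Nebula code), a periodic crawl scheduler could look like the minimal Go sketch below; `crawlOnce` is a hypothetical stand-in for a full DHT walk that returns the set of peer IDs found.

```go
package main

import (
	"fmt"
	"time"
)

// crawlOnce is a hypothetical stand-in for a full DHT crawl; a real
// crawler (e.g. Nebula) would walk the routing tables of every
// reachable peer and return the peer IDs it discovered.
func crawlOnce() map[string]bool {
	// ... dial bootstrap peers, walk the DHT, collect peers ...
	return map[string]bool{"12D3KooWExample": true}
}

func main() {
	// Run a crawl roughly every 30 minutes, as described in the talk.
	ticker := time.NewTicker(30 * time.Minute)
	defer ticker.Stop()

	for ; ; <-ticker.C {
		start := time.Now()
		peers := crawlOnce()
		fmt.Printf("crawl finished in %s, found %d peers\n",
			time.Since(start), len(peers))
		// Persist results here so later crawls can measure churn.
	}
}
```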
Just after crawling the network we noticed some anomalies, and that could have been due to a number of reasons: it could have been because we had very unstable nodes, because nodes would rotate their multiaddresses, or because the Hydras were having some problems; as you know, the Hydra boosters are prevalent around the IPFS DHT network.
A
We've
also
seen
that
many
peers
rotated
their
multi
addresses
to
the
point
of
having
a
different
multi
address,
coming
from
the
same
id
address
more
than
5
000
times
within
a
space
of
one
week.
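To make that figure concrete: detecting rotation like this can be as simple as counting distinct multiaddresses per peer ID across a week of crawl observations. A minimal sketch follows, assuming each crawl yields (peer ID, multiaddress) pairs; the names are illustrative, not from our codebase.

```go
package main

import "fmt"

// observation pairs a peer ID with a multiaddress seen in one crawl.
type observation struct {
	peerID    string
	multiaddr string
}

// countRotations returns, per peer ID, how many distinct
// multiaddresses were observed across all crawls.
func countRotations(obs []observation) map[string]int {
	seen := map[string]map[string]bool{}
	for _, o := range obs {
		if seen[o.peerID] == nil {
			seen[o.peerID] = map[string]bool{}
		}
		seen[o.peerID][o.multiaddr] = true
	}
	counts := map[string]int{}
	for id, addrs := range seen {
		counts[id] = len(addrs)
	}
	return counts
}

func main() {
	obs := []observation{
		{"peerA", "/ip4/1.2.3.4/tcp/4001"},
		{"peerA", "/ip4/5.6.7.8/tcp/4001"},
		{"peerB", "/ip4/9.9.9.9/tcp/4001"},
	}
	for id, n := range countRotations(obs) {
		// A peer exceeding some threshold (e.g. 5,000 distinct
		// addresses in a week) would be flagged as rotating.
		fmt.Printf("%s: %d distinct multiaddresses\n", id, n)
	}
}
```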
So, as I said, we did something that was long overdue: we had to design a more systematic approach to doing measurements on the Web3 stack. That includes IPFS, it includes Filecoin, and it might include other networks in the future.
What we now have is tooling that can do continuous measurements on the peers of the network in a fully transparent way, and we have an architecture that is basically split into three different parts.
Our crawler, Nebula in this case, is now dockerized and can run in several different places, several different points on the globe.
So we decided to take two main directions. The first is to measure the churn rate of the IPFS network, which is something that was not very clear before: there were measurements, and statements you find online, but nothing to tell you how much churn there actually is. That's something very important, because it drives the design and the protocol settings for the DHT, and not only that.
It also goes into defining other parameters of the network. The second direction is to measure the latency of the whole cycle, from publishing content to the IPFS network to retrieving that content as a client. And of course we've got several future directions we're already working on; this is just a sneak peek of what we're doing right now. The results I'm going to present we're soon going to make fully public in a nice way online, in reports but also on separate websites.
We have also started work on extending our studies to the Filecoin network: the Lotus DHT, the retrieval network of Filecoin, and so on. So there is lots more to come beyond what I'm going to present in the next few minutes. Starting from the churn rate in the IPFS network, our results show that the churn rate in IPFS is quite big.
If you dig into this graph, you're going to see that around 60 percent of the DHT server peers stay online for one and a half hours or less. Similarly, if you go a little bit further down, you see that 80 percent of the DHT server peers stay online for three hours or less. We consider this to be quite a high churn rate.
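The two numbers quoted are just points on the CDF of session lengths. A minimal sketch of how such fractions fall out of the raw data, assuming you already have one uptime duration per observed session:

```go
package main

import (
	"fmt"
	"time"
)

// fractionUnder returns the share of sessions whose uptime is at most
// the given threshold -- i.e. one point on the session-length CDF.
func fractionUnder(sessions []time.Duration, threshold time.Duration) float64 {
	if len(sessions) == 0 {
		return 0
	}
	n := 0
	for _, s := range sessions {
		if s <= threshold {
			n++
		}
	}
	return float64(n) / float64(len(sessions))
}

func main() {
	// Toy data; real sessions come from consecutive crawl snapshots.
	sessions := []time.Duration{
		20 * time.Minute, 50 * time.Minute, 2 * time.Hour,
		90 * time.Minute, 5 * time.Hour,
	}
	fmt.Printf("<= 1.5h: %.0f%%\n", 100*fractionUnder(sessions, 90*time.Minute))
	fmt.Printf("<= 3h:   %.0f%%\n", 100*fractionUnder(sessions, 3*time.Hour))
}
```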
It's almost as high as the churn reported for the BitTorrent DHT about 20 years ago. On the bright side, we realized that the settings we have for the IPFS DHT are set very sensibly and manage to provide lots of resilience to the network. These are results about resilience that I'm not going to talk about today, but we are going to publish them in the near future.
We wanted to understand churn in the IPFS network a little bit more, so we decided to see what percentage of nodes is stable and what percentage is kind of coming and going. We also wanted to see how often nodes go offline: once a node comes online, for how long does it stay before we can assume it might go offline again and therefore not be reachable?
So we split the overall node population. In one of the experiments, as you see here, the crawler ran for about two and a half days, a little bit less than that, and from this experiment we found that about 14 percent of nodes are always online and very stable. There is also a very tiny percentage that we saw in the initial crawl of the network but never saw online again.
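A sketch of how the node population could be split from a series of crawl snapshots; the three buckets mirror the always-on, dangling, and never-seen-again categories above. The assumed representation (one set of online peer IDs per crawl round) and all names are illustrative.

```go
package main

import "fmt"

// classify splits peers into "always-on" (online in every round),
// "one-shot" (seen only in the first round and never again) and
// "dangling" (everything else: peers that come and go).
func classify(rounds []map[string]bool) (alwaysOn, oneShot, dangling []string) {
	appearances := map[string]int{}
	for _, round := range rounds {
		for id := range round {
			appearances[id]++
		}
	}
	for id, n := range appearances {
		switch {
		case n == len(rounds):
			alwaysOn = append(alwaysOn, id)
		case n == 1 && rounds[0][id]:
			oneShot = append(oneShot, id)
		default:
			dangling = append(dangling, id)
		}
	}
	return
}

func main() {
	rounds := []map[string]bool{
		{"a": true, "b": true, "c": true},
		{"a": true, "b": true},
		{"a": true, "d": true},
	}
	on, once, dang := classify(rounds)
	fmt.Println("always-on:", on, "one-shot:", once, "dangling:", dang)
}
```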
We then went on to ask: for how long do nodes go offline, and for how long do they stay online? In this graph you can see the node counts on the y-axis and reliability on the x-axis. Let me explain for a minute what this x-axis means. The amount of time this experiment was running for, which is about 53 hours, basically translates to about 3,200 minutes.
If we go to the first spike there, which is just above 2,000 nodes, we see that those nodes stay in the network for one percent of the time. One percent of the experiment duration is about half an hour (one percent of roughly 3,200 minutes is about 32 minutes), and that's a very large share of the nodes. We can see how the other nodes go from then on.
A
Obviously,
the
node
counts
go
down,
and
so,
for
example,
we
see
that
here
we've
got
about
400
nodes
in
the
20
online
time
mark,
which
means
it's
about
10
hours,
so
about
20
of
the
nodes
in
the
ipfs
dhd
stay
in
the
network
for
10
hours
or
less,
at
which
point
they
go
offline
and
then
might
come
online
later
on.
So we started thinking that we need to dive deeper: what is causing this? We asked the questions you see in this slide: whether nodes are running on unreliable home machines; whether nodes that run on those home machines are turned off at night, which is a very normal thing for a regular user to do; and, a question we're still gathering results for, whether nodes are rotating their peer IDs, which could be intentional or unintentional, for reasons such as bugs and so on.
So we went and measured those three different types of nodes; the "all nodes" category basically includes both the always-on nodes and the dangling nodes, and we're trying to see what infrastructure they're running on. We found that the blue slice here is 10 percent of nodes that run on DigitalOcean, about 3 percent run on AWS, and a tiny percentage runs on Azure. And then there is a very big percentage, 85.8 percent, which is unknown: these are home machines of users, or they could be cloud environments that have not made their IP addresses public.
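To give an idea of where the "unknown" bucket comes from: classification like this typically just matches each peer's IP address against the published CIDR ranges of each cloud provider, roughly as in the sketch below. The ranges shown are placeholders, not the real provider lists.

```go
package main

import (
	"fmt"
	"net"
)

// Provider ranges would normally be loaded from the published
// IP-range files of each cloud; these CIDRs are placeholders.
var providers = map[string][]string{
	"DigitalOcean": {"203.0.113.0/24"},
	"AWS":          {"198.51.100.0/24"},
	"Azure":        {"192.0.2.0/24"},
}

// classifyIP returns the provider owning the address, or "unknown"
// for anything outside the known ranges (home machines, or clouds
// that do not publish their addresses).
func classifyIP(ip net.IP) string {
	for name, cidrs := range providers {
		for _, c := range cidrs {
			_, network, err := net.ParseCIDR(c)
			if err == nil && network.Contains(ip) {
				return name
			}
		}
	}
	return "unknown"
}

func main() {
	for _, addr := range []string{"203.0.113.7", "8.8.8.8"} {
		fmt.Println(addr, "->", classifyIP(net.ParseIP(addr)))
	}
}
```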
So we don't really know whether that is cloud infrastructure or not. We then split those into always-on nodes and dangling nodes. For the nodes that are always online, the percentage that runs on cloud infrastructure is a little bit larger than the one I just presented above, and this is to be expected: when nodes run on cloud infrastructure, it's much more likely that they stay online for longer. Still, however, for the dangling nodes on the right-hand side, 73 percent run mostly on home machines.
This gives us a very good footing to say, at least, that the churn rate can be attributed to unreliable home machines, although we still have experiments ongoing to find out more. And there is a great result out of this: the IPFS infrastructure, a decentralized storage and delivery network, does not, for the most part, run on centralized cloud infrastructure. This supports the fact that there is no single point of failure, which is the vision for all these technologies.
On the left-hand side, this graph is showing the points in time when nodes go offline related to daytime, and on the right-hand side we can see nodes that go offline during nighttime.
We see that these two look pretty similar and, based on the correlation pattern that we picked, we conclude that it is actually not the case that peers go offline during nighttime. That was for a specific location, obviously; in this case it was for nodes based in Hong Kong, because we obviously have to take the local time of day into account.
So we need to differentiate between different time zones. Now, moving on quickly to the latency of the whole cycle, from publishing content to retrieving content in IPFS: we wanted to break this down into several different steps, and obviously the first one is the content publish time. In this graph we can see the distribution of content publish times, with the count on the y-axis.
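For the curious, measuring this step amounts to little more than timing the provide operation end to end. A minimal sketch, with `provide` standing in for whatever publish API the node exposes (a hypothetical name, used here only for illustration):

```go
package main

import (
	"fmt"
	"time"
)

// provide is a hypothetical stand-in for publishing a provider
// record for the given CID to the DHT.
func provide(cid string) error {
	time.Sleep(120 * time.Millisecond) // pretend network work
	return nil
}

func main() {
	start := time.Now()
	if err := provide("bafyExampleCID"); err != nil {
		panic(err)
	}
	// One sample of the content-publish latency distribution.
	fmt.Println("publish took", time.Since(start))
}
```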
These results are literally in the oven right now and we are going to be releasing them pretty soon, but this is to give you an idea of the level of detail that we're going into. Moving on to retrieval latency, we have the DHT walk that is depicted in this picture. Again, on the y-axis we have the counts, as in the number of items that we have tried to retrieve from the network, and on the x-axis we have the latency. We need to carry out more of those experiments.
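The retrieval-side plot is built the same way, just over many samples. A sketch that buckets latencies of a hypothetical `findProviders` call into a coarse histogram (all names and the random latencies are illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// findProviders is a hypothetical stand-in for a DHT walk that
// locates providers for a CID and returns when the first is found.
func findProviders(cid string) time.Duration {
	return time.Duration(rand.Intn(1500)) * time.Millisecond
}

func main() {
	// Bucket latencies into 250ms bins, mirroring the plotted histogram.
	const bin = 250 * time.Millisecond
	hist := map[int]int{}
	for i := 0; i < 1000; i++ {
		lat := findProviders("bafyExampleCID")
		hist[int(lat/bin)]++
	}
	for b := 0; b < 6; b++ {
		fmt.Printf("%4dms-%4dms: %d\n", b*250, (b+1)*250, hist[b])
	}
}
```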
But for now it looks like, at least on the retrieval side, things have improved a lot compared to the state of things a few months ago, and this is a huge win for the developers, both PL-supported and from the open-source community, who are making all those improvements to the protocol stack.
We have also tried to extract some interesting insights by going into further detail. For the provider records where we see that the PUT operation is failing, we wanted to figure out why it is failing and which agent versions are causing this. We see in this graph that, out of about 1,300 provider record PUT attempts, these are the agent versions that have failed: it's about 36 percent on go-ipfs nodes, 31 percent on hydra-booster nodes, and so on.
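Attribution like this is a straightforward group-by over the failed attempts. A minimal sketch, assuming each failed PUT is recorded with the remote peer's agent-version string (the function name and toy input are illustrative):

```go
package main

import (
	"fmt"
	"sort"
)

// tallyByAgent groups failed provider-record PUTs by the remote
// peer's agent version and reports each version's share.
func tallyByAgent(failedAgents []string) {
	counts := map[string]int{}
	for _, a := range failedAgents {
		counts[a]++
	}
	versions := make([]string, 0, len(counts))
	for v := range counts {
		versions = append(versions, v)
	}
	sort.Strings(versions)
	for _, v := range versions {
		share := 100 * float64(counts[v]) / float64(len(failedAgents))
		fmt.Printf("%-20s %5.1f%%\n", v, share)
	}
}

func main() {
	// Toy input; real data comes from the crawler's error logs.
	tallyByAgent([]string{
		"go-ipfs/0.9.1", "go-ipfs/0.9.1", "hydra-booster/0.7.4",
	})
}
```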
This gives us a very good insight for going in and debugging what is going on there, which is a very important thing to do if we want to improve the performance of the network and make it brighter and better.
With this I'm going to finish here. I would like to mention that we have lots more coming very shortly, so follow up and get in touch; we're more than happy to collaborate with more people from the community to dig deeper, find more insights, and actually try to build more robust, higher-performance, and resilient protocols.