Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
A
What I just showed you is the soft partitioning mechanism. We presented a paper on it at DINPS, a workshop that was co-located with ICDCS this year, and if you want to see the paper, it's online at that link. But this raised a question: which property should we use that will benefit the system? Well, we want to minimize latency and maximize the throughput of the DHT, and our hypothesis was: let's use geolocation. We thought that content is mostly requested in the same geographical area; think about news outlets, for example.
A
This would seem straightforward, correct? So, to see if this was actually the case, we analyzed the IPFS workload. What we wanted to know is what content is being requested, from where, and how often, and where the providers of the requested content are. For this, we analyzed the content requested through ipfs.io, one of the most popular IPFS gateways, and then we searched for the providers of the requested CIDs on the IPFS DHT.
So, to get the requested content, we had two weeks of logs from ipfs.io, from the 7th to the 22nd of March of this year. From this we have a ton of HTTP requests, but we only considered GET requests that had a status code of 200 or 300, because this would mean that the gateway actually managed to fetch the content from the network and that, with high probability, the content is still there when we go to look. With this filter, we kept about half of the requests on the gateway, and we managed to get a bit more than four million different CIDs. To get the providers, we built a very simple Go libp2p program that would fetch all the providers of a given CID, so not just the 20 providers that would be normal in the API call; we wanted to get everyone.
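The log filtering step described above can be sketched in a few lines of Python. This is only an illustration, not the code used for the study: the record layout (`method`, `status`, `path`), the function names, and the sample CIDs are all hypothetical, and "a status code of 200 or 300" is read here as "any 2xx or 3xx status", which is an assumption.

```python
from urllib.parse import urlparse


def extract_cid(path):
    """Return the CID from a path-style gateway URL (/ipfs/<cid>/...), else None."""
    parts = urlparse(path).path.split("/")
    # "/ipfs/<cid>/file" splits into ["", "ipfs", "<cid>", "file"]
    if len(parts) >= 3 and parts[1] == "ipfs" and parts[2]:
        return parts[2]
    return None


def successful_cids(records):
    """Collect distinct CIDs from GET requests answered with a 2xx or 3xx status."""
    cids = set()
    for rec in records:
        if rec["method"] != "GET":
            continue  # only GET requests are considered
        if not 200 <= rec["status"] < 400:
            continue  # assumption: keep the 2xx and 3xx status classes
        cid = extract_cid(rec["path"])
        if cid is not None:
            cids.add(cid)
    return cids


# Hypothetical, already-parsed log records (placeholder CIDs).
sample = [
    {"method": "GET", "status": 200, "path": "/ipfs/QmExampleA/index.html"},
    {"method": "GET", "status": 404, "path": "/ipfs/QmExampleB"},
    {"method": "POST", "status": 200, "path": "/ipfs/QmExampleC"},
    {"method": "GET", "status": 301, "path": "/ipfs/QmExampleD?filename=x"},
]
print(sorted(successful_cids(sample)))
```

The resulting set of distinct CIDs would then be the input to the provider lookup on the DHT.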
A
Okay, so, unfortunately, we could not find providers for all of the CIDs: out of the total CIDs, we found providers for only 45% of them. This means that we found a bit more than 55,000 different providers and, unfortunately, about half of those didn't have any multiaddress information, so we could not extract more information from those providers. At the end, we gathered all the IP addresses that we got from both the requested content and the providers.
A
We ran them through MaxMind's GeoLite2 databases and got a continent-level geolocation, as you can see in this map. You can see more details about this methodology, if you want, in our Notion page at this huge, huge link. Okay, so let's get into our results. I'm going to try to answer these five questions.
A
Okay, and the first one is: how many requests per day are being successfully processed by the gateway? For this, I plotted here the requests per hour over our observed time. You'll see that on the 14th, around the middle, we had a sudden drop. This was probably due to a failure at the gateway; we don't have logs from that period. But what we see is that requests are mostly steady, so IPFS is always working.
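A requests-per-hour series like the one in this plot can be derived by bucketing request timestamps by hour. The sketch below assumes timestamps are available as Unix epoch seconds in UTC; the function name and sample values are hypothetical.

```python
from collections import Counter
from datetime import datetime, timezone


def requests_per_hour(timestamps):
    """Count requests per UTC hour; hours with no logged requests are absent."""
    counts = Counter()
    for ts in timestamps:
        moment = datetime.fromtimestamp(ts, tz=timezone.utc)
        # Truncate to the start of the hour so all requests in that hour share a key.
        hour = moment.replace(minute=0, second=0, microsecond=0)
        counts[hour] += 1
    return dict(sorted(counts.items()))


# Hypothetical timestamps: two requests in the first hour, three in the second.
sample_timestamps = [0, 10, 3600, 3601, 3700]
series = requests_per_hour(sample_timestamps)
for hour, count in series.items():
    print(hour.isoformat(), count)
```

A period without logs, such as the drop mentioned above, would simply show up as missing hours in the series.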
A
If you break this down by continent, you can see these two lines in the middle, the orange and the red one, which represent requests originating from North America and Asia. We see that the requests are mostly divided between these two continents, while the rest of the continents have very little expression. The reason for this, from our data, is unknown, because there can be many reasons for it.
A
It can be that requests from other continents are just not routed to that gateway, or it can be that users on those continents simply don't make requests to IPFS. For this, I think we need a bit more data and investigation. Now, for the second question: what is the popularity distribution of CIDs?
A
So, how many times is a CID requested on this gateway? Here I have a distribution plot where, on the x-axis, you can see the number of different CIDs, while on the y-axis you have the frequency of requests. To understand this a bit better: that point over there is the most popular CID, with, I think, more than 100,000 requests made to it. This represents a single CID. Okay, and that point over...
A
Okay, so you could see this as a Zipf distribution of the workload of the system. If you break this down by continent, you see that the distribution keeps almost the same shape that we saw before, and a fun fact is that it is roughly proportional to the number of requests being made from each continent.
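One way to obtain the points of such a popularity plot is to count requests per CID and then count how many CIDs share each request count. This is a sketch under the assumption that the plot pairs "number of CIDs" with "number of requests"; the function name and sample data are hypothetical.

```python
from collections import Counter


def popularity_points(requested_cids):
    """Return (number_of_cids, request_count) pairs, most requested first."""
    per_cid = Counter(requested_cids)           # requests per CID
    by_frequency = Counter(per_cid.values())    # CIDs per request count
    points = [(n_cids, freq) for freq, n_cids in by_frequency.items()]
    return sorted(points, key=lambda p: p[1], reverse=True)


# Hypothetical request stream: one popular CID and three CIDs requested once.
sample_requests = ["a", "a", "a", "b", "c", "d"]
points = popularity_points(sample_requests)
print(points)
```

In a Zipf-like workload, the first pair describes a handful of very popular CIDs and the last pair describes the long tail of CIDs requested only once.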
A
Okay, so next: how many providers are there for each CID? Dennis already touched on this, and we're going to see almost the same thing, I guess. Here I have a CDF plot of the CIDs on the y-axis, and on the x-axis, which is labeled replicas, we have the number of providers that each CID has. What we see here is that a bit more than 40 percent of all CIDs have only a single provider.
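The CDF in this plot can be computed directly from the list of provider counts, one entry per CID. The sketch below is a plain empirical CDF; the sample counts are made up for illustration and are not the study's data.

```python
def empirical_cdf(values):
    """Return (value, fraction of entries <= value) pairs in increasing order."""
    n = len(values)
    cdf = []
    for i, v in enumerate(sorted(values), start=1):
        if cdf and cdf[-1][0] == v:
            cdf[-1] = (v, i / n)  # same value again: raise its cumulative share
        else:
            cdf.append((v, i / n))
    return cdf


# Hypothetical number of providers found for each of ten CIDs.
providers_per_cid = [1, 1, 2, 3, 1, 5, 2, 1, 10, 1]
cdf = empirical_cdf(providers_per_cid)
for replicas, share in cdf:
    print(replicas, share)
```

The first pair reads as "this fraction of CIDs has at most one provider", which is the figure quoted above for the real data.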
A
Something that is very interesting is that the CIDs that are provided by the most peers, or the most providers, are not actually the CIDs that were most requested. In fact, the CIDs that were most requested only had, I think, one or two providers. Okay, so next: how many CIDs does a provider provide? Yeah, it's a weird phrasing, but it doesn't matter. Here, again, I have another CDF plot.
A
Very few providers, as Dennis said, and if you break this down by continent, what we see is that these providers are mostly located in Europe and North America, again, as you saw in the last presentation. Okay, so the last question, and really what got us to do this work: is there actually any geographic locality of requests?
A
So, to answer this, we generated this heat map. To get it, we joined both data sets, the request data and the provider data, on the requested CID. The rows are the origins of requests and the columns are the locations of providers. As an example of how to read this, that cell there means that 53% of all requests originating from Europe had providers in North America. The data is normalized by the total number of requests made by each continent. If you want to see whether there is locality here, you look at this diagonal: if there was any locality, you would see it highlighted, that is, you'd see a higher percentage of requests there, which is not what you see, right? Instead, what we see is that requests are mostly concentrated in Europe and North America, which hold the most significant portion of providers.
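The join and normalization behind such a heat map can be sketched as follows. This is an illustration under stated assumptions, not the study's code: a request is counted once per distinct provider continent, so a row can sum to more than 1, and the continent codes, CIDs, and function name are hypothetical.

```python
from collections import defaultdict


def locality_matrix(requests, providers_by_cid):
    """Map origin continent -> provider continent -> share of that origin's requests."""
    totals = defaultdict(int)
    hits = defaultdict(lambda: defaultdict(int))
    for origin, cid in requests:
        totals[origin] += 1
        # Join on the CID; count each provider continent at most once per request.
        for provider_continent in set(providers_by_cid.get(cid, ())):
            hits[origin][provider_continent] += 1
    # Normalize each row by the total number of requests from that origin.
    return {
        origin: {prov: count / totals[origin] for prov, count in row.items()}
        for origin, row in hits.items()
    }


# Hypothetical inputs: (origin continent, CID) pairs and provider continents per CID.
sample_requests = [("EU", "c1"), ("EU", "c2"), ("NA", "c1"), ("AS", "c3")]
sample_providers = {"c1": ["NA"], "c2": ["EU", "NA"], "c3": ["NA"]}
matrix = locality_matrix(sample_requests, sample_providers)
print(matrix["EU"])
```

Geographic locality would show up as large values on the diagonal, for example `matrix["EU"]["EU"]`, compared with the off-diagonal cells.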
A
So most providers are on these continents.
A
Okay, so what does this mean for multi-level DHT designs and for the DHT in general? What we saw is that content is provided by only a few providers, and this content is mostly provided by peers in North America and Europe. In this sense, we saw that there is no actual geolocality of requests, and this may lead to load-balancing issues, if they are not already in place, because you have only a very few providers doing all the work in the DHT, in the network.
A
So what this suggests is that we should somehow remove the load on highly popular providers. A way to do this would be to have incentives to re-provide content. We should also develop new strategies to perform load balancing on a DHT, and this can be through novel multi-level DHT designs, or by having something completely different from a DHT altogether.
A
But in fact, what we actually need to do, to be sure which directions we should take, is to continue monitoring, to understand how the system evolves and how big this issue is in IPFS. We actually started some steps towards this. I think Yiannis, in the beginning, already mentioned our work on telemetry, and we will also discuss this a bit tomorrow in one of the breakout sessions, so have a look into it. And that sums up my talk. I would like to again thank...
A
We actually have a paper submitted with these results. If you are interested in this work, please follow the discussion on Notion and GitHub; there are a lot more links for everything. And if you are interested in the work that we do at NOVA, be sure to follow us. Thank you.