From YouTube: The power of two choices - Petar Maymounkov
Description
This talk was given at IPFS Camp 2022 in Lisbon, Portugal.
Cool. Well, first of all, I'm super happy that I can join, if only remotely, this year, and thank you for sticking around for what is probably the last talk of the day for you.
So first I want to start by telling you more about what the distribution of the current DHT looks like. For the purposes of this, we can simply just assume that peer IDs are random numbers chosen between 0 and 1. So these are random real numbers.
The engineering details, of whether they are hashes of keys and so on, are sort of irrelevant, because we're just focusing on the distribution of these numbers. So let me start by simply asking the following question: what happens if you pick a large number of random numbers in the interval between 0 and 1?
So, for instance, let's pretend that we're going to pick a thousand numbers in the interval between 0 and 1. This box I have on the screen is a depiction of the interval between 0 and 1, and if you pick a thousand numbers, you're going to see a picture that looks kind of like this. The vertical lines are the random numbers, and you can see that they are roughly uniformly distributed, visually, at the larger scale of the interval.
But at the same time, if you look closely at individual windows in this interval, there are sparser windows and there are denser windows. So we want to understand: why is it that when we pick uniform numbers, we end up with a realized distribution which is not really uniform, and how non-uniform can it be?
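This is easy to see for yourself in a few lines of code. Here is a minimal Go sketch, not from the talk: the window count of 50 and all names are my own illustrative choices. It samples a thousand uniform numbers and reports the sparsest and densest window.

```go
package main

import (
	"fmt"
	"math/rand"
)

func main() {
	const n, windows = 1000, 50
	counts := make([]int, windows)
	for i := 0; i < n; i++ {
		x := rand.Float64() // a uniform random number in [0, 1)
		counts[int(x*windows)]++
	}
	min, max := counts[0], counts[0]
	for _, c := range counts {
		if c < min {
			min = c
		}
		if c > max {
			max = c
		}
	}
	// A perfectly even spread would put n/windows = 20 points in every
	// window; the realized min and max typically differ a lot.
	fmt.Printf("per-window target: %d, sparsest: %d, densest: %d\n",
		n/windows, min, max)
}
```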
You know, there's this intuitive notion that some parts of the interval are denser and other parts are sparser, and the way we quantify this, and this is probably familiar to most of you, is we take all these numbers between 0 and 1 and we insert them into a binary trie. Now, I have to assume that most of you roughly know what a binary trie is, so I'm just going to very briefly gloss over the definition. Essentially, the idea is this.
If a number is on the left side of the interval, then it goes in the left subtree; if it's on the right side, it goes in the right subtree. Then you keep doing this recursively, and you stop as soon as every number basically lives on a leaf of its own. So the trie doesn't grow past the point where every number is in its own leaf.
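To make the rule concrete, here is a minimal Go sketch of this insertion; the node type, the function names, and the use of floats in [0, 1) are my own illustrative choices, not code from the talk.

```go
package main

import (
	"fmt"
	"math/rand"
)

type node struct {
	left, right *node
	val         float64 // set only on leaves
	leaf        bool
}

// insert adds x to the trie covering [lo, hi), splitting any occupied leaf
// it meets so that each number ends up alone on its own leaf. It assumes
// distinct values, which random floats give with probability one.
func insert(t *node, x, lo, hi float64) *node {
	if t == nil {
		return &node{val: x, leaf: true}
	}
	if t.leaf {
		// The leaf is occupied: turn it into an internal node, push the
		// old occupant down, then retry inserting x.
		old := t.val
		t = insert(&node{}, old, lo, hi)
		return insert(t, x, lo, hi)
	}
	if mid := (lo + hi) / 2; x < mid {
		t.left = insert(t.left, x, lo, mid)
	} else {
		t.right = insert(t.right, x, mid, hi)
	}
	return t
}

// depth reports how deep x's leaf sits in the trie.
func depth(t *node, x, lo, hi float64) int {
	if t.leaf {
		return 0
	}
	if mid := (lo + hi) / 2; x < mid {
		return 1 + depth(t.left, x, lo, mid)
	}
	return 1 + depth(t.right, x, mid, hi)
}

func main() {
	var root *node
	xs := make([]float64, 1000)
	for i := range xs {
		xs[i] = rand.Float64()
		root = insert(root, xs[i], 0, 1)
	}
	fmt.Println("depth of the first number's leaf:", depth(root, xs[0], 0, 1))
}
```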
So in this case here, this trie is what you would get for the numbers that we have in the illustration, and you can see now that numbers which are in sparse regions of the interval end up having much shallower leaves in the binary trie, whereas numbers which are in denser regions are deeper in the tree.
I won't go into the details of why, because you can figure it out on your own just by studying it a little bit, but the key point here is that the depth at which a number ends up in the trie represents the density of the interval around it. The depth of a number in the tree is a quantification of how dense the region around this node is.
So here you have the example with the red numbers. If we want to summarize the non-uniformity of the entire outcome, so the whole interval with all the numbers, the standard way to do this is to just look at the distribution of these depths. You're going to see a picture like this, which basically looks roughly like a bell curve. And here comes the first important fact, somewhat non-obvious, that I should tell you.
The fact is that if you do this experiment any number of times that you want, the experiment being picking a thousand numbers and putting them in a binary trie, then even though every single time you're going to see a different trie, because the numbers will be different every single time, the distribution of the depths of the leaves is always going to be the same. So this is a sort of magical fact that one can prove analytically, and it's quite important.
It means that if you generate, you know, about 20,000 numbers, which is the size of the DHT network (you don't need to know the size exactly, you can just roughly generate this many random numbers), the distribution that you see in your experiment is going to exactly match the distribution of the real thing.
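Here is one way to run that experiment, as a Go sketch under my own assumptions: it draws random 64-bit IDs instead of reals (equivalent for this purpose), and it avoids building an explicit trie by using the fact that a leaf's depth is one more than the longest bit-prefix its ID shares with any other ID, and that the closest such match is always adjacent once the IDs are sorted.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
	"sort"
)

func main() {
	const n = 20000
	ids := make([]uint64, n)
	for i := range ids {
		ids[i] = rand.Uint64()
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })

	hist := map[int]int{}
	for i, id := range ids {
		cpl := 0 // longest common bit-prefix with any other ID
		if i > 0 {
			if c := bits.LeadingZeros64(id ^ ids[i-1]); c > cpl {
				cpl = c
			}
		}
		if i < n-1 {
			if c := bits.LeadingZeros64(id ^ ids[i+1]); c > cpl {
				cpl = c
			}
		}
		hist[cpl+1]++ // leaf depth = longest shared prefix + 1
	}
	// Rerun this as many times as you like: the IDs differ every run,
	// but the printed histogram stays essentially identical.
	for d := 1; d <= 64; d++ {
		if hist[d] > 0 {
			fmt.Printf("depth %2d: %5d leaves\n", d, hist[d])
		}
	}
}
```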
So the analytical result that we know, and this is like a textbook result in math, about what this distribution looks like, is even more precise than just saying that the distribution is going to be the same every time. In particular, the result says that the mean of this distribution is going to be roughly log n, and by roughly I mean there is a constant factor in front of log n, which I'm not saying what it is.
Usually theorems don't give you the exact number, but you can always find it yourself, just experimentally. So the mean is roughly log n, and it will be something like two or three times log n. So it's much deeper than if the tree were balanced; if the tree were balanced, the mean would be exactly log n.
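A sketch of that measurement, under the same assumptions as the previous snippet (random 64-bit IDs, depths from shared prefixes, names mine): print the mean leaf depth next to log2(n) for a few sizes and read the constant off the ratio yourself.

```go
package main

import (
	"fmt"
	"math"
	"math/bits"
	"math/rand"
	"sort"
)

// meanLeafDepth draws n random IDs and returns their average leaf depth in
// the binary trie: 1 + the longest common bit-prefix with a sorted neighbor.
func meanLeafDepth(n int) float64 {
	ids := make([]uint64, n)
	for i := range ids {
		ids[i] = rand.Uint64()
	}
	sort.Slice(ids, func(i, j int) bool { return ids[i] < ids[j] })
	total := 0
	for i, id := range ids {
		cpl := 0
		if i > 0 {
			if c := bits.LeadingZeros64(id ^ ids[i-1]); c > cpl {
				cpl = c
			}
		}
		if i < n-1 {
			if c := bits.LeadingZeros64(id ^ ids[i+1]); c > cpl {
				cpl = c
			}
		}
		total += cpl + 1
	}
	return float64(total) / float64(n)
}

func main() {
	for _, n := range []int{1000, 10000, 100000} {
		fmt.Printf("n=%6d  mean leaf depth=%5.2f  log2(n)=%5.2f\n",
			n, meanLeafDepth(n), math.Log2(float64(n)))
	}
}
```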
So, to interpret this statement: it says that the variance is large, that there are leaves whose depth is either larger or smaller than the mean by as much as the mean itself. This is just another way of saying that the tree is really, really imbalanced. And this is in fact very easy to check: if you open up the DHT code and look at the routing table of an individual node, you're going to find out that oftentimes nodes have a routing table of depth 15 or 16.
So the lesson here is that the DHT distribution, predictably, is extremely volatile. And the good news is that you don't have to know these theorems or find them in the literature, because you can always just simulate these experiments in a computer program and figure out exactly what the distribution would be. To facilitate this, I've created two libraries that we use quite a bit in the DHT part of IPFS.
So this sort of concludes part one, which is really just telling you that, unsurprisingly, the distribution of peer IDs in the DHT is quite imbalanced.
Now, part two. I want to tell you about a different way of picking your peer ID, which would result in a significantly better distribution. This way is called the power of two choices. It's an algorithm for picking a peer ID, and the algorithm simply says: pick two numbers.
If you're choosing your peer ID, pick two random numbers, put both of them in the binary trie, and then stick with the one that ends up in a shallower location, in other words, in a less dense region of the distribution, and throw away the one that ended up in a deeper location. In this picture, the orange one ends up much deeper.
You can see this both visually, in the interval 0 to 1, and also in the trie: the trie that corresponds to this distribution would be a nearly perfectly balanced tree, with some defects here and there, so fairly infrequent.
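A minimal Go sketch of this rule, under my own assumptions: peers join one at a time, a candidate's depth is probed against the current ID set (kept as a sorted slice), and all names are illustrative rather than the talk's code.

```go
package main

import (
	"fmt"
	"math/bits"
	"math/rand"
	"sort"
)

// depthIn reports how deep a candidate id would land among the existing
// ids (sorted ascending): one more than the longest common bit-prefix
// with its nearest neighbors on either side.
func depthIn(ids []uint64, id uint64) int {
	i := sort.Search(len(ids), func(j int) bool { return ids[j] >= id })
	cpl := 0
	if i > 0 {
		if c := bits.LeadingZeros64(id ^ ids[i-1]); c > cpl {
			cpl = c
		}
	}
	if i < len(ids) {
		if c := bits.LeadingZeros64(id ^ ids[i]); c > cpl {
			cpl = c
		}
	}
	return cpl + 1
}

func main() {
	// Grow a network of 20,000 IDs, each new peer joining via two choices.
	var ids []uint64
	for len(ids) < 20000 {
		a, b := rand.Uint64(), rand.Uint64()
		pick := a
		if len(ids) > 0 && depthIn(ids, b) < depthIn(ids, a) {
			pick = b // b would land in a sparser, shallower spot
		}
		i := sort.Search(len(ids), func(j int) bool { return ids[j] >= pick })
		ids = append(ids, 0)
		copy(ids[i+1:], ids[i:])
		ids[i] = pick // keep the slice sorted
	}
	// Measure the final leaf depths against sorted neighbors; they should
	// concentrate tightly around log2(20000) ~ 14.
	minD, maxD := 64, 0
	for i, id := range ids {
		cpl := 0
		if i > 0 {
			if c := bits.LeadingZeros64(id ^ ids[i-1]); c > cpl {
				cpl = c
			}
		}
		if i < len(ids)-1 {
			if c := bits.LeadingZeros64(id ^ ids[i+1]); c > cpl {
				cpl = c
			}
		}
		d := cpl + 1
		if d < minD {
			minD = d
		}
		if d > maxD {
			maxD = d
		}
	}
	fmt.Printf("leaf depths span [%d, %d]\n", minD, maxD)
}
```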
And if you want to represent this statement in a distributional form, you can see the picture on the right. The distribution would be something that's very skinny and focused around the mean, and it would have very small tails coming off to the sides. Now, the actual formal statement of the theorem is as follows.
If you use the power of two choices algorithm, the mean of the binary trie would actually be pretty much exactly log n, but the more interesting part is that the deviation from the mean is log of log of n. So previously it was log of n; now it's log of log of n, so this is exponentially smaller. Log of log of n is usually such a small number that you can think of it as just being equal to one or two, regardless of how many numbers, you know, peer IDs, your system has.
So in practice, what this means is that you expect to see a completely balanced tree, which very infrequently might have nodes that are either one deeper than the average depth, or maybe one shallower than the average depth. And so this is called, like I said, the power of two choices. Now the question is: why might you care to produce peer IDs that are uniformly distributed versus ones that are not so uniformly distributed?
In general, there are multiple reasons; it depends essentially on what you're trying to accomplish, but I'll give you two examples just to give you an intuition of how you might use this more balanced tree. One question that comes up a lot in the DHT world is: how can we roughly estimate the size of the DHT network? A natural way to estimate the size of the network is to pick a random peer from the network and see how deep they are in the binary trie.
So you can see that if the binary trie is balanced, and you just pick a random peer from the network, it is very likely that the depth of that peer is going to be equal to the mean, which is exactly log n, and this means that you can infer the size of the network from this number.
It would just be 2 to the power of the depth of the node that you sampled, and your estimate of the size of the network would be correct most of the time, say 90% of the time. If you want to be even more certain that your estimate is correct, you can just pick a few random peers and go with the majority vote. So you can do this with a balanced distribution.
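A sketch of that estimator in Go, with hypothetical hard-coded depths standing in for however you would actually probe live peers; the median serves as the majority vote here, and all names are mine.

```go
package main

import (
	"fmt"
	"sort"
)

// estimateSize turns a handful of sampled leaf depths into a size
// estimate: take the majority (here, median) depth d and return 2^d.
func estimateSize(sampledDepths []int) int {
	sort.Ints(sampledDepths)
	median := sampledDepths[len(sampledDepths)/2]
	return 1 << median
}

func main() {
	// Hypothetical depths probed from a balanced network of ~16k peers:
	// with two choices, nearly every sample sits at log2(n) = 14.
	fmt.Println("estimated size:", estimateSize([]int{14, 14, 15, 14, 13}))
}
```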
You cannot do this with the current distribution in the DHT, because here, if you pick a random peer from the network, the chances that you end up with a highly unrepresentative depth are nearly a hundred percent, basically. So this is one example of why you might want to have a balanced distribution, you see.