From YouTube: Invertible Bloom Filters - @matheus23 - Unconf
A: All right, I want to talk about invertible bloom filters, which I thought were just a super interesting kind of data structure. When I was looking into how to do sync better, I came across a paper I liked, and I wanted to present it because I think it's very interesting. It's not to be confused, by the way, with inverted bloom filters. Hannah asked about this: there's a bloom filter variant that doesn't have false positives but rather false negatives, or the other way around, and inverted bloom filters are exactly that. But that's not what I'm talking about. I'm talking about invertible bloom filters.
A: All right, so the problem I was trying to solve: we have two peers, each with a dag, and one of them adds some stuff. I want to sync that without having to send over the whole dag again, so I get some deduplication there. Ideally I just send over those two new nodes, and then both of them have the same data and everything's fine.
A: Sometimes we have these super long chains of stuff, and bitswap just takes a lot of time to do all of those round trips, so you end up with very high latency: the peers keep talking to each other until they've finally synced all of those trees. But the thing is, the dag structure actually doesn't matter that much for us.
A: And so naturally I went looking for papers and eventually found this one: "What's the Difference? Efficient Set Reconciliation without Prior Context", which in disguise is just invertible bloom filters, and so that's what I'm talking about. I like the title of the paper: "without prior context" means that other kinds of protocols need some kind of context that both peers have to have in advance.
A: But this paper, just like bitswap, assumes there's no prior context. Peers are just asking: hey, I have this stuff, what is the difference? And it's a two-round-trip kind of protocol to get all of the differences between those peers, and the amount of stuff you have to send over is big O of the difference of the sets, not big O of the sets themselves.
A: So it's very few round trips and potentially very little stuff to transfer, which I think is pretty cool. So how would you do set reconciliation? Here I have two peers, or two sets of hashes, you could say, and there's one set that has two blue hashes, which indicate some additional stuff, some additional hashes. I can take the set difference and I'll get back those hashes.
A: The problem is those two sets of hashes don't live on the same machine, so to do this kind of difference operation I would have to synchronize the sets, sending potentially lots of hashes over the wire, and I don't want to do that. What I can do instead is use an encoding function to encode these sets as invertible bloom filters, and these bloom filters have, hand-wavily, constant size: their size depends on some other parameter that is not the size of the sets.
A: And what I can do with those invertible bloom filters is just subtract them. That's the magic of these kinds of bloom filters: you get a new bloom filter, and that bloom filter is actually the encoding of the difference set, and there exists a decoding function that takes the bloom filter and gives you the difference set.
A: Of course there are some caveats. Oh yeah, sorry: I can either go the top route, which means potentially sending over a bunch of stuff, or I can go the bottom route. The caveat is that this decoding function, it being a bloom filter, has some success rate and some failure rate. It sometimes fails and you just can't reconstruct the whole set, but that is tunable by the parameter, which is the size of the bloom filter.
A: But the interesting thing is that the failure rate depends on the size of the set you want to reconstruct, not on the size of the sets you used at the very beginning, at the top. And so the bloom filter size depends on the size of the difference of your sets, which is the whole magical thing, and you can tune the size to get different success rates.
A: I just want to preface it with that, but it is nonetheless very interesting and very useful in some cases, I think. So, just like in Brooke's slides, we have a bloom filter, or an invertible bloom filter, which is somewhat like a counting bloom filter. You have a bunch of cells, and at the very bottom, in the last row, you see the count of elements that were hashed into the cell. The middle row is some kind of small byte thing; in my implementation I just used a 64-bit number.
A: I find the cells in the bloom filter that correspond to the hash, and I increase the count. The top cells are just 32-byte cells, so they hold the whole hash; that's where you eventually get the element from in the decoding function. And the middle row is just some kind of check: you just apply a different hash function to your hash. I know I'm going to say "hash" a lot in this talk.
A: I think you saw that kind of thing in there. And when you have something else that you're hashing in there, you just increase the counts again, and when you hit a conflict, what you do is you xor your hash, and you xor your checksum hash. All right, so: the top row is what the paper calls the id, the middle row is what the paper calls the hash, and the bottom row is called the count.
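The cell layout just described can be sketched in Python. This is a toy, not the speaker's Rust implementation or the paper's reference code: the names `IBF`, `hd`, `cell_indices` and the parameters `K` and `M` are my own assumptions.

```python
import hashlib

K = 3   # how many cells each element is hashed into (a small constant, my choice)
M = 32  # total number of cells; the tunable size parameter from the talk

def hd(data: bytes, salt: int) -> int:
    """Salted SHA-256 digest as an integer (stand-in for the talk's hash functions)."""
    return int.from_bytes(hashlib.sha256(bytes([salt]) + data).digest(), "big")

def cell_indices(eid: int) -> list[int]:
    """K distinct cell positions derived from the element's id."""
    idxs, salt = [], 0
    while len(idxs) < K:
        i = hd(eid.to_bytes(32, "big"), salt) % M
        if i not in idxs:
            idxs.append(i)
        salt += 1
    return idxs

class IBF:
    def __init__(self) -> None:
        # One entry per cell, mirroring the three rows from the talk:
        self.ids = [0] * M     # top row: xor of the 32-byte element hashes ("id" in the paper)
        self.checks = [0] * M  # middle row: xor of a checksum hash ("hash" in the paper)
        self.counts = [0] * M  # bottom row: how many elements were hashed into the cell

    def insert(self, elem: bytes) -> None:
        eid = hd(elem, 255)                     # the id is itself a hash of the element
        chk = hd(eid.to_bytes(32, "big"), 254)  # and the checksum is a hash of that hash
        for i in cell_indices(eid):
            self.ids[i] ^= eid    # on a conflict this just xors the hashes together
            self.checks[i] ^= chk
            self.counts[i] += 1
```

Note how a conflict needs no special handling: xor-ing into an occupied cell and bumping the count is the whole insert path.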
A: I find that a little bit confusing, so when you're reading the paper, keep in mind that the id is actually a hash, and the hash is a hash of a hash. All right. And then there is a decode function, so you can take an invertible bloom filter that you encoded and decode it. The way you do this is you look for cells that have a count of one or a minus one.
A: We'll maybe touch on the minus one later. You just look at the very top cell, and you know that only one item was added to it, so it was xored with a zero string, and so you know that it's just the hash, and you can read it out if it matches the checksum. And when you have this element, you now know what other cells it was encoded into, and there may be another cell that does not have a count of one yet, but you can now xor, or subtract, your hash out of that cell, and you get back a bloom filter that may uncover more ones. You iteratively do this until you have an invertible bloom filter that's empty and a set of hashes you recovered from it. Of course, this decode function breaks down very quickly if there are a lot of items in the filter, and the magic of the filter is that its size does not depend on the amount of items you have in there.
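The peeling loop just described might look like the following. Again a toy sketch of my own (the speaker's implementation is in Rust), with assumed names `IBF`, `hd`, `cell_indices` and toy parameters `K`/`M`: find a "pure" cell (count ±1, checksum matches), read the hash out, xor it back out of its other cells, and repeat until the filter is empty or no pure cell remains.

```python
import hashlib

K, M = 3, 32  # cells per element, total cells (toy parameters)

def hd(data: bytes, salt: int) -> int:
    """Salted SHA-256 digest as an integer."""
    return int.from_bytes(hashlib.sha256(bytes([salt]) + data).digest(), "big")

def cell_indices(eid: int) -> list[int]:
    """K distinct cell positions derived from the element's id."""
    idxs, salt = [], 0
    while len(idxs) < K:
        i = hd(eid.to_bytes(32, "big"), salt) % M
        if i not in idxs:
            idxs.append(i)
        salt += 1
    return idxs

class IBF:
    def __init__(self) -> None:
        self.ids, self.checks, self.counts = [0] * M, [0] * M, [0] * M

    def insert(self, elem: bytes) -> None:
        eid = hd(elem, 255)
        chk = hd(eid.to_bytes(32, "big"), 254)
        for i in cell_indices(eid):
            self.ids[i] ^= eid
            self.checks[i] ^= chk
            self.counts[i] += 1

    def decode(self) -> set[int]:
        """Recover the element hashes by peeling pure cells one at a time.

        Raises ValueError when no pure cell is left but the filter is not
        empty -- the tunable failure case mentioned in the talk."""
        recovered: set[int] = set()
        while True:
            pure = next(
                (i for i in range(M)
                 if self.counts[i] in (1, -1)
                 and self.checks[i] == hd(self.ids[i].to_bytes(32, "big"), 254)),
                None,
            )
            if pure is None:
                break
            eid, sign = self.ids[pure], self.counts[pure]
            recovered.add(eid)
            chk = hd(eid.to_bytes(32, "big"), 254)
            for i in cell_indices(eid):  # remove the element everywhere it was encoded
                self.ids[i] ^= eid
                self.checks[i] ^= chk
                self.counts[i] -= sign
        if any(self.counts) or any(self.ids):
            raise ValueError("decode failed: no pure cells left")
        return recovered
```

Note that decode gives you back hashes (the ids), not the original elements: that's all the filter ever stored.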
A: So let's say this is one machine, and it has this kind of invertible bloom filter with thousands of items in there: you can't recover anything. But if you have another machine, and it has a very, very similar kind of set, lots of items in there as well, with just a tiny difference between them, then once you subtract them, and the subtracting is using xor and doing minus on the counts, you'll get back a bloom filter that holds maybe just a single element, and you can decode that. Right, that's basically it. There's some fun stuff you can do with it.
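The subtraction step (xor on the id and checksum rows, ordinary minus on the counts) can be sketched like this; as before, `IBF`, `hd`, `K` and `M` are my own toy names and parameters, not the paper's or the speaker's Rust code.

```python
import hashlib

K, M = 3, 32  # cells per element, total cells (toy parameters)

def hd(data: bytes, salt: int) -> int:
    """Salted SHA-256 digest as an integer."""
    return int.from_bytes(hashlib.sha256(bytes([salt]) + data).digest(), "big")

class IBF:
    def __init__(self) -> None:
        self.ids = [0] * M     # xor of element hashes per cell
        self.checks = [0] * M  # xor of checksum hashes per cell
        self.counts = [0] * M  # element count per cell

    def insert(self, elem: bytes) -> None:
        eid = hd(elem, 255)
        chk = hd(eid.to_bytes(32, "big"), 254)
        salt, seen = 0, []
        while len(seen) < K:  # K distinct cells per element
            i = hd(eid.to_bytes(32, "big"), salt) % M
            salt += 1
            if i in seen:
                continue
            seen.append(i)
            self.ids[i] ^= eid
            self.checks[i] ^= chk
            self.counts[i] += 1

    def subtract(self, other: "IBF") -> "IBF":
        # xor cancels the hashes of shared elements; counts can go negative,
        # which is how elements present only in `other` show up.
        out = IBF()
        out.ids = [a ^ b for a, b in zip(self.ids, other.ids)]
        out.checks = [a ^ b for a, b in zip(self.checks, other.checks)]
        out.counts = [a - b for a, b in zip(self.counts, other.counts)]
        return out
```

After subtracting, every cell touched only by shared elements zeroes out, so the result is as sparse as the set difference, no matter how big the original sets were.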
For example, in our use cases, maybe we can't quite use this, because you may have a phone and you may have a laptop, and your phone doesn't actually store the whole dag, or your whole file system or whatever it is. It may not do that, and so in those cases you don't want to actually sync those dags.
A: You kind of only want to sync some parts of it. What I'm trying to illustrate here is some fun application, or some fun thinking, around invertible bloom filters, using their algebraic properties for some interesting use case. So what I want is: I have a phone, I work on it, I have some dag here, but my phone doesn't have the whole dag. It just has a bunch of things that it fetched, and it doesn't care about the rest.
A: That represents the whole set that the phone has, and I can take that to compute the difference between them, using these kinds of algebraic properties. I think there's maybe some more stuff to uncover here, some more interesting use cases you could do with it, I don't know. It's a very wide application space, I think.
There are some caveats, though. The paper, and some slides that the author has, say invertible bloom filters are only really efficient if you have diffs that are at maximum around 10 to 15% of the total set size. So, for example, if you have a new node coming into the system that has zero hashes, and you have a node that has a whole dag that the new node wants, then of course encoding that set of hashes as an invertible bloom filter that you can decode is going to take a lot of space, and that's just not efficient. So they're efficient up until the difference is about 10 to 15% of the set size.
A: Then there's the problem that you need to choose the size of your bloom filter in advance, which sounds like a bummer and like it kind of breaks the whole protocol, but it doesn't, actually, I promise. There's a strata estimator described in the paper; I just don't have enough space in this talk to talk about it, but it estimates how big the difference in the set sizes is, so you can still do the whole thing in two round trips.
A: You just need to prepare a bunch of bloom filters with different sizes, which is, I know, kind of a thing with bloom filters in general.
All right, and that is also a problem: constructing bloom filters takes time. Yeah, and that's pretty much all I have. Oh yeah, right: there's a chance you can't decode. Yeah, that's pretty much all I have. There's a paper link, and I wrote an implementation in Rust as a toy, so I'll post those slides as a pdf.
D: It goes "uh-oh"? I don't understand.

A: So specifically, you have the count at the bottom, and you look for so-called pure cells, which have a count of one or minus one. It may just happen that all of the cells in there have a count of two or above, and then you don't know what to subtract anymore to get another pure cell, and so it kind of just throws its hands in the air and says: I can't decode, sorry.
A: I mean, it sounds super stupid: I think I just googled different kinds of keywords and found a Stack Overflow answer from the author. I think I googled something like syncing dags, and someone had asked this on Stack Overflow, and the author of this paper actually answered, and so that was kind of my intro to it, and then I looked for other papers, stuff that it references.
E: I will say that, generally, the Fission team has a large repository of papers in our discord and in our discourse forum, and so it might actually be warranted, specifically for ipfs, to literally set up a papers category in the ipfs discourse that we can tag and collectively find useful things in together. Ethereum Research is actually another great example that does this: the Ethereum Research discourse forum.
A: 32 bytes plus eight bytes plus eight bytes, so you have 48 bytes per bucket, right?
A: Yeah, it does pay off, because of the whole thing where, if your set difference is small compared to your set, then it pays off over constructing bloom filters that encode the whole set instead of the set difference. That's why I said there are caveats.
A: Yeah, absolutely. In terms of use cases especially, I can think of this being very useful for ipfs operators that have very, very big pin sets that change very little over time, and then they have, let's say, some cluster or something, and they want to talk within this cluster to figure out what they need to send each other. But, that said, maybe it's not too important to have very, very few round trips between cluster nodes.
G: Maybe a better example is, like, I want to download... I have a newspaper reviews, and I know I downloaded one, downloaded a printed state of communities. It's a different point, having nothing to do with mine, right?
G: And so that doesn't quite fit this model, right? Because we have no... there isn't...
E: There's the case of other versions of yourself, as the first case, or the other case where, if we go a bit larger: you know, James and Philipp have a shared folder, and they each have N devices which are on and offline, and you know that it's going to sync somewhere in there, because they're operating over the same set.
F: If you think about this as a bunch of individual actors who have varying states, the less useful this is. But I would argue that this is not the most common case. Like, when you were talking about the mobile devices and the cloud: maybe you don't want the whole database on the device, absolutely, but if you're thinking of the whole database as just a bunch of patch blocks, then any query is also just a bunch of patch blocks. Yes, and what you do want on...
A: Yeah, and maybe one more thing: I wanted to address something you said about the prior context. It's true, you need some kind of prior context: you need to know what set both peers are talking about, right? But what you don't need to know, maybe in contrast to other kinds of protocols: for example, if you're regularly communicating between your phone and your laptop, you may just remember what your laptop had the last time, and so you can...