►
From YouTube: SDR Overlap Sets and Subsampling (Episode 3)
Description
In this episode of HTM School, we talk about SDR overlap sets and subsampling.
Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory: http://arxiv.org/abs/1503.07469
SDR Visualizations: https://github.com/nupic-community/sdr-viz
Intro music: "Books" by Minden: https://minden.bandcamp.com/track/books-2
A
I
am
Matt
Taylor
from
Numenta
and
welcome
to
another
episode
of
HDM
school
today,
we're
going
to
talk
about
sparse,
distributed
representation,
overlap,
sets
and
subsampling,
but
first
a
little
aside,
because
I
want
to
talk
about
the
neuron
and
how
neurons
are
related
to
STRs,
how
they
process
STRs.
So,
let's
look
at
STRs
from
a
neuron
standpoint.
Here
is
a
pyramidal
neuron
in
the
neocortex.
A
This
neuron
is
a
complex
pattern,
recognition
system
and
it's
receiving
a
constant
stream
of
STRs
it's
getting
STRs
from
its
apical
dendrites,
which
are
receiving
feedback
from
higher
levels
in
the
hierarchy.
Those
represent
connections
to
cells
up
at
a
higher
level,
it's
receiving
STRs
from
basal
dendrites
that
are
like
contextual,
sdrs,
coming
from
cells
in
the
same
region
of
the
hierarchy,
and
it's
also
receiving
str
input
from
proximal
dendrites.
These
are
coming
from
lower
levels
of
the
hierarchy,
their
direct
connections
to
cells
there
or
from
sensory
input.
A
So,
as
we
talk
about
today's
lesson-
and
we
look
at
all
the
STRs
we're
going
to
look
at
today,
keep
that
in
mind
that
every
bit
in
these
represents
some
type
of
new
neural
activity
and
that's
how
your
brain
processes
information.
So,
let's
take
a
look
at
overlap.
Sets
if
you
think
about
an
overlap
set
for
a
specific
SDR
like
this,
given
any
SDR,
let's
call
the
SDR
X,
given
any
SDR
that
has
an
end
and
a
W.
A
A
This
w
is
coming
from
this
slider,
and
this
for
T
bits
of
overlap
is
coming
from
this
V,
which
is
how
many
bits
do
we
require
for
the
overlap.
In
this
case
you
can
see
there
is
only
one
for
this
specific
SDR
X.
How
many
different
ways
can
these
bits
fit
into
these
specific
spots
and
there's
only
one
way.
A
So
we
have
this
formula
here
which
gives
us
the
length
of
the
overlap
set
the
size,
so
let's
dig
into
this
formula
a
little
bit
and
try
and
make
it
intuitive.
So
what
we
have
here
is
we've
split
this
formula
up
into
products.
This
is
the
product
of
these
two
terms.
The
left
term
is
in
this
box
on
the
left
and
we
are
multiplying
it
by
the
right
term,
which
is
in
this
box
on
the
right,
and
the
way
to
think
about
this
is
the
left.
Side
is
the
on
bit
space.
A
The
right
side
is
the
off
bit
space
for
this
SDR,
and
in
this
case
we
have
right
now,
without
touching
any
of
these
sliders
for
T
on
bits
can
fit
in
only
one
way
in
for
T
spaces.
That's
what
this
is
telling
us
for
T
on
bits
for
T
spaces.
One
way,
that's
basically
WX
choose
B
and
then
remember
in
a
previous
episode
when
we
talked
about
the
capacity
of
an
SDR,
this
is
a
capacity
formula,
there's
only
one
way
that
these
can
fit.
A
Similarly,
on
the
right
hand,
side
we're
talking
about
all
the
off
bits
you
might
notice.
560
is
600
forty
right,
so
these
are
just
representing
all
of
the
different
off
bits
that
are
in
this
SDR
and
how
many
different
ways
zero
on
bits
can
fit
there.
Well
only
one
right
so
one
times
one
equals
one.
A
Now,
we're
gonna
make
this
a
little
more
interesting
I'm
going
to
require
not
that
every
single
bit
matches,
but
I'm
gonna
move
this
down,
one
to
thirty
nine.
So
now
I'm
saying:
what's
the
overlap
set
for
this
STR,
where
I'm
requiring
thirty
nine
bits
of
overlap,
not
forty,
so
it's
no
longer
one,
it's
twenty
two
thousand
four
hundred-
and
this
is
because
one
of
these
bits
now
is
gone
so
in
the
on
bit
space.
There's
one
bit
missing,
there's
forty
different
ways
to
arrange
that
missing
bit
because
there's
forty
different
spaces.
A
Similarly,
on
the
other
side,
we've
moved
that
bit,
we've
taken
it
out,
we're
saying
we
don't
require
that
bit
to
be
on
anymore,
but
it
has
to
be
somewhere.
It
has
to
be
in
this
empty
bit
space
somewhere
in
this
five
hundred
and
sixty
bit
space.
There
are
five
hundred
different
places.
It
can
be
so
the
overlap
set
cardinality
is
the
product
of
how
many
ways
the
on
bits
can
fit
me
on
space
times
how
many
ways
those
extra
bits
can
fit
in
the
off
bit
space.
A
So
I
hope
that
makes
this
a
little
bit
more
intuitive
and
it's
interesting
to
see
as
we
slide
this
B
number
down
and
require
less
and
less
bits
for
a
match.
You
can
see
all
these
bits
in
the
on
space
not
being
required
anymore,
jumping
over
into
the
off
bit
space
and
there's
a
lot
more
places.
Those
can
fit
over
here.
A
So
you're
going
to
see
this
number,
the
right-hand
product
increase
very
rapidly
much
more
rapidly
than
the
left-hand
product
as
we
move
this
B
down
right
and
the
overall
overlap
set
cardinality,
which
is
here,
goes
up
very
quickly
as
we
move
it
up.
We
require
more
bits,
there's
less
and
less
sdrs
that
will
fit
in
that
space
as
we
move
it
down.
There's
more
and
more,
the
chance
of
a
false
positive
can
also
be
calculated.
A
It
is
basically
the
cardinality
of
the
overlap
set
divided
by
the
uniqueness
of
the
original
STR,
so
this
is
pretty
low
in
this
case,
and
as
we
move
B
up,
it's
going
to
get
lower
and
lower
and
as
the
overlap
set,
gets
lower
and
lower
too.
So
the
chance
of
a
false
positive
will
go
up
as
the
amount
of
STRs
in
the
overlap
set
goes
up
because
there's
more
chance
of
some
random
collision.
A
So
quick
jump
to
a
bigger
example,
2048
bits.
If
we
were
to
require
a
perfect
match
where
B
is
40
and
W
is
40
again,
there's
only
one
STR
that
will
fit
in
the
space.
Let's
dial
this
down
to
something
more
realistic,
maybe
3/4
of
the
total
on
bit.
So
if
we
require
3/4
of
the
total
on
bits
to
match
and
the
overlap
score,
then
we
have
a
chance
of
false
positive
here.
That
is
extremely
low.
A
So
this
shows
that
the
resilience
of
the
STRs
here
as
we
increase
n
in
an
SDR,
the
capacity
of
the
SDR,
goes
up
so
quickly
that
the
space
is
so
large.
There
is
an
almost
astronomically
small
chance
of
any
false
positive
occurring,
so
we
can
be
pretty
certain
that
if
we
get
a
match
and
if
we're
comparing
an
SDR
to
something
in
an
overlap
set-
and
we
get
a
match
that
it's
a
it's
a
real
match-
it's
not
just
some
random
match.
A
So
now,
let's
talk
a
bit
about
subsampling
in
a
previous
episode,
we
talked
about
how
we
could
compress
STRs
because
they
are
sparse.
They
can
be
compressed
simply
by
saving
the
indices
of
the
on
bits.
We
don't
have
to
serialize
the
entire
array
of
ones
and
zeros.
We
can
just
save
those
integer
values
representing
the
indices,
which
compresses
is
quite
a
bit.
It
turns
out
that
STRs
are
so
fault,
tolerant
that
you
don't
even
have
to
save
all
of
those
indices.
You
can
just
take
a
sample
of
them.
A
You
could
take
a
percentage
of
those
indices
and
save
them,
and
when
you
want
to
match
incoming
STRs
to
that
sub
sample,
it's
still
really
accurate.
So,
let's
take
an
example
of
this
here
we
have
an
original
STR
X.
It
has
an
N
of
1024
W
of
8,
very
low
sparsity
and
I'm,
calling
the
sub
sampled
version
of
this
X.
Prime
ok.
So
these
are
the
this
is
sort
of
the
original
SDR.
The
sub
sampled
version
only
has
four
bits
on.
A
As
you
can
see,
some
of
these
bits
have
been
randomly
removed
as
a
part
of
the
sampling
process.
So
all
of
these
bits
and
X
prime
exists
in
X,
but
half
of
them
don't
half
of
them
are
removed.
So
now
we
take
some
random
SDR
Y
that
has
the
same
dimensionality
as
the
original
SDR
and
we
want
to
match
it
and
see.
If
we
can
tell
is
this
the
original
SDR
or
not,
based
on
a
sub
sample
of
that
original
SDR.
A
A
Tell
us
what's
the
overlap,
score
and
show
both
of
the
bits
in
one
place
and
as
you
can
see
as
I'm
cycling
through
this
there's,
the
overlap
scores
been
0,
there's
one,
so
it's
very,
very
low
chance
of
there
being
a
collision
in
this
case,
and
if
you
do
the
math
the
chance
of
a
false
positive
here
with
these
parameters.
Right
now
is
about
three
and
10,000,
so
I
could
sit
here
and
click
this
button
all
day
long
and
only
get
an
overlap
score
that
justifies
a
match.
In
this
case,
our
theta
is
two
right.
A
So
out
of
these
8
bits,
two
of
them
must
overlap
with
these
bits
in
the
sub
sample
right
out
of
those
random
eight
bits,
and
it's
barely
ever
gonna
happen.
If
we
move
this
up
and
require
four
bits,
the
chance
of
false
positive,
it
goes
way
down.
If
we
require
just
one
bit
or
one
bit,
you
know
it's
three
and
a
hundred
or
so
so
you
could
probably
see
this
happen
if
I
continued
to
click
this.
So
let's
go
to
an
even
larger
example.
A
So
here
we
have
an
SDR
over
two
thousand
bits,
forty
of
which
are
ons,
and
this
is
a
typical
HTM
dimensionality
for
STRs,
where
subsampling
again,
half
of
them,
so
we're
gonna
we're
going
to
store
20
bits
in
this
X
Prime,
and
now
we
can
see
our
chance
of
false
positive
is
gone
even
further
down,
as
you
can
see
here,
so
there's
really
little
chance
as
I.
Try
a
bunch
of
random
Y
SDRs
here
that
there's
going
to
be
an
overlap
of
over
ten
and
in
this.
A
So
this
shows
you
sort
of
the
resilience
to
false.
If
somehow,
we,
if
we
don't
store
all
of
those
on
bits
when
we're
when
we're
storing
STRs
for
comparison,
if
you
don't
have
to
that's
still
very
resilient
to
noise,
so
hopefully
you
come
away
from
this
episode.
Understanding
that
the
way
we
use
SDR
is
with
HTM
systems.
They
are
extremely
tolerant
to
noise
because
of
the
massive
capacity
of
STRs
we
can.
A
So
in
the
next
episode,
we're
going
to
talk
about
SDR
classification
and
unions
and
that's
going
to
go
even
deeper,
so
I
hope
you
enjoyed
this
episode.
Please
click
subscribe
if
you
haven't
subscribed
to
our
Channel
and
you
like
this
series
and
click
that
thumbs
up
button
down.
There
encourages
me
to
continue
producing
these
videos.
Thank
you
for
watching.