"Sparse Distributed Representations: Our Brain's Data Structure"
Subutai Ahmad, VP Research, Numenta
Numenta Workshop, October 2014, Redwood City, CA
A: Okay, so I'm going to start by talking a little bit about sparse distributed representations. But before we get started, I do want to warn you that this talk is quite different from Jeff's talk. Jeff gave a very broad overview of the theory; I'm going to go fairly deep into one particular area, and just fair warning, there is going to be a little bit of math in there.
Now, you don't really need to understand the math to understand the gist of it, but we felt it's important to go through some of it, because we're trying to come up with rigorous ways of understanding the behavior of cortical systems. So, just fair warning, there is going to be a little bit of math. I know for some of you that's not really your thing, and if you want to take this opportunity to go outside and make math jokes, I will not be upset.
Okay, all right. Our goal at Numenta is to try to understand the computational principles of the cortex and then to build intelligent systems based on those principles, and when we do that it's sometimes useful to look at things from a computer science perspective.
So what I'm going to try to do is share with you some of the progress we've made in understanding one particular part of it: how information is represented through sparse distributed representations, and how it's used throughout the system. Okay, so before I really get into the details, I want to discuss a little bit about the role of sparse distributed representations. Where are they used? How are they used?
A
Well,
it
turns
out
they're
pretty
much
used
everywhere
in
cortex
and
just
to
give
you
a
flavor
flavor
for
that,
I'm
going
to
walk
you
through
one
particular
one
simple
example.
So,
let's
say
you're
playing
a
musical
instrument.
Okay,
as
you're
playing
your
auditory
system,
is
listening
to
the
musical
notes
and
in
your
auditory,
cortex.
A
small
percentage
of
the
neurons
at
any
point
in
time
are
responding
to
the
particular
frequencies
that
are
being
played.
They're
highly
tuned
to
specific
frequencies,
the
rest
of
the
neurons
at
any
point
in
time
are
silent.
Similarly, in the visual system there are neurons that are responding to specific spatial frequency patterns and colors and so on, and a small percentage of the neurons in the visual system are active at any point in time. There's a sparse pattern of activity that's representing the visual scene, and indeed all of your sensory areas are representing the sensory information that's coming in at any point in time as multiple sparse distributed representations.
Those areas are influenced by the top-down SDRs that are coming from higher levels, and each neuron is deciding whether or not it fits into the plan at this point in time and executing a series of instructions. So you have a sequence of SDRs that's generating, let's say, your finger positions on a violin, or head movements, or whatever it might be, and similarly your brain is making predictions about what sounds it might hear. Those are represented as SDRs.
Your attention system is using SDRs to decide what areas to pay attention to and what to ignore, and really, SDRs are the foundation for all cognitive functions across all sensory modalities. So we think of it as the brain's common data structure, and what we would like to do is try to understand this data structure and analyze it, and maybe this can help provide a more rigorous foundation for understanding cortical computing. If we can really analyze it, we can understand the behavior of the system.
Okay, so here's an outline. There are really just two parts to the talk: I'm going to spend a little bit of time on the basics of SDRs, sparse distributed representations, and then I'm going to go through a list of fundamental properties that we can derive based on what we know from the neuroscience and from the math. I'm going to talk about things like error bounds and scaling laws and so on. Okay, so let me introduce you to an SDR. This video loop was created in Professor Hassan's lab.
What it shows is basically a patch of mouse cortex, and each individual light there is a single neuron that's firing at a particular point in time. What's really amazing about this video is that it was recorded while the mouse was performing complex cognitive tasks. So this is not from an anesthetized animal; it's from an awake animal performing a complex task. Your brain right now, as you're listening to this speech, if you were to poke inside it, looks like this: these flashing lights.
Okay, so let me go through and list some basic attributes of SDRs, what we know from the neuroscience. The most basic thing is that at any point in time there's just a small number of neurons that are firing. There are just a few of these bright lights, and at the same time there's a lot of black space here. There's actually a very large number of neurons that could potentially be firing at any point in time, but only a small percentage are firing.
A
Every
cell
here
represents
something
and
has
some
semantic
meaning.
We
know
that
neurons
in
cortex
tend
to
be
fairly
highly
tuned
to
specific
patterns,
and
so
every
point
in
there
actually
has
some
meaning
it's
not
some
random
bit.
That's
coming
on,
and
at
the
same
time
no
no
cell
is
critical.
You
can
destroy
a
good
percentage
of
these
cells
in
the
system
will
work
just
fine
and
in
fact
the
information
is
distributed
across
the
cells
you,
you
might
have
a
cell.
A
That's
focused
on
a
particular
orientation
of
an
edge
and
you'll
have
other
cells
that
overlap
with
it
that
respond,
maybe
not
as
tightly
to
that
particular
orientation,
but
to
some
other
orientation.
So
the
information
in
cells
is
distributed
and
no
single
cell
is
critical.
So
that's
a
very
important
property
of
sdrs.
We also know from neuroscience that whatever the data structure is, it has to enable extremely fast computation. The cortex can recognize amazingly complex objects in a very small number of steps, sometimes as few as 20 or 25 steps. It can recognize faces and animals and so on. So whatever the data structure is, there's not that much room for a lot of iteration.
It has to work very fast, so the data structure has to enable very efficient computation. The last property is that SDRs are by and large binary. Now, there are some cases where you see perhaps some non-binary activation, but what we found is that there's so much room in SDRs that we can pretty much represent anything we need to with a binary code, and for the rest of this talk I'm going to assume that everything is binary.
Also, in the rest of the talk I'm going to represent SDRs not as these black squares but as a binary vector, where the positions represent individual cells. If a number is zero, that means the cell is not active, and if the number is one, that means the cell is active. So we're going to look at binary vectors that look like this.
Okay, let's look at a single neuron. How does a neuron operate on SDRs? Well, every cortical neuron gets a number of different SDRs as input, and Jeff went through some of this in his talk. We have SDRs that are coming from above, feedback SDRs; we have context SDRs, whether it's temporal context or other context; and we have bottom-up sensory SDRs.
So each neuron is getting a bunch of these SDRs coming into it, and at the end of the day each neuron then represents one bit in some output SDR that the rest of the system is going to see. Okay, so it gets a bunch of different input SDRs and it's going to output one bit in this vector. Let's look at this in a little more detail. On the left you have the pyramidal cell, and on the right we have our model neuron; again, Jeff went into this in some detail. I'm going to look at just two aspects of it. First, we have the distal dendrites. They are getting the feedback SDR and the context SDR, and in our model neuron they're represented with those blue synapses there, and we tend to have on the order of 100 to 200 of these distal dendritic segments.
Now, in the brain each of these segments is actually fairly independent of the others, and each segment is detecting a particular unique SDR using a threshold operation. Those blue dots represent individual synapses, individual connections to those SDRs. If enough of them are on, then that segment will say: hey, I've detected this SDR. But each of these segments is operating independently, with a very simple kind of threshold computation.
The second thing I want to focus on is the proximal dendrites. Those typically get bottom-up sensory SDRs, and they're represented with the green dots there. They also represent multiple patterns, but in a very different way: the proximal segments represent dozens of separate patterns in a single segment. So here we have a bunch of SDRs that are kind of smooshed together into one segment, and somehow it's able to recognize each one independently. I'll go into these two basic types of operations in more depth later.

Again, in both cases each synapse here corresponds to one bit in some incoming high-dimensional SDR that is the input to the neuron, and then the neuron is going to output one bit in some output SDR. Okay, so what are some of the properties that we want to go over?
Let's discuss a little bit of notation here. I'm going to represent an SDR as a vector x with n binary values, where each bit represents the activity of a single neuron. So you have n different bits, and they're either going to be 0 or 1 depending on whether a cell is firing or not. s is going to be the percentage of on bits, and I'm going to use the letter w to denote the actual number of on bits in the representation.
So if you have a vector, w is simply the cardinality, the number of on bits in there. And here's an example: I have two different SDR vectors where n is 40, so there are 40 total elements in each one, with 10 percent sparsity, so about four bits are on at any point in time. That's a pretty small SDR vector. Typically in our implementation we use much larger numbers, and these correspond much more closely to the numbers you see in a layer in biology.
We typically use a value of n somewhere between 2,000 and 65,000, so these are pretty high-dimensional vectors. The sparsities we work with tend to be anywhere from about 0.05 percent, when n is really high, all the way up to about two percent, maybe four percent, and the value of w that we typically use is around 40.
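To make the notation concrete, here is a minimal Python sketch (illustrative only, not NuPIC code) of an SDR as a binary vector, using the n and w values just mentioned:

```python
# A minimal sketch, assuming n = 2048 and w = 40 as in the talk.
import numpy as np

n = 2048                      # total number of cells (bits)
w = 40                        # number of active (on) bits

rng = np.random.default_rng(0)
x = np.zeros(n, dtype=np.uint8)
x[rng.choice(n, size=w, replace=False)] = 1   # a random SDR with w on bits

s = w / n                     # sparsity: fraction of on bits
print(f"n={n}, w={int(x.sum())}, s={s:.2%}")  # s is about 1.95%
```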
Okay, and it turns out that there's a reason for these numbers, which comes out of the math. So first let's talk about capacity. This is fairly straightforward: the number of unique patterns that can be represented in a vector is simply n choose w, since there are w on bits out of a possible n bits. Now, this is a lot smaller than 2 to the n, which is what you would get if you had a dense representation, but it is far more than any reasonable need you might have.
So, for example, in the range that we're dealing with, let's say n of 2048 and w of 40, the number of unique patterns is actually 10 to the 84th or greater, which is way, way greater than the number of atoms in the universe. It's worth pointing this out, because people have been concerned that if you have a sparse representation, maybe you're losing something, but there's actually tremendous room in there to represent really rich concepts.
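As a quick sanity check on that capacity claim, a sketch using exact integer arithmetic:

```python
# Capacity of a sparse binary code: n choose w unique patterns.
import math

n, w = 2048, 40
capacity = math.comb(n, w)
print(f"{capacity:.2e}")   # about 2.4e+84, versus 2^n for a dense code
print(capacity > 10**84)   # True
```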
Okay. Similarly, if you took two random vectors, the chance that they're actually identical is basically zero; it's one over that number. So it's extremely unlikely that if you were to pick two random SDR vectors, they're going to be the same. Okay, so that's capacity. Next I'm going to talk in a little bit of detail about how well we can recognize patterns in the presence of noise, and I'm going to need to develop a few concepts along the way.
First of all, we're going to talk about similarity metrics. If you talk about recognizing patterns, you want to know when two patterns are similar to one another, and if they're similar enough, then you say that you recognize it. With SDRs we don't use typical vector similarities: neurons cannot compute Euclidean distance or Hamming distance or anything like that. That would actually require full connectivity between layers, and we just don't see that. The similarity metric we're going to use is called the overlap.
The overlap is simply the number of bits two vectors have in common. You can think about this as sort of the opposite of Hamming distance: Hamming distance asks how many bits are different, whereas here we're only concerned with the shared bits. And this requires very minimal connectivity. If you have a vector with 40 on bits, you only need to look for those 40 bits in any other target SDR; you don't care about the rest of the bits, so it can be very efficient.
Mathematically, you just take the AND of the two vectors and then compute the length; that gives you the overlap. We can also define a match: we say we detect a match between two vectors if they're close enough. Basically, if the overlap between two vectors meets some threshold theta, then we say they match.
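In code, the overlap and the match are each one line; a sketch (the helper names are mine, not an established API):

```python
# Overlap: count of shared on bits (AND the vectors, then count).
# Match: overlap at or above a threshold theta.
import numpy as np

def overlap(x, y):
    return int(np.count_nonzero(np.logical_and(x, y)))

def match(x, y, theta):
    return overlap(x, y) >= theta

rng = np.random.default_rng(0)
n, w = 2048, 40
x = np.zeros(n, dtype=np.uint8)
x[rng.choice(n, w, replace=False)] = 1

# Corrupt the pattern: turn 10 of its on bits off and 10 other bits on.
noisy = x.copy()
on, off = np.flatnonzero(x == 1), np.flatnonzero(x == 0)
noisy[rng.choice(on, 10, replace=False)] = 0
noisy[rng.choice(off, 10, replace=False)] = 1

print(overlap(x, noisy))           # 30: most of the original bits survive
print(match(x, noisy, theta=30))   # True: still recognized despite noise
```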
Okay, so how accurately can we match in the presence of noise? To kind of build this up, consider this diagram here. The circle shows the space of all possible SDR vectors, and we want to match some candidate vector against a specific set of stored vectors. Each of these dots is a particular vector that we want to match against, and we would like to match them in the presence of noise. In the case of SDRs, each bit has semantic meaning.
We know that we never get the same input twice, but as long as the input is similar, the bits are going to be shared and there's going to be a high overlap between them. So what we care about is how well two SDRs overlap. The way you can control that is by decreasing the match threshold theta, and as you decrease the threshold, you can see that the white space around each vector increases. This is the set of vectors that match, given that threshold.
So as you decrease the threshold, you become more and more robust to noise: you're going to allow more and more patterns to match against that candidate pattern. Of course, you don't get anything for free. As you do that, you also increase the chance of false positives: as the white area grows, there's a much higher chance that it's going to overlap with some other vector that is not coming from the source you're interested in. So we're interested in this trade-off.
What is the size of the white space versus the size of the gray space? It turns out you can actually calculate this, and we can do it using something called the overlap set: how many vectors match as you decrease the threshold. We define the overlap set of x to be the set of vectors with exactly b bits of overlap with x. And for a match, let's say you have a w of 40 and your threshold is 30.
It turns out the equation for that has two components. On the left-hand side, you have the number of subsets of x with exactly b bits on. So let's say you have 40 bits that are on, and you're asking how many ways there are to share exactly 33 of them; that's going to be 40 choose 33. The second component is the number of ways to place the remaining w minus b on bits in the rest of the representation, outside of x. So for each subset of exactly b shared bits, you're going to have a number of other patterns that put their remaining w minus b bits elsewhere. The product of the two gives you the total number of vectors that have exactly b bits of overlap with x:

    |overlap set(x, b)| = C(w, b) × C(n - w, w - b)

And then the error bound is simply going to be the ratio of the white space to the gray space.
So you look at all values of b, all the way from theta up to w, and you add them all up; that gives you the white space. Then you divide by the gray space, which is n choose w. So if you have a single stored pattern and you pick another pattern at random, the probability of getting a false positive is given by this equation:

    fp = (sum over b = theta to w of C(w, b) × C(n - w, w - b)) / C(n, w)
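Plugging the formula in directly, a sketch of the single-pattern false-positive bound:

```python
# False-positive probability for one stored pattern: the fraction of all
# n-choose-w vectors whose overlap with x is at least theta.
from math import comb

def fp_rate(n, w, theta):
    white = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return white / comb(n, w)

print(fp_rate(2048, 40, 30))   # vanishingly small in the large regime
print(fp_rate(64, 12, 8))      # many orders of magnitude worse for small n
```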
Okay. Now, if you have M stored patterns, you can get a pretty tight upper bound just by adding all of this up. If you look at the union of all of the white space in the diagram, that gives you the total number of possible false positives across all the stored vectors, and then you divide again by n choose w, and that gives you the probability of a false positive.
This equation is very hard to get an intuitive understanding for; there are factorials and exponentials in here. So what does this actually mean in practice? We can plug in a bunch of numbers, but essentially it turns out that with SDRs you can classify a huge number of patterns with substantial noise in them, as long as n and w are large enough.
So, for example, if you have n of 2048 and w of 40, it turns out you can have up to 33 percent noise, up to 14 bits of noise, and you can actually classify a quadrillion patterns with an error rate of less than 10 to the minus 24. This is basically insane, right? That's a thousand trillion patterns, with an extremely high amount of noise, and you can classify them with a very, very low probability of any sort of error. And this is sort of the beauty of SDRs.
This would be very difficult to do with a dense representation. Essentially, what's happening with an SDR is that it's changing the representation in such a way that there's a tremendous amount of room, and you can create a very simple recognition system and recognize a very large number of patterns very, very robustly.
You can actually do even better. You can get up to 50 percent noise and the error rate is still extremely good; in this case it's about one in a hundred billion. So you can again classify a very large number of patterns, with a lot of noise, at a very low error rate. Now, it turns out, again from the math, that this only works if n and w are both large enough.
As an example, if you take n of 64 and w of 12 and the same percentage of noise, which is about four bits in this case, you can only classify about 10 patterns, and the error rate is 0.04. You're in a dramatically different regime when you have these small numbers, so you really need the numbers to be large enough to get into this really nice regime where you can classify things extremely robustly.
Okay, we can also learn a little bit about neurons from this, and it turns out that neurons are actually extremely robust pattern recognizers. We've talked about the distal dendritic segments: they essentially do this match operation. Each segment is looking at an overlap with the input SDR, and if the overlap is above some threshold, the segment says it's detected that pattern. So from the math we can understand that neurons are extremely robust pattern recognition systems.
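Putting those pieces together, a sketch of a dendritic segment as a threshold detector (the class and the numbers here are illustrative, not NuPIC's API):

```python
# A distal segment samples a small set of synapses (bit positions) from a
# pattern and fires when enough of them are active: the match operation.
import numpy as np

class Segment:
    def __init__(self, synapses, theta):
        self.synapses = np.asarray(synapses)  # indices this segment watches
        self.theta = theta                    # match threshold

    def detects(self, sdr):
        # Overlap restricted to the stored synapses, so the cost depends
        # on the number of synapses, not on the full vector length n.
        return int(sdr[self.synapses].sum()) >= self.theta

rng = np.random.default_rng(1)
n, w = 2048, 40
x = np.zeros(n, dtype=np.uint8)
x[rng.choice(n, w, replace=False)] = 1

# Subsample about 20 of the pattern's on bits, on the order of what a
# real segment has, and require 15 of them to be active.
seg = Segment(rng.choice(np.flatnonzero(x), 20, replace=False), theta=15)
print(seg.detects(x))                        # True
print(seg.detects(np.zeros(n, np.uint8)))    # False
```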
It's also interesting that the numbers in the math actually correspond really nicely to the numbers we see in neuroscience. The math says w has to be about a dozen to two dozen before you get into this nice range, and it turns out that on distal dendritic segments you see about a dozen to a few dozen synapses per segment. So there's a very nice correspondence between the theory and what we see in biology, and I think there's a good rationale for that.
Again, you need a high enough w in order to get high accuracy. You can also have tens of thousands of these neurons arranged in a network, looking at the same input SDR, such as the feedback SDR or the context SDR, whatever it might be, and very robustly picking up very subtle patterns in there. Again, the capacity for doing this is huge, and you have extremely robust recognition as long as you're in the right regime of numbers.
Okay, so SDRs give us the capability to recognize a very large number of patterns with very high noise. Let's also talk a little bit about random deletions; it turns out this is actually very similar to the previous case. SDRs are very robust to random deletions, and we know in cortex that bits in an SDR can just disappear: individual synapses are very unreliable.
This is actually a great property for those building HTM hardware, and the nice thing about this analysis is that we can actually characterize the exact probability of failures given the system design parameters. So for those building HTM hardware systems, this is a really useful property.
A
So
there
are
a
bunch
of
situations
where
we
want
to
store
multiple
patterns
within
a
single
sdr
and
then
later
match
them
against
a
candidate
sdr,
and
you
know,
jeff
talked
about
one
example
of
this,
which
is
looking
at
the
proximal
dendrite.
There's
another
example
which
is
in
in
temporal
inference.
The
system
has
to
make
multiple
predictions
about
what's
going
on
in
the
future,
but
those
predictions
are
represented
as
the
predictive
state
in
in
cells
and
there's.
We just have a fixed set of cells, and so somehow you have to be able to represent multiple predictions about the future, at any point in time, in this fixed vector. And the set of candidate predictions changes at every time step, so it has to be a very dynamic thing. The brain doesn't have the capability of allocating memory on the fly like we can in software; it has a fixed structure, and you have to be able to represent this very dynamic property within that fixed structure.
It turns out you can do that with SDRs, up to some limit. We can store a set of patterns in a single fixed representation just by taking the OR of all of the individual patterns. Here's an example, the same example that Jeff walked through earlier: suppose you have ten different vectors, each with two percent sparsity.
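A sketch of that union operation, with the same assumed sizes as before:

```python
# The union of a set of SDRs is their bitwise OR; membership is then
# tested with the same overlap operation as before.
import numpy as np

rng = np.random.default_rng(2)
n, w = 2048, 40                    # 40/2048 is about 2% sparsity
union = np.zeros(n, dtype=np.uint8)
patterns = []
for _ in range(10):
    p = np.zeros(n, dtype=np.uint8)
    p[rng.choice(n, w, replace=False)] = 1
    patterns.append(p)
    union |= p                     # OR the pattern into the union

print(int(union.sum()))            # a bit under 400: some bits collide
# Every stored pattern still matches the union perfectly:
print(all(int((union & p).sum()) == w for p in patterns))   # True
```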
Okay, so of course the more vectors you OR together, the more on bits you're going to have, and there's going to be this trade-off again: the vector representing the union is also going to match a large number of other patterns that were not in the original set. So where does this break down? How many such patterns can we store reliably without a high chance of false positives?
This calculation is exactly the same as with Bloom filters, if you're familiar with them. You now have an expected number of on bits in the union vector, and you can plug that into the previous equation we had and calculate exactly the chance of a false positive. Without going into the details again, what does this mean in practice?
It turns out that if you have n equals 2048 and a w of 40, you can actually take the union of 50 patterns and have about a one in a billion chance of false positives. This is not intuitive. Each pattern here has two percent sparsity, but you can actually OR together 50 of them. Intuitively you might think: well, about 40 percent of the bits are still off, and in order for another random vector to match, all of the on bits in that other vector have to fall within the 60 percent that are on, and the chance of that is actually very low. That's the intuition behind why this error is as low as it is. But again, you need a large enough n and a large enough w to do this: if you have an n of 512 and a w of 10, you get a much, much higher error rate.
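The Bloom-filter-style calculation can be sketched as follows; the expected-density formula and the reuse of the earlier false-positive bound are the assumptions here:

```python
# After OR-ing M random patterns, the expected fraction of on bits is
# 1 - (1 - w/n)^M. Treat the union as a vector with that many on bits
# and reuse the single-pattern false-positive bound from before.
from math import comb

def union_fp(n, w, M, theta):
    w_union = round(n * (1 - (1 - w / n) ** M))  # expected on bits in union
    white = sum(comb(w_union, b) * comb(n - w_union, w - b)
                for b in range(theta, w + 1))
    return white / comb(n, w)

print(union_fp(2048, 40, 50, theta=40))  # on the order of 1e-9, as claimed
print(union_fp(512, 10, 50, theta=10))   # roughly 1e-2: a different regime
```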
Okay, so those are the main properties I wanted to talk about. The last thing I want to touch on is the efficiency of the system, and it turns out that SDRs enable extremely efficient operations. If you assume that each neuron can fire at most once every five milliseconds or so, then there are something like 20 to 30 computational steps available to recognize a face or a person or an animal or anything like that. So the number of operations has to be extremely small and fast; there's no time for loops or optimization steps and so on.
So even though SDR vectors are large, all of the operations I've talked about are actually O(w): they depend on the number of on bits, not on the underlying size of the vector. This would not be true if you were using Euclidean distance or Hamming distance; in those cases you need to look at the entire vector. But with the overlap operation, you just need to look at a small number of on bits, and that can be done extremely efficiently.
Similarly, matching a pattern against a dynamic list like the union is also O(w). You only care about the number of on bits, and in this case it's not the number of on bits in the union vector, not the sixty percent of bits that are on, but just the number of on bits in the vector that you're testing. So if your vector has 40 on bits and your union vector has a thousand on bits, it doesn't matter.
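A sketch of why this is O(w): store only the indices of the on bits, and a match against even a very dense union is just w set lookups:

```python
# Store SDRs as sets of on-bit indices; overlap cost scales with w, not n.
x_on = {4, 77, 902, 1500}           # tiny example: the on bits of x
union_on = set(range(0, 2048, 2))   # a large, dense union vector

shared = sum(1 for i in x_on if i in union_on)   # len(x_on) lookups
print(shared)                       # 3 (only 77 is missing from the union)
```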
All you need to do is check those 40 bits, so you can do this again extremely fast, and this, I think, is key. This is really what enables those tiny dendritic segments to become really robust pattern recognition systems. There's not that much room for computation in those segments, and so, because of the nature of the overlap operation and the nature of SDRs, you can actually get extremely robust recognition with very, very little computation. And we exploit this in our software as well.
All of our software is written so that it really depends on the number of on bits, and on my laptop, an eight-core laptop, you can easily simulate something like 200,000 neurons at 25 to 50 hertz. So thanks to these operations, they're amenable to extremely fast computation.
Okay, so that's the main stuff I wanted to talk about. Just in summary: SDRs are the common data structure in the cortex, and SDRs enable very flexible recognition systems that have very high capacity and are robust to a very large amount of noise.
The union property allows a fixed representation to encode a dynamically changing set of patterns, again in a very robust way. And although we're just getting started with this kind of analysis, the hope is that it will provide a principled foundation for characterizing the behavior of HTM learning systems, and then perhaps all cognitive function as well. We think that this is kind of the basis.
And lastly, I didn't really mention other work in the talk per se, but over the last 15 to 20 years there's been a fair amount of work on understanding sparse codes and sparse representations. I'm just picking out three bodies of work here that have been influential to us: Kanerva's work on sparse distributed memory, Bruno Olshausen's work on sparse coding, and the math behind Bloom filters, which is actually very relevant to the stuff we're doing. Okay, thank you.
C: Hi, I have a question: what kind of data can you represent in this data structure, and is there any data type that is really hard to represent in this data structure?
A: SDRs are extremely good at representing sensory patterns and the types of patterns that underlie cognitive operations. However, it would not be my first choice for representing some of the stuff we typically represent in computer software. I wouldn't use it to represent a database; you wouldn't want it to be the primary representation for Unicode, or SQL databases, or documents.
If you want very precise representations where you can't tolerate noise, you can do it with SDRs, but they're not going to be as efficient; something like ASCII or normal dense representations is going to be much better for that. But by and large, any sort of information or data that we typically use in cognitive processing can be represented very well with SDRs.
D: Yeah, I have a question about something that made me a bit confused. You said that we have structure in connectivity, right? We don't connect every neuron to every other neuron. (That's right, yeah.) So how is this represented? How is it learned? And also, all the math you showed assumes randomness, right? So if you have structure in connectivity, that doesn't hold anymore, right?
A: The way this happens is that each segment connects to a particular set of neurons, but there are a bunch of other potential connections that are kind of nearby. Jeff talked about the growth of synapses, and it turns out that if an axon is firing and it's correlated with a nearby cell that's firing, you will eventually grow a synapse. So there's a bunch of potential connections there, but it is not anywhere near full connectivity.
I sort of ignored spatial layout and topography and so on, and we can talk about that too, but the same principles will apply in those situations. You know, I mentioned that SDRs can work with random deletions. What I didn't talk about is that you can actually subsample the input and get very robust recognition as well; it's really the same thing, and that's another way you can avoid having full connectivity.
Okay, I think the second question was that I assumed everything is random, with no structure. That's right: this analysis did assume that everything is uniformly random, and there are sort of two answers to that. One is that in learning theory this is handled by using something like a PAC formalism, probably approximately correct. There you incorporate the distribution of the patterns into your analysis, and I think the results will come out very similar if you do that.
The other side of the coin is that one way to think about it is that maybe it's the job of the learning rule to come up with representations that actually randomize the data and make things more independent.
E: How do you model time here? I guess it's that union of vectors: each vector will be a sensory vector at a certain time, and if this is true, I guess there will be a kind of window of time which will be considered through that union of a number of vectors.
A: This was a fairly abstract analysis dealing with SDRs, but it applies very well to the temporal memory structures that we use. In that case, and Jeff alluded to this a little bit, you have cells in a column, and the sparse set of activity in the cells represents the temporal context. When you have patterns that follow that temporal context, they're represented by other cells, and the dendritic segments in those cells are recognizing the previous temporal context.
So what this analysis tells you is the bounds: how many temporal contexts can you recognize, and how much noise can you have in there, before your sequence mechanism starts to break down? The analysis gives you a lot of insight into that.
A: I think Jeff has a comment.

F: I just have a comment on that. I might have heard the question differently, I might have not, but I think the question was asking whether this union was of different SDRs at different points in time. They're not. If we do a union of predictions, these are all simultaneous predictions.
We're not using the union property to represent time; it's a spatial union property. So, for example, as you're listening to my speech, your brain is making many, many simultaneous predictions about what I might say next, the attributes of what I might say next, and you'll know if you hear something that's unexpected, even when you can't make one specific prediction. But anyway, that union is at a point in time. We're not using the union to represent time itself; that is in the transition states, in the memory system I talked about.
E: [Inaudible audience question.]
A: Yeah, in NuPIC it's a fairly straightforward thing. We have a process for taking arbitrary data and encoding it into an SDR, and then we feed it into the HTM system. At the end of the day, we have something we call a classifier that maintains a mapping of SDRs to actual values, and we go through the classifier to recover the actual value, let's say the predicted value, whatever it is.
So it's sort of handled outside of the cortical theory. We have a process for encoding things into SDRs, and then we have a process for decoding them, in different contexts, back to the original value. But internal to the HTM, no matter how many regions or layers you have, the language is all SDRs; the encoding and decoding is just for interfacing with the rest of the world.
Yeah, and in general it's not a one-to-one mapping, so part of what the classifier does is give you back the most likely value. There's a document on exactly how we do that. Anything else?
G: So you have these big vectors and so on, and they generate others. Presumably things aren't recorded in the brain at any one time; it takes many repetitions. So you have to kind of model the fact that things are remembered after many repetitions of the same vector. Is that the proper view? Because there's short-term memory and long-term memory in the brain.
A: Yeah, this is exactly the job of the learning algorithm. When I showed you those synapses, it takes multiple repetitions before those synapses are connected, so that's one way that learning happens. We also have this dynamic state, which is the predictive state of the system, so you can represent information dynamically and carry information through a sequence without it becoming permanent.
F: Is it on? There we go. I'll just add a little bit to that. In the HTM theory we have a learning rate, which is applied to a sort of Hebbian process for the growth of the synapses, and we can adjust that rate so that it takes three or four iterations, or whatever you want, for a synapse to become useful. We can also make it learn very rapidly; we can learn in one presentation.
We just make sure that the synapse gets over its threshold in one step. There's some equivalent to this in the brain: there are different modulators in brains which make you learn quickly or not learn, and that's well understood. For example, dopamine is something that makes you lay down memories much faster. So the real learning algorithms in biology are much more complex, with an infusion of various different neuromodulators. We have a fairly simple version of it, but it could be more complex if you wanted it to be.