From YouTube: CLA Deep Dive (2013 Fall NuPIC Hackathon)
Okay guys, we're going to start in a few minutes after I get Subutai miked up. Thanks for joining in.
Okay guys, let me introduce Subutai Ahmad, the VP.
Okay, cool! This is quite a technology setup here, all right. Thank you. So Matt asked me to do a deep dive into some aspects of the CLA algorithm, and it's a bit of an experiment; I have no idea how much of this will be interesting to you or not. But my goal here will be to try to give you a little bit deeper understanding of some of the foundational stuff behind the CLA algorithm.
I'm not really going to touch the code too much; it's more about the algorithm principles around the CLA. What I've done is really just prepared a short 15-minute kind of piece, and then I'll open it up to questions, because I know a bunch of you have different questions around this, and already people have been asking me all sorts of things. So we'll go from there.
...happens at that level as well within the CLA. What I thought I'd do is, basically, you know, there's no way we're going to cover it all; there's no way. So what I'm going to do in the beginning is focus mostly on what happens in a single level, and it touches on some of the hierarchical aspects as well, and then I'm going to open it up for questions, and I imagine we'll do a lot of stuff on the white paper, because I'm really not sure exactly what aspects you guys are all interested in.
...the CLA quiz that we give to new employees who want to work on the algorithm. These go really, really detailed, and these are hard; these are not easy. So, for example, you know, if you have a temporal pooler that has only learned these two sequences, A-B-C-D-E and F-G-C-D, now suppose you present the sequence F-G-C-D: what is the temporal pooler predicting at that point in time? What is the exact state? What is the exact representation of the temporal pooler?
It's a property of sparse distributed representations, and it's a property that I think is really key to understanding exactly how temporal predictions work in the temporal pooler. I think it's key to getting a hierarchy to work, and I know a bunch of you are interested in that aspect of it. So I think understanding this piece is going to be really important.
So I think this property will help with that, and I think it'll also help you understand some of the numbers that we have in the CLA and in NuPIC. You know: why do we have 40 out of 2048? Why is the activation threshold set a certain way? Or why is the percentage of connections that are initialized in the spatial pooler set the way it is initially? Why is it that way?
I should say this is a totally optional session; if you're not interested in getting into this level of detail, feel free to hack or continue on your hack, and I won't be upset at all. So I'll talk about that one thing in a little bit of depth, maybe 10-15 minutes, and then I'll just open it up for questions and whiteboard sessions, and I'm happy to address...
...whatever you want. Okay, so quick background: I think most of you know about sparse distributed representations. This is from one of Jeff's talks.
A
So,
for
example,
you
know
with
two
thousand
bits
you
might
have:
two
percent
of
them
acted
and
in
new
pic,
many
of
our
setups
are
set
up
so
that
you
have
2048
bits,
and
you
know,
40
of
them
are
active.
That's
what
comes
out
of
the
spatial
cooler
today
and
then
each
bit
has
some
sort
of
semantic
meaning
to
it.
This is very different from dense representations like ASCII, where you typically have a small number of bits, and all combinations of ones and zeros are possible. They're very dense; eight-bit ASCII is an example, and each bit typically doesn't have a huge amount of meaning. So sparse distributed representations: that is the language of the CLA.
Given two SDRs, you can compare them. Because each bit has some semantic meaning, you can look at how many bits are shared across two different SDR representations; you just count them, and the more bits that are shared, the more similar they are. You can also store SDRs efficiently, because these are sparse.
You just need to store the indices of the active bits, and because that's a very small number, the actual number of bytes you need to store an SDR vector is only dependent on the number of on bits; it's not dependent on the size of the whole vector. We definitely rely on that quite a bit, both algorithmically and from a code-optimization standpoint, in a few places.
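Here is a minimal sketch of both of those points (illustrative only, not actual NuPIC code, and the names are mine): an SDR is kept as just the set of its active indices, 40 integers instead of 2048 bits, and similarity is the count of shared on bits.

```python
# A minimal sketch (illustrative, not actual NuPIC code): an SDR is kept
# as just the set of its active indices, and similarity is the number of
# shared on bits.
import random

def random_sdr(n=2048, w=40):
    """A random SDR: w active indices out of n total bits."""
    return set(random.sample(range(n), w))

def overlap(a, b):
    """Similarity between two SDRs: count of shared active bits."""
    return len(a & b)

a, b = random_sdr(), random_sdr()
print(overlap(a, a))  # 40: identical SDRs share every active bit
print(overlap(a, b))  # close to 0 for two unrelated random SDRs
```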
Another property of SDRs: I didn't talk about the distributed nature of the information, but in an SDR each bit has semantic meaning, the information is actually distributed across the bits that are on, and no one bit is critical to the meaning of the thing. So what you can do is subsample; you don't actually have to store every single on bit.
You can actually subsample and get the gist of what that representation is, and often you can get a very accurate representation of the underlying vector even though you're not storing every single bit.
How would you know which bits, of the ones that are turned on, to subsample? Yeah, so in most of the algorithms that we have you actually don't need to know; you just randomly subsample, and it's an interesting property that as long as you subsample enough of them, you're going to get the gist of the meaning in there.
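A sketch of that random subsampling (again illustrative; the names and the threshold are mine, not NuPIC's): keep a random 10 of the 40 active bits, and still recognize the original pattern by checking how many of the stored bits show up in a candidate SDR.

```python
# Sketch of random subsampling (illustrative; names are hypothetical):
# keep a random 10 of the 40 active bits and still recognize the original
# pattern by checking how many stored bits appear in a candidate SDR.
import random

def subsample(sdr, k=10):
    """Randomly keep k of the active indices."""
    return set(random.sample(sorted(sdr), k))

def matches(stored, candidate, theta=10):
    """Match if at least theta of the stored bits are active in candidate."""
    return len(stored & candidate) >= theta

original = set(random.sample(range(2048), 40))
stored = subsample(original)                  # only 10 of the 40 bits kept
print(matches(stored, original))              # True: still recognized
print(matches(stored, set(random.sample(range(2048), 40))))  # almost surely False
```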
Do those have any closer significance because of their proximity than, say, 256 and 310, or is it only between subsequent patterns that the indexes have any significance? Yeah, so it could be either; that's a good question, and it sort of depends on the structure of the problem. In a problem like vision there's a natural topology, and, mostly for efficiency, we lay out the columns in a topological manner, so neighboring columns will be more similar, or be looking at the same part of the input space, than something that's far away. Now, in NuPIC, in the OPF, we use a version...
So is it a code-efficiency purpose, or is there a neuroscience reason, to subsample from 40 down to 10 active bits to store those? So, at the least, it definitely helps with efficiency: in the temporal pooler you can reduce the number of bits you have to store, because you can subsample.
There is a neuroscience reason, although I don't think it's really that important for us; for us it's really just efficiency. But in the neuroscience, you can only form a synapse, which is what's going on here, a connection between two cells, and you can only form one if the axon and the dendrite of the cells are near each other, sufficiently close that you could grow a synapse between them. So there's a very small number of people in neuroscience who study what are called potential synapses.
These are: how many cells could this cell connect to? And those cells cannot connect to most other cells; they can only connect to some subset. That's the biology; it just can't physically do it, but there are lots of reasons we wouldn't want to do it anyway. Right, so from an anatomical standpoint there are constraints that whatever representation you use has to be robust to subsampling? That's right, and it's also about robustness in terms of cells dying and connections failing.
...take the union of them and then query to see if some other pattern is in that union or not. Okay, so let's say in this case there are 10 different patterns that are just OR'd together, so you have a new representation, which may not be sparse anymore, but it's got a number of on bits. Now you can take another vector and say: okay, is it part of that original 10 or not? You can query that vector, and the way you do that is again using the same technique.
Okay, so what I thought I'd do is walk through one aspect, and I can do that on the whiteboard here. Let's see if we can pull it up; it's close here.
Okay: is this pattern in here or not? And you're only allowed to choose one out of 50, okay. So for this pattern, what is the chance of an overlap here; you know, what is the chance of a false positive? So in this case, let's say this pattern is not that original pattern. What is the chance of a false positive? Because you're picking these bits randomly, and since it's only one out of 50, there's a two percent chance that the bit that corresponds to this pattern is the same as the one that's stored. Exactly; so there's a two percent chance of a false positive, of a mistake being made. Okay, is that pretty clear?
So I actually plotted this. This shows, if you have 50 columns...
...it's like never going to happen, and, you know, you can keep going: if you're at, you know, 10, 12, 15 bits on, the chance of a false positive is going to be minuscule. Okay.
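As a back-of-the-envelope check on that plot (my arithmetic, assuming uniformly random patterns; not the slide's exact numbers), the chance that a random pattern of w on bits out of n exactly matches a stored pattern is 1 over n choose w:

```python
# Back-of-the-envelope check (assumes uniformly random patterns): the
# chance a random pattern of w on bits out of n exactly matches a stored
# pattern is 1 / C(n, w).
from math import comb

n = 50
for w in (1, 2, 4, 10, 12, 15):
    print(w, 1 / comb(n, w))
# w=1 gives 0.02, the two percent case above; by w=10 it is already ~1e-10,
# and with n=2048, w=40 it is astronomically small.
```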
And this is for an exact match. Now, if you don't want an exact match, if you're looking at, say, a 99 percent match or a 90 percent match, these numbers still hold up; there's going to be a higher chance of a false positive, but it still drops off exponentially. Okay. So in order...
...Okay, so this new representation contains both of these patterns OR'd together. Okay, so now I can take a new, third pattern and see: does it correspond to this or not? So the question is: how many of these patterns can you store in this representation while still having a very small chance of a false positive occurring?
A
Okay,
so
I
plotted
that-
and
this
shows,
if
you
have
one
bit
on
at
a
time
two
bits
on
at
a
time
four
bits
on
at
a
time
ten
bits
on
at
a
time.
Oh
I'm,
sorry!
What
did
I
do?
Sorry, this is the number of bits on at a time that you have, and this is the number of patterns you store, and then that's the chance of a false positive, okay. So, as you might expect, if you have a large number of bits that are on, the number of patterns you can store by OR'ing them together without getting false positives is kind of low.
Okay, but if you just have a small number of bits on, then the chance of multiple patterns having the same bit on is lower, so you can store more; but there's a trade-off here. Okay, so the number of patterns you can store at a time by OR'ing things together is pretty low with 50 columns here. Okay, so the answer to this is to just increase...
...the number of columns. Okay, and so what I've shown here is: suppose you have a thousand columns, and again the x-axis is the number of bits on at a time; at that point the number of patterns you can store grows exponentially.
Okay, so there's another exponential here, and that is the number of patterns you can store without having a false positive. And so here, if you have a thousand columns with eight bits on at a time, and you store ten patterns in there, the chance of a false positive on an exact match is again really small, and you can play with these numbers.
It grows exponentially with the number of columns, and the chance of any false positive also grows as you increase the number of on bits, up to a certain point.
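A common approximation captures these curves (my sketch of the math, not the slide's exact figures): after OR'ing M random patterns of w bits out of n, a given bit is on with probability 1 - (1 - w/n)^M, so a fresh random w-bit pattern falls entirely inside the union with probability of roughly that value raised to the power w.

```python
# Approximation of the union false-positive curves (my sketch of the math,
# not the slide's exact numbers): after OR'ing M random patterns of w bits
# out of n, a given bit is on with probability 1 - (1 - w/n)**M, and a
# fresh random w-bit pattern lands entirely inside the union with
# probability roughly that value to the power w.
def union_false_positive(n, w, M):
    p_bit_on = 1 - (1 - w / n) ** M
    return p_bit_on ** w

print(union_false_positive(n=1000, w=8, M=10))  # ~1e-9: tiny, as described
print(union_false_positive(n=50, w=8, M=10))    # ~0.2: 50 columns is too few
```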
Okay, so what we're doing is this: the way we're storing patterns in this one fixed-width vector is by OR'ing multiple patterns together.
And the way you detect whether the pattern is in there or not is: you take a new pattern and you count the number of shared bits. So let's say you're storing nine bits per pattern; if the number of shared bits in this OR'd representation is nine, then you say, okay, this pattern is in my set, in my union that I've stored. Okay. So it will never give a false negative.
When you OR patterns together, you're always going to get nine bits that match, right, because they're in there. But the problem that can happen, a false positive, is that you can get another pattern that is not any of those in there, but now, because you've OR'd those bits together, you have many more bits on, so you could have a false match.
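Putting that union trick into code (an illustrative sketch, not NuPIC itself): OR the stored patterns into one fixed-width set, then declare a match when all of a probe's bits are present. Stored patterns always pass, so the only possible error is exactly the false positive just described.

```python
# The union trick in code (an illustrative sketch, not NuPIC itself).
import random

n, w = 2048, 9   # nine on bits per pattern, as in the example
patterns = [set(random.sample(range(n), w)) for _ in range(10)]
union = set().union(*patterns)          # the OR of all ten patterns

def in_union(probe):
    # All w bits present => report a match. Stored patterns always pass
    # (no false negatives); an unrelated pattern can only pass by chance.
    return len(probe & union) == w

print(all(in_union(p) for p in patterns))         # True, always
print(in_union(set(random.sample(range(n), w))))  # almost surely False
```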
So that's the error that you need to focus on, the false positive, in this case. Okay. So this is what happens if the numbers work out right.
A
Okay,
so
if
you
have
a
large
enough
number
of
columns
and
a
reasonable
number
of
on
bits,
it
turns
out
that
you
can
store
a
reasonable
number.
You
can
order
together
a
reasonable
number
of
patterns
and
actually
still
be
able
to
retrieve
them
or
be
able
to
detect
whether
they're
in
that
set
extremely
reliably.
The
chance
of
a
match
is
very,
very
low.
Okay, and this property of being able to take the union, or the superposition, of lots of patterns, and being able to still reliably answer whether a given pattern is in there or not, is extremely important to the CLA. And the fact that with SDRs you can do it with a fixed representation is really advantageous, because everything after that point doesn't need to know; you don't need dynamically growing structures or anything like that. Everything just works off of that fixed representation. This is a really important, nice property of SDRs. Yeah.
...a number of neurons, and some percentage of them are on: that's an SDR. It also works at the level of segments, so there's a number of synapses that happen to be on, and in how they map to the columns, you know, there's these properties coming up.
In the biology, you've got a set of cells; that's your fixed representation. The cells are either active or not active. We always find, everywhere in the brain, that you have very few cells that are very active; most of them are relatively inactive, so you have this sparse activation. But what Subutai was saying is that you don't get the state of these patterns from some buffer or some linked list or something like that; they have to be all in the same cells, yeah. So the same set of cells are representing all the different things.
Different states of the system: at any point in time that same set of cells can be representing multiple predictions at the same time, yeah. And so that's part of the key to how all that works; it's just this fixed set of resources that are used over and over. That's right. And some of you are really interested in the math behind this and stuff; I would love to sort of develop this part of the theory...
...more. Kanerva has done a bunch of work on this, so I definitely encourage reading his papers and his books. I think he's done some of the initial work here; some of the mathematical foundations he uses are applicable, not everything, but some of it. So let me just get one more question.
You said that in vision, right, the individual SDR bits have meaning with respect to other proximate bits. So in this case, even if we get a false positive in the union, will it have some relevance to the original? Yeah, that's a really good question. So what I, you know, mentioned here is just randomness and looking for an exact match.
What that means is that if you have similar inputs, you're going to have similar outputs, and so of course the chance of a false positive there is totally different from what I went through; but it may not be a bad false positive. You know, you'll have false positives with similar things, and that may not be a bad thing.
If you have your numbers right, you can store completely separate patterns in the same vector with minimal chance of false positives, okay. And there are basically two variables that are relevant here: the number of bits that are on, and the number of columns, or the dimensionality of the vector. Those are the two numbers that you play with, and together that's the sparsity level.
You could have very low sparsity, like point zero one percent, extremely low, in which case you could potentially store a lot of patterns, but...
...sparsities in that range seem to have really nice properties on this point. So in NuPIC we've chosen 40 out of 2048; that gives us about two percent sparsity. And one simple coding exercise is to just try to figure out how many random patterns you can store.
Or you could try to figure it out analytically, which would be nice; but it's easy to code this up. You just create random patterns, OR them together, create another one, check whether you've got a false positive or not, and just repeat, and see if you can come up with plots. That's something you can try doing if you don't have anything else to do. So this is just a simple coding exercise you can do. Okay, any more questions on that?
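The exercise he describes might be coded up like this (my sketch of it, with hypothetical names): OR M random patterns into a union, then probe with fresh random patterns and measure how often an exact match falsely fires.

```python
# One way to code up the exercise just described (a sketch): OR M random
# patterns together, then probe with fresh random patterns and measure
# the observed false-positive rate.
import random

def false_positive_rate(n=2048, w=40, M=50, trials=10000):
    union = set()
    for _ in range(M):
        union |= set(random.sample(range(n), w))
    hits = sum(
        1 for _ in range(trials)
        if len(set(random.sample(range(n), w)) & union) == w
    )
    return hits / trials

# Sweep the number of stored patterns and watch for the failure point.
for M in (10, 50, 100, 500):
    print(M, false_positive_rate(M=M))
```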
...wouldn't work. And you have to, again, keep the number of on bits and the number of columns and all that in a reasonable space, so that you can actually do it.
So if you go back to this exercise here: this will actually tell you how many random patterns the temporal pooler might be able to predict simultaneously without the chance of a false positive. So this is not just some abstract exercise.
And let me try to explain how that might happen, and you might have a lot of questions about that. Let me see... so, you know, why do I have all these... okay. So in a hierarchy you have a level that's feeding into another level, and the output of this level is the set of cells that are active at any point in time. So this is again the state of the temporal pooler, and that's fed into the next level.
...the images move, or the face rotates, or lighting changes, or whatever, you know, the cell is going to stay on. So the cell's activity is going to be slower when you're at the higher levels of the hierarchy.
Okay, so this slowness is a property of the hierarchy. Now, it's extremely easy to get slowness if that's all you want; you can just keep cells on. You know, it's very easy to get slowness. But what you want is things that are slower but also discriminative of the input: that face-detector cell is not going to detect chairs; it's only going to be on for faces.
And so again, I think the superposition, the ability to take a union of different inputs and maintain the discriminability at the same time, is an absolute necessity if you want a hierarchical representation that works well. So again, with SDRs, if you have the numbers right, you can get this property, and this is absolutely key. All right.
Okay, so the output of a level is actually a combination of those two: it's all the different things that might happen in the next step, as well as all the things that might happen multiple steps into the future, okay. So this whole spatial-temporal concept is represented all in one SDR, and I don't know of any other representation that can do that.
He's just saying that you might want to clarify a little bit about how we have the implementation. Yeah, I sort of glossed over that here, but in the CLA, if you look at the white paper, there's a mechanism in the neurons by which you can do this; it's sometimes called pooling.
You know, a cell that is predicting will become active not just to predict the current input, but will also look one step back at what happened at the previous time step, or two steps back, and try to become active when that comes on. If you do that recursively, you get cells that are predictive further and further steps into the future.
Yeah, I think, just to clarify his comment: the term temporal pooler came from this property; that's why we call it that. It's pooling over time; basically, the cells are staying on over time. But I think what Ian was trying to say is that in the current implementation, if you just install NuPIC and run with it, this property is not enabled right from the start, and...
...it's not well explored by us; we've tried it a little bit, and I can talk about what we've done. Yeah, there's definitely a lot of room for research and exploration here in creating hierarchies. But okay. So that's sort of the first part of the talk; I just wanted to talk about one concept in depth, because I think it's really cool, and we haven't really talked about it much, and it's sort of critical to SDRs and getting the whole thing working. So hopefully that's been helpful, and I'm open to any questions.
...to lines that are moving in directions, like across the screen; they're sensitive to, like, a line moving from left to right in a certain part of the visual field, and the cell will stay active. That's an instantiation of this mechanism, we believe: a cell has learned that, hey, I'm representing a line, and I can predict that the line's going to get to me, so I'm going to stay active throughout that entire sequence until it becomes unpredictable; and it can become unpredictable for various reasons.
Well, let's put it this way: the CLA today is an inference engine. We can play back sequences, but as it's implemented, it's...
...really just inferring sequences. And if you want to play them back, like to make a motor behavior, like my speech right now, where I'm playing back a sequence stored in this fashion: to do that I have to have very specific timing. I have to turn on my neurons and my muscles at precise times, with certain delays between the different activations, and we don't have any mechanism in here at all for that. That's what I was referring to earlier; there's no specific timing.
The CLA can't recognize a melody based on its rhythm today; it can only recognize it based on its sequence of notes. And although the team that did this in the last hackathon made it look really cool, it was a clever cheat; that mechanism is not in there. I have speculations about how specific timing works, but...
...I've forgotten a lot since we wrote the white paper. Well, I'm around too, yeah, yeah.
So I was trying to ask Jeff about that before, and he actually pointed me to you, because you know a lot about the CLA. And so the question is: is there a way to hybridize, you know, pattern learning and pattern recognition with unsupervised classification? So basically, is there a way to make the CLA learn...
...you know, classes of patterns in an automated way, yeah? Or is there a good way to cluster the CLA representation somehow, and maybe make the next level of hierarchy based on the names of the patterns, basically, rather than on the patterns themselves? Okay, so there are a few different...
...ones. One is, you know, you could be asking about invariance, and whether you can use the CLA to learn invariances. So this goes back to the edge example and the face: you want the CLA to have a very similar output regardless of whether the edges are shifted, or whether the faces are shifted or not. So that's one type of clustering.
If you will. Then there's kind of the more traditional machine-learning type of clustering, and Jeff's nodding, so is that what your question is about? Yeah. Okay, so that wouldn't apply to... it's not something you would feed to the next level of the hierarchy; it's just, given the CLA, you want to apply clustering to it, and that's very doable. So with clustering...
...the main thing is that you want an input to the clustering where there's a similarity metric, right? So you want: if two patterns are similar, the representations should be similar according to some metric. And as long as you have that, you can apply any traditional clustering to it. So you can actually take the output of the spatial pooler or the temporal pooler...
...they both satisfy that property, and then run clustering on that. And I've actually done that in the past; I did that with the energy data, for example. I ran energy streams through the CLA, looked at the columns that were activated, and then I just ran a traditional k-means clustering or something on that. It was a very high-dimensional space, but you can still run it.
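That workflow might look roughly like this (an assumed reconstruction, not the original code): each timestep's active columns become one binary row, and ordinary k-means runs on the resulting high-dimensional matrix.

```python
# Roughly what that experiment looks like (an assumed reconstruction, not
# the original code): each timestep's active columns become one binary
# row, and ordinary k-means runs on the high-dimensional result.
import numpy as np
from sklearn.cluster import KMeans

n_columns = 2048
rng = np.random.default_rng(0)
# Stand-in data; in practice each row would be the spatial or temporal
# pooler's active columns at one timestep.
active = [rng.choice(n_columns, size=40, replace=False) for _ in range(500)]

X = np.zeros((len(active), n_columns))
for t, cols in enumerate(active):
    X[t, cols] = 1.0

labels = KMeans(n_clusters=5, n_init=10, random_state=0).fit_predict(X)
print(labels[:20])
```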
So there were some buildings that... these are gyms; that's where one of the hot gym examples comes from. There were some gyms that had swimming pools, and they had a much higher energy usage, or a very different pattern, and there were some gyms that didn't have swimming pools, and there were some other characteristics, I forget; but...
...with the same stream of data and two different discrete states, you know, like day and night: could you somehow, by clustering, infer that there are two different patterns and that there is a switch occurring between them, with, you know, a certain predictable or unpredictable period? You could do that, so...
...you'd need some mechanism to know when one ended and the next began, or you could...
...you know, you get one vector output per timestamp, and then feed the whole thing into a clustering system and just see what clusters come up, and see whether they're clustered similarly. So if there's any similarity in the patterns: if you do it at the level of the spatial pooler, you'll get kind of instantaneous or static similarity, but if you do it from the output of the temporal pooler, you might actually pick up similarities in sequences, which would be very interesting, yeah.
So we have a question from the IRC channel. Rick asks: you talked about how many SDRs you can store when you have 40 bits on out of 2048, and he understands the probability of false positives goes up with more SDRs, with more SDR bits. However, the question is: you assume that the false-positive probability remains below some threshold, and what is that threshold? So how do you decide what is a good threshold, and where do you come up with that number?
So the question is: what is the right threshold for error? I believe so. I imagine that's somewhat application dependent, and...
So one of the things is that errors in the CLA are kind of funny, because there are never any hard errors. As you overfill the system, in many different ways, whether you're overtraining or you're taking the union of too many patterns, what will happen is it'll start overgeneralizing. So what...
How bad is that? Well, it may not be bad at all; as Subutai said earlier, it may be actually what you want. What you've basically lost is the discrimination, the ability to discriminate between two things that are subtly different. You know, so things start blurring together: after a while you start saying, that's just like a bunch of other stuff I've seen before and I really can't tell it apart, and I'll start generalizing, saying, well, it's just like these other things. So it's not usually a hard error.
...I don't know, cabbage, and that has a cluster here and some other cluster here. Now, if you see cat again and it's followed by broccoli, in the SDR representation they're going to be very similar, so, you know, you might have a slightly bigger cluster here and maybe a smaller cluster here. And so once you train the system with these sequences, what the temporal...
...pooler will start to do is predict the superposition of these patterns, so the next time it sees something that looks like them, even if slightly different, you're going to get something that's reasonable. Yeah.
I was wondering if you had played with having sort of multiple, not columns, but sets of columns, wired to the same input with different sparsities? Like maybe it'd be useful to have more generalization or less generalization on the same input, and then kind of wire those together. It makes sense that you're sort of overgeneralizing by having a higher density and sort of undergeneralizing by having a low density, so making predictions with both things at the same time might be useful. Yeah, yeah.
And so, you know, you might see something very similar here: there might be some bits that are cat and tiger and so on, but they're not as relevant to making this prediction. Those bits will eventually die out, and you'll get higher density, naturally, within the areas that actually correspond to the signal in that sequence.
This is sort of a more advanced topic, but if you're following, okay. So, you know, that is a great topic to talk about, the density of unions and so on, and we talked about how you can fail if you have too many active bits and so on. But in the brain there's something going on that we do not do today, and this could be a very interesting thing to work on. I'm not sure you could get it...
...done in a day's hack; you might be able to, but it's certainly something someone could work on. What underlies all the predictions is that each of the cells' dendrites has these synapses on them, and there's a threshold: if the number of active synapses is over a certain threshold, let's say 10, then that dendrite becomes active and that cell goes into a predictive state.
And if the system wasn't predicting enough, you could lower the threshold and get more predictions. So if you were looking at a sequence and all of a sudden there really wasn't any good prediction coming out of it, maybe nothing, and the system says, look, this is bad, you might say: well, I want to force it to make a prediction, just do anything. You would lower the threshold until you got some predictions, and then you could go with that. I believe this is what happens.
I
believe
this
is
what
happens
when
I
made
this.
E
I
made
this
comment
in
talks
that
if
someone
says
hey,
do
you
see
that
you
know
animal
in
the
cloud?
Well,
there's
no
animal
in
the
cloud,
it
doesn't
look
like
the
dog,
but
if
you
lower
the
threshold
and
it'll
start
trying
to
make
predictions
and
eventually
you'll
see
the
dog
pop
out,
so
that's
an
interesting
thing
we
haven't
done
probably
need
to
do
some
time.
It's
related
to
all
the
stuff
that
I
was
talking
about.
That's
pretty
tricky
research
topic,
but
it's
really
might
not
be.
Maybe
you'd
implement
it
pretty
quickly.
Yeah, the inhibitory cells usually spread over some regional area, and so you would be looking at the cumulative prediction of some area. So if there's too many cells in a predictive state, or, you know, active, if you will, it would tune them down; if there's too few, it could tune them up, type of thing, yeah. That makes it easier to do from a code standpoint, because we have a single threshold; you can actually change that for the entire region.
And you could change that on an iteration-by-iteration basis, and you could play around with this.
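A toy version of that "lower the threshold until something predicts" idea (a hypothetical sketch; NuPIC exposes no such knob by default, and the names here are mine): count active synapses per dendrite segment and relax the activation threshold step by step until some cell predicts.

```python
# Toy version of "lower the threshold until something predicts"
# (hypothetical sketch; not an existing NuPIC API).
def predicted_cells(segments, active, theta):
    """segments maps a cell to the set of presynaptic cells its dendrite
    segment samples; a cell predicts when >= theta of them are active."""
    return {cell for cell, syns in segments.items() if len(syns & active) >= theta}

def force_predictions(segments, active, theta=10, theta_min=4):
    """Relax the threshold one step at a time until a prediction appears."""
    while theta >= theta_min:
        cells = predicted_cells(segments, active, theta)
        if cells:
            return cells, theta
        theta -= 1
    return set(), theta_min

segs = {"cellA": {1, 2, 3, 4, 5}, "cellB": {6, 7, 8, 9, 10, 11}}
print(force_predictions(segs, active={1, 2, 3, 6, 7, 8, 9}, theta=5))
# nothing clears theta=5; at theta=4 cellB predicts -> ({'cellB'}, 4)
```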
And another thing, which I thought I'd just throw out here: you danced around it a bit in your talk, but just to state it explicitly, someone asked earlier about the topology. One of the nice things about the CLA, and these properties we're talking about, is that they can all be implemented with local rules. So if you had a very, very large region, like a million columns or something, which is not very large for a brain, it's pretty small actually...
So you've got a million columns; you don't want to have all the columns talking together. You'd use topology, and you get pretty much almost all the exact same properties if you just make local connections, local inhibition, and local rules. And brains have to do this, of course, because cells can't connect over very long distances; they have to connect...
You know, I think... so what should that threshold be? It can be dynamic, and we have it set to some static thing; I forget what number we have it set to, like 12 or 15 or something like that. Actually, I think we have a range that we swarm over, typically, but that range has to be chosen so that...
...you know, otherwise you're going to have too many false positives, because basically what that threshold says is: if you have a pattern, should I predict this particular pattern next...
...or not. And if the threshold is too low, then you're going to make a lot of false predictions; and if it's too high, then, you know, you might be too sensitive to noise, or you might just slow down; it might be unnecessarily high. So the number that you set that segment threshold to has to be chosen with this property in mind. Same thing with our encoder bits: we set a reasonable number of on bits per field.
It's all because of this. So is the size of a region, in terms of number of columns, identical to the bit length of the vector, or not? So the output of the spatial pooler is, you know, 2048 columns, and some number, 40, will be on, typically, in our settings; so that would be one of the vectors. And then for the output of the temporal pooler...
...you have multiple cells per column, so the size of that vector, the output that's actually fed up to the next level, would be the number of columns multiplied by the number of cells per column, right? So it's a pretty big vector. But say you had a region of a million columns:
Would you still talk about a 2048-bit vector, or does it become a million? No, there would be a million.
If you have a region with a million columns, then it's a million; and then if you have 10 cells per column, the output from the temporal pooler will be 10 million, okay. So if you talk about a region with some number of columns, that automatically tells you the size of the vector, and then the sparsity tells you how many ones there are. Now, you probably don't want to have such a huge region; it might be just, you know...
And another question: so the collapsed 10 OR'd vectors that become one, is that then...
...it has some that it predicts right and some that are predicted wrong, yeah. So the output of the temporal pooler will be the prediction of all the different patterns that could happen next. So if you have a sequence, you know, A...
...the patterns of B and C, are those then those 10 bits that were 40, but you sort of... yeah, okay. Actually, let me... okay, this is a little more involved. So you have 2048 columns...
...are going to represent the temporal context for that sequence, right. So this is... I didn't explain that properly at all. This actually takes a little while to explain, and if people want, we can walk through the operation of the temporal pooler and exactly what this means; that might be better done in the smaller session, yeah, and I'm happy to do that: exactly how we construct the cells that come on, and how we predict high-order sequences, and all of that. Great, thanks.
...you can distinguish between repeated elements in these different sequences: at any point in time, you have to be able to tell me what the input is, and you also have to be able to represent it uniquely. So there's always going to be a place in the brain where you have the same pattern coming in but a unique pattern coming out, and I need to be able to go backwards too.
...a common input and a unique representation, which is what we're doing in the columns and the cells: the columns are the common input; the cells are the unique representation. And this idea that you go back and forth is an absolute requirement for any system like this, and it was...
It wasn't obvious how to do this at first, but in the end it turned out pretty nicely, and that gives us some confidence that this is probably what's actually going on in the brain. It does take several repetitions to really grok it; it takes a little while. And it's kind of this question here that we often go through in the quiz: you learn these sequences, now you present, you know, part of a sequence...
...what happens next? What is predicted, and how is that represented in different situations? So, yep, the output of the temporal pooler is all the active cells, right, the ones that got activated because they were in a predictive state or bursting, OR'd with all the cells in a predictive state, right? That's the first... I don't know about the predictive state.
They also predict. Could you repeat that last sentence about how they... The way you determine which cells are predicted is by looking at which cells are currently active, so actually it's redundant; you don't need both. Okay, yeah. Just to explain how this works: okay, when you...
...these cells are predictive now as a result of the activity at t equals zero, okay. So the next step that happens is you get input, and currently they don't do it, but that's what I did yesterday: the input is added to the predictive state, right, because there's a voltage coming in from the prediction and there's a voltage coming in from the input, so the ones that have the biggest total fire, okay. So it's either going to be A or B, and 70 percent of the time it's going to be B.
This is a great question, and this is a problematic part of the theory, because although we can make it work well, I can't get it to match the neuroscience exactly, and I think it's more than we should do as a group; I think we should just pull this offline.
If you're really interested in this piece of things, it gets pretty detailed. But the way we think about it now is that all the cells that are firing, those are the cells that are currently active plus the ones that are in a temporal-pooling state; that's the output to the next level. What Fergal was talking about is that the predicted cells are just depolarized.
No one knows that except the cells themselves, and that's the prediction for the next step in time. All this actually seems like it should work well, but as I said, there are some problems; it doesn't quite match the neurophysiology, and so I'm not happy with it, and I'm not willing to say, yes, this is the answer. There's something weird going on that I don't understand. I'm happy to talk about it, but maybe not in this big group, yeah.
The input is connected to the columns, yeah. And are those connections fixed, or are they learned? This is in the spatial pooler, so yeah: each column in the spatial pooler connects to some percentage of the inputs below, so that's another random sampling, and each connection there is represented by a permanence, and that is learned. The impact it has on inference is: either it's above a threshold, in which case it's connected, or it's not connected. But that is learned.
Part of that is learned, yeah. But is it... so is it for the cells that are within columns that you have a permanence, or also between...? Okay, so both. In the spatial pooler it's the columns that connect to the inputs, and so their permanences are there; and then in the temporal pooler you have multiple cells per column, and they have connections laterally to other cells within the temporal pooler. So those are also learned, and we can talk about the biological mapping for that.
Sorry, do you have an answer for this? I'll go ahead. No, no, no, no, you usually give better answers. But do you... I don't know if you had an answer for this; why don't you give the neuroscience answer and I can give... okay. I would argue that you actually don't have a very good probability of the things you're going to predict next. The example I use all the time is that you're constantly predicting what you're going to hear, what words you're going to hear, and so on.
So if you listen to someone speak, your brain is constantly predicting. We know that because if they say something really odd, then you know that was wrong. But there are many things you could hear at any moment in time, and I think it's actually challenging to say: oh, I have a probability of knowing what is the most likely word, and what's the next most likely word, and so on.
In fact, I think most of the time you're not conscious at all about what you're predicting; you're not even aware of it. It's only if I sit and say, now give me a prediction, and you do something to do that...
...then that's when you might get a probability. But the kind of predictions we're talking about here are ones where there's actually no activity in the cells that's visible external to the cell, and so you really don't know; there's normally no conscious awareness of what you're going to see or hear or feel. It just happens, and you just know it's right. But I don't deny that you can say, well, give me a likelihood of what word follows...
...you know, this. But I'm not sure we're trying to capture that here; I think it's beyond what we're doing here. Yeah, and I think the word "probability" has a very loaded meaning too; it's a very specific mathematical concept. But there are also some other ways you can get...
...kind of a likelihood from the SDR. So if you think about the SDR representation again: if you see animal followed by vegetable over and over again, you're going to see the vegetable areas really well predicted; but a few times you see, I don't know, steak, you know, different types of meat, but you see it very rarely.
That part of the SDR is going to be fairly sparsely sampled, and so you have some notion that vegetable is much more likely to happen than that. But, you know, if a particular type of steak happened over and over again, you wouldn't really be able to distinguish that versus just, you know, meat in general happening. You don't get the exact probability, but you can kind of get a rough sense that, yeah, vegetable is more likely.
I think that's good, and I think I could tell you a mechanism by which we could get the CLA to do what you want to do, again using thresholds on the dendrites. But from a practical standpoint: in NuPIC there's a classifier on top of the CLA, and that classifier tries to estimate the actual probability of the next steps happening, so it will actually give you a probability distribution across the values that it expects to happen one or n steps into the future.
Yeah, we're going to take a break. I want to take a quick poll real quick for the next session, which is mapping machine learning and artificial intelligence terminology to the CLA, and vice versa: who is planning on attending that? Okay, a lot, okay! So we're going to do it here, just making sure. Do you have slides at all? Okay, so we have a wiki page, so I'm gonna set that up, and we're just going to basically record the audience and, I don't know, pass the mics around a...