From YouTube: Scalar Encoding (Episode 5)
Description
In this episode, Matt introduces some encoding concepts and talks about encoding scalar values.
- Encoding Data For HTM Systems: http://arxiv.org/abs/1602.05925
- Encoders on HTM Forum: https://discourse.numenta.org/search?q=encoders
Intro music: "Books" by Minden: https://minden.bandcamp.com/track/books-2
A
Hello again and welcome to HTM School. My name is Matt Taylor from Numenta. Today we're going to start talking about encoders. You can think about encoders as roughly equivalent to your sensory organs when we're talking about HTM systems. How do we get data from the real world into this SDR format that we've been talking about, so that an HTM system can actually process it? That's what encoders do. Some examples of biological encoders are your retina or your cochlea.
A
These take stimuli from the environment and somehow translate them into a stream of SDRs that are like neural activity going into the brain. So each one of these hairs in your cochlea responds to some frequency range of vibrations that are occurring via sound, and as those activate its neuron, it turns into a part of an SDR that's passed into the brain for processing. So we're going to talk about scalar encoders today, and we're going to use a similar technique to the cochlea, although it's going to be much, much simpler than the cochlea.
A
So let's look at an example, talking about encoding scalar values. First of all, let's go to n and w. This is the dimension of the SDR that we're going to produce: 400 bits, 21 of which are on. That's pretty simple. I'll talk about this bucket value in a moment. We also have, importantly, a value that is going to be encoded. That's what this slider represents. Right now this is 50. You can see this is the scalar encoding of 50.
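To make the n and w settings concrete, here is a minimal sketch of a scalar encoding like the one shown on screen. The function name `encode_scalar`, the assumed range of 0 to 100, and the rounding rule are illustrative assumptions, not the code behind the demo.

```python
def encode_scalar(value, n=400, w=21, min_val=0.0, max_val=100.0):
    """Return n bits containing one run of w consecutive on bits."""
    # Assumption: values outside the range are clipped to the min or max.
    value = max(min_val, min(max_val, value))
    # The run of on bits can start at any position from 0 to n - w.
    start = int(round((value - min_val) * (n - w) / (max_val - min_val)))
    bits = [0] * n
    for i in range(start, start + w):
        bits[i] = 1
    return bits


encoding = encode_scalar(50)
print(len(encoding), sum(encoding))  # total bits and on bits
```

Moving the value shifts the run of on bits along the array, which is the bar that moves when the slider moves.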
A
Now, as I move this slider up and down, that scalar encoding changes. You can see how that bar of on bits within the representation is moving up and down. You might also notice there's a minimum and maximum value here, so we do have to specify the min and max value to do this type of scalar encoding. So I could make this very large, and then the encoding, 454 as it is right now, moves higher up, or earlier, in the array that we're creating, and we can move it just like
A
we did. Now we're spanning much greater values, so our range of values has increased quite a bit. We can also change the number of bits we want in our encoding. Now we're dealing with a much larger encoding of about 1,580 bits, and the scalar value encoding changes as you would expect. Let me show you some things as I move this along. I'm going to turn this comparison switch on, so we can see, as I change the value, the difference between these values. So in this case I've got an encoding.
A
So the difference between the values is basically that these couple of bits moved from one side to the other, but they are semantically similar. They have a significant amount of overlap here. So that's what makes 429 similar to 430. If we move up a little bit farther, like here, where I moved from 430 to 473, there's no overlap at all between these two representations. So there's going to be no similarity at all between those. Now, you have...
A
You can have some control over this. If I make my w bigger, and that's essentially the bucket size (I'll talk about buckets in a minute), then my overlap is generally going to be bigger, and I can represent differences across larger ranges. For example, here's a comparison of 467 to 509. The lower value is the yellow bits and the red bits. The higher value is the red bits and the green bits. But there's still some overlap here between these two values that are pretty far apart.
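The effect of w on overlap can be checked with a small sketch. The range of 0 to 1000, the two w values, and the helper names below are assumptions chosen for illustration; the exact settings used in the demo are not stated.

```python
def encode_scalar(value, n, w, min_val, max_val):
    """Clipped scalar encoding: n bits with a run of w consecutive on bits."""
    value = max(min_val, min(max_val, value))
    start = int(round((value - min_val) * (n - w) / (max_val - min_val)))
    return [1 if start <= i < start + w else 0 for i in range(n)]


def overlap(a, b):
    """Count the positions where both encodings have an on bit."""
    return sum(x & y for x, y in zip(a, b))


# Same two values, same n and range, two different widths (hypothetical settings).
narrow = overlap(encode_scalar(467, 400, 21, 0, 1000),
                 encode_scalar(509, 400, 21, 0, 1000))
wide = overlap(encode_scalar(467, 400, 61, 0, 1000),
               encode_scalar(509, 400, 61, 0, 1000))
print(narrow, wide)
```

With the wider run of on bits, the same pair of values shares more bits, so values that are farther apart still register as similar.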
A
So when you're tuning a scalar encoder, you can decide exactly how you want that encoder to work on your numbers. So let me refresh this. If we want to have a very small range of values that we want to represent here, we can do that, and now 40 to 100 is what we're encoding. We can change this bucket size so that those are very similar. As you can see when I turn the comparison on, they're very similar between values, right? Or we could make this bucket size much smaller.
A
We could make the overall dimension of the SDR much smaller and get it to a point where we have exactly the resolution we want when we're comparing values. So in this case, the way I have it now, a one-integer difference, you know, of 87 to 88, does not have much overlap at all. Maybe that's what you need with your data, maybe not. If you want to have a lot of overlap, you want to make the bucket size bigger, and then you'll get a lot more overlap between these values. So, about the bucket size, by the way.
A
So when we talk about buckets in encoding, I'm talking about this shape right here. This is one bucket, and for the scalar encoder it's basically a consecutive array of on bits of exactly length 21, currently, because that's what I have set here. So given that we want a bucket width of 21, I can fit 380 of those in this space, and you can sort of see that as I move this. Here's the first one, and they go all the way down to the bottom.
A
There are 380 ways that I can fit those consecutive on bits in this space. If I make the bucket size larger, there are fewer ways that we can fit this in the space, right? If we make the overall SDR size larger, there are lots more ways we could fit that many buckets in the space. So when we're talking about buckets, we're essentially talking about a representation of on bits where we can represent different values, one or many values. So let's talk about periodic encoding as well.
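The count of 380 follows from simple arithmetic: a run of w consecutive on bits can start at any of n - w + 1 positions in an array of n bits. A short sketch (the function name `bucket_count` is just for illustration):

```python
def bucket_count(n, w):
    """Number of distinct positions for a run of w consecutive bits in n bits."""
    return n - w + 1


print(bucket_count(400, 21))   # 380, the figure from the demo
print(bucket_count(400, 41))   # a larger bucket leaves fewer positions
print(bucket_count(1000, 21))  # a larger SDR leaves more positions
```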
A
So I'm gonna make this bucket a little bit bigger. To illustrate this, I'm gonna turn periodic on. You might have noticed earlier in this example that when I got to the very limits, the minimum and maximum scalar limits, the representation didn't change. It stayed at the minimum, even though I went much further down. You can see that here: I'm in the negative ranges, the minimum is zero, and it doesn't change the representation. It maxes out, or mins out, at zero. Same thing
A
If I go above 100, it maxes out at 100. Any value outside of the range you specify is clipped and represented as the min or the max. But we can tell it we want a periodic representation, in which case, when we get to the maximum, it doesn't clip; it wraps around to the front of the array. When you have this, it's sort of a cyclic representation.
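The difference between clipping and wrapping can be sketched as follows. The sizes (n = 100, w = 11), the range of 0 to 100, and the way the start position is computed are assumptions for illustration, not the demo's implementation.

```python
def encode(value, n=100, w=11, min_val=0.0, max_val=100.0, periodic=False):
    """Scalar encoding that either clips out-of-range values or wraps around."""
    span = max_val - min_val
    bits = [0] * n
    if periodic:
        # Wrap the value into the range, then let the run of on bits
        # continue from the end of the array back to the front.
        start = int(((value - min_val) % span) * n / span) % n
        for i in range(w):
            bits[(start + i) % n] = 1
    else:
        # Clip to the range, so anything below min looks like min
        # and anything above max looks like max.
        value = max(min_val, min(max_val, value))
        start = int(round((value - min_val) * (n - w) / span))
        for i in range(start, start + w):
            bits[i] = 1
    return bits


def overlap(a, b):
    return sum(x & y for x, y in zip(a, b))


print(encode(-50) == encode(0))                                      # clipped to the minimum
print(overlap(encode(99), encode(1)))                                # far apart when clipped
print(overlap(encode(99, periodic=True), encode(1, periodic=True)))  # neighbours when periodic
```

In the periodic case the values just below the maximum and just above the minimum share on bits, which is the cyclic behavior described above.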
A
We're going to talk about this when we talk about date encoding in the next episode, but take a date encoder where you have hours from, you know, 0, 1, 2, 3, all the way up to 23, or 1 to 24. Midnight is very similar to 1 a.m., so the 24th hour in the day should be semantically similar to the first hour in the day. That's what we need the periodic encoder for, so we can represent that. So that's for any case where a range of data wraps around itself.
A
Then we want to use this periodic aspect of the scalar encoder, so the minimum values are semantically similar to the maximum values. So now, let's talk about another type of scalar encoding. This is called the random distributed scalar encoder. It essentially does the exact same thing as the scalar encoder, but it does it in more of a random distributed fashion.
A
So, instead of having these buckets of consecutive on bits like we did in the previous one, where you might have a whole section of on bits to represent one bucket, in the RDSE that section of on bits is hashed out and distributed randomly through the space. So in this case this is representing the value 500, and we can again change the representation that we're creating here: increase the number of bits that we want, and set it back to what it was.
A
But the interesting thing here is, as we increase the value, 505, 506, 507, 508, you see those bits: there's one bit that's moving between these buckets. So every time we change a value, and now you can see it really clearly, there's a bit that changes. Every bucket has a one-bit difference between its left-hand and right-hand buckets. So we have a resolution right now of one. That means every bucket is going to get one value. So every time I change this value, it gets a new bucket. Every bucket represents one value.
A
So if I up this resolution, I'm going to pump it up all the way to five, and I'll back up to the value 560, for example, here's what we see. Go to 561, no change. 562, there's a bit difference. 563, 564, 565, 566: all the same bucket. All those values are being represented in the same bucket. Then we go to the next one.
A
We finally get a bit change. So these buckets have five values in them, so the semantic similarity is completely the same between, for example, 566 and 562 in this example, which might be fine for your data. Maybe that's a small enough granularity that they might as well be the same. It depends. But if we go to a much different value, we can see there's an entirely different representation. In this case we just went from 366 to 410. We can see exactly how much overlap there is.
A
We have 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 or 12 bits of overlap. And we can tweak these numbers: the width of the bucket, the size of the distribution. Generally, we want a very high n and a w that gives us enough on bits, but also enough sparsity. So, a couple of things about the RDSE. First of all, it requires state, which is a little interesting. The scalar encoder did not require state. It could encode things without any state at all.
A
There's an input value, some easy algorithm, and it will produce the output. For the RDSE we need state, and I'll show you why. Watch this bucket value. It only says it has two buckets right now, and that's because it's only seen these two values. As we go up, it's going to continue adding buckets. It grows as it sees more data, so the RDSE has to keep track of the different buckets it's already randomly created.
A
So it knows, if it sees a new value, where it needs to start creating more buckets, so it kind of grows on either edge. Watch when I pick a value that's over here, off to the side, and you'll see that suddenly it's got 261 buckets in it, because I just jumped all the way up to about a 260-something value, so it needed to run through all of those different buckets, creating them all, so we could get to this representation. So the RDSE kind of dynamically expands in each direction.
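The need for state can be illustrated with a heavily simplified sketch of a random distributed scalar encoder. The class name `SimpleRDSE`, its parameters, and the rule for building a neighboring bucket are assumptions for illustration. A production encoder would also check overlap against non-adjacent buckets, which this sketch does not do.

```python
import random


class SimpleRDSE:
    """Toy random distributed scalar encoder that remembers its buckets."""

    def __init__(self, resolution=1.0, n=400, w=21, seed=42):
        self.resolution = resolution
        self.n, self.w = n, w
        self.rng = random.Random(seed)
        self.offset = None   # first value seen; it anchors bucket 0
        self.buckets = {}    # state: bucket index -> list of w on-bit positions

    def _neighbour(self, bits, index):
        # Copy the adjacent bucket and swap exactly one bit for a fresh one,
        # so adjacent buckets differ by a single bit.
        new_bits = list(bits)
        unused = [i for i in range(self.n) if i not in bits]
        new_bits[index % self.w] = self.rng.choice(unused)
        return new_bits

    def encode(self, value):
        if self.offset is None:
            self.offset = value
            self.buckets[0] = self.rng.sample(range(self.n), self.w)
        index = int(round((value - self.offset) / self.resolution))
        low, high = min(self.buckets), max(self.buckets)
        # Grow outward one bucket at a time until the target bucket exists.
        while high < index:
            high += 1
            self.buckets[high] = self._neighbour(self.buckets[high - 1], high)
        while low > index:
            low -= 1
            self.buckets[low] = self._neighbour(self.buckets[low + 1], low)
        return set(self.buckets[index])


encoder = SimpleRDSE(resolution=1.0)
a = encoder.encode(500)
b = encoder.encode(501)
print(len(a & b), len(encoder.buckets))  # shared bits between neighbours, buckets so far
encoder.encode(760)
print(len(encoder.buckets))              # every bucket on the way had to be created
```

Because each bucket is derived from its neighbor, the encoder cannot produce the representation for a distant value without first creating, and remembering, every bucket in between.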
A
How many values do you want to fit into a bucket? I've been using integer values for all of these encodings. Both of these encoders handle decimal values just as well; I'm just not showing that. For example, if I take the resolution below 1, between 1 and 0, you'll see, as we turn this comparison on, that there's more than one bit changing between values, because there's more than one bucket being created between these two values, because it's making room to store the values in between the integers, right?
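The relationship between resolution and buckets can be sketched with one line of arithmetic. The function name `bucket_index`, the `offset` parameter, and the use of rounding to the nearest bucket are assumptions for illustration.

```python
def bucket_index(value, resolution, offset=0.0):
    """Map a value to a bucket number, given the width of one bucket."""
    return int(round((value - offset) / resolution))


# Resolution 1: consecutive integers land in consecutive buckets.
print(bucket_index(88, 1.0) - bucket_index(87, 1.0))
# Resolution 5: nearby integers can share a bucket.
print(bucket_index(561, 5.0) - bucket_index(560, 5.0))
# Resolution 0.25: several buckets sit between consecutive integers.
print(bucket_index(2.0, 0.25) - bucket_index(1.0, 0.25))
```

When several buckets sit between two integers and each adjacent pair of buckets differs by one bit, moving by a whole integer changes several bits at once.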
A
So if the resolution is less than 1, we're expecting to store decimal values. So we could go to a very low resolution here to store those types of much smaller numbers and still get the same resolution and semantic similarity in the encodings that we're creating. So that was the random distributed scalar encoder. These are not the only methods of doing number encoding for HTM systems. These are two of the most commonly used in our current systems, but there's no reason you can't create your own scalar encoder, a numeric encoder,
A
as long as you follow these four principles that I'm going to talk about in the next episode on encoders. Let's take blood pressure, for example. Blood pressure is a reading that consists of two numbers: the systolic and diastolic blood pressure. Now these two numbers sort of depend on each other for how they should be interpreted.
A
You can't interpret a blood pressure with just one of these numbers. There's a semantic meaning for the pairing of these two numbers. For example, the range between the systolic and diastolic numbers in your blood pressure is called your pulse pressure. If your pulse pressure is too low, that could mean that you have congestive heart failure or that you're in shock. So that's a potential encoder, a blood pressure encoder, in itself.
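One hypothetical way to build such a blood pressure encoder is to encode the two readings and their difference separately and concatenate the results. Everything below, including the function names, the value ranges, and the bit sizes, is an illustrative assumption and not an existing encoder.

```python
def encode_scalar(value, n, w, min_val, max_val):
    """Clipped scalar encoding: n bits with a run of w consecutive on bits."""
    value = max(min_val, min(max_val, value))
    start = int(round((value - min_val) * (n - w) / (max_val - min_val)))
    return [1 if start <= i < start + w else 0 for i in range(n)]


def encode_blood_pressure(systolic, diastolic):
    """Concatenate encodings of both readings and of their difference."""
    pulse_pressure = systolic - diastolic
    return (
        encode_scalar(systolic, 100, 11, 70, 200)          # assumed range in mmHg
        + encode_scalar(diastolic, 100, 11, 40, 130)       # assumed range in mmHg
        + encode_scalar(pulse_pressure, 100, 11, 0, 100)   # relationship between the two
    )


typical = encode_blood_pressure(120, 80)
narrow = encode_blood_pressure(100, 80)
print(len(typical), sum(typical))
print(typical[200:] == narrow[200:])  # the pulse pressure segments differ
```

The third segment carries the relationship between the two numbers, so two readings with the same diastolic value but different pulse pressures produce different encodings in that segment.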
A
The encodings of these individual values do not contain the necessary semantic meaning unless they are paired together and the relationship between them also gets encoded. So there are lots of opportunities here for the community to create new and interesting encoders that take real-world data and encode the semantic meaning for that specific domain. In the next episode, we're going to talk about the date/time encoder and how encoders that are encoding different semantic meanings of different parts of the data can be combined into one encoding.