From YouTube: Semantic Anomaly Detection - Cortical.IO
Description
2015 HTM Challenge Application submission (ineligible for prizes)
B: […] the content of the candidates' tweets. So first, some quick background information about what we do at Cortical.io and how this relates to the HTM. Basically, we provide an API that encodes text into an SDR representation, and this encoding process is similar to the way information is distributed throughout different areas of the brain. So on the left here, you can see a graphical representation of how we store semantic information in a 128-by-128 matrix.
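A hedged sketch of that representation: a semantic fingerprint can be thought of as a sparse set of active positions in the 128-by-128 grid, and semantic similarity as the overlap between two such sets. The words and bit positions below are invented for illustration, not Cortical.io's actual encoding:

```python
# Hypothetical sketch: a semantic fingerprint as a sparse set of active
# positions in a 128 x 128 grid (16,384 possible bit positions).
GRID_SIZE = 128 * 128

def overlap(fp_a, fp_b):
    """Number of positions two fingerprints share (their semantic overlap)."""
    return len(fp_a & fp_b)

# Two toy fingerprints for related words, sharing some active bits.
fp_dog = {10, 42, 99, 512, 1024}
fp_wolf = {42, 99, 512, 2048, 4096}
print(overlap(fp_dog, fp_wolf))  # 3 shared bits
```

The more bits two fingerprints share, the more semantically related the underlying texts are taken to be.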
B: So once you have one of these semantic fingerprint representations of a piece of text, there are a lot of things you can do with it, and one of the cool things about our SDRs is that they're compatible with the HTM and can be fed directly into the temporal pooler. So, in a way, our API acts as a text encoder and spatial pooler in one, and once you start creating SDRs for text, you can use them to let the HTM learn patterns in human language, detect anomalies, and also make predictions.
B: So that's what we did with this application. First, we extracted the text from the Twitter feeds of six presidential candidates, shown here in no particular order. Then we grouped the tweets per candidate by day and created a semantic fingerprint for each day's group of tweets. Then we input those fingerprints into the HTM and graphed the anomaly scores that it outputs by day. And because we use semantic fingerprints as input for the HTM, we're not graphing the anomaly scores based on the volume of tweets, but on their actual semantic content.
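The pipeline described above can be sketched roughly as follows. This is not the actual NuPIC/HTM temporal memory, just a minimal first-order stand-in to illustrate the idea: each day's fingerprint is scored by how many of its active bits the model predicted from the previous day.

```python
def anomaly_score(predicted, actual):
    """Fraction of today's active bits that were not predicted."""
    if not actual:
        return 0.0
    return 1.0 - len(predicted & actual) / len(actual)

class FirstOrderModel:
    """Toy stand-in for the temporal memory: learns day->day transitions."""

    def __init__(self):
        self.transitions = {}  # frozenset(day t) -> set of bits seen on day t+1
        self.prev = None

    def step(self, fingerprint):
        predicted = self.transitions.get(self.prev, set())
        score = anomaly_score(predicted, fingerprint)
        if self.prev is not None:
            self.transitions.setdefault(self.prev, set()).update(fingerprint)
        self.prev = frozenset(fingerprint)
        return score

model = FirstOrderModel()
usual = {1, 2, 3}                 # a candidate's usual daily topics
print(model.step(usual))          # 1.0: nothing learned yet
print(model.step(usual))          # 1.0: transition not yet seen
print(model.step(usual))          # 0.0: the repeated pattern is now expected
print(model.step({7, 8, 9}))      # 1.0: a semantically new day spikes the score
```

A high score on a given day means the day's semantic content was unexpected given what came before, which is exactly what the peaks in the graphs show.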
B: So, what the candidates are actually talking about. The higher the anomaly score, the more unexpected the content of the Twitter posts was for that day. So when you see peaks in the graph, like here or here, the HTM determined that whatever the candidate posted on those days was unusual for that candidate. For reference, we also plotted a few real-world events on the graph as vertical red lines.
B: So you can see how detected anomalies correspond with events like the candidates making official announcements, holding campaign rallies, and taking part in debates. The graphs are interactive: you can move your mouse over data points to see the keywords and exact anomaly scores for those days, and you can also click on the data points to see the full text of the tweets for that day. For most of the candidates, you can see that at the beginning of the graphs, the anomaly scores were initially quite high.
B: But, for example, you can see with Hillary Clinton, in the top graph: as soon as she officially announced her candidacy in mid-April, the HTM immediately detected a change in what she was posting about on her Twitter account. Then it quickly adjusted to this new pattern of topics in her feed, with only minor anomalies popping up after that, like this one here, which seems to correspond with a rally that she held on Labor Day.
B
This
is
done
by
working
on
the
fingerprint
level
of
the
tweets
to
determine
what
the
candidates
are
talking
about
and
not
just
simple
keyword
matching.
So
when
you
click
these
buttons,
it
reduces
the
Twitter
feeds
to
only
posts,
they've
eye
similarity
to
these
topics,
and
then
we
train
separate
HTML
for
each
candidate
on
these
feeds.
So
the
anomaly
scores
are
based
only
on
the
filter
data,
so
you
can
see
that
certain
candidates
tend
to
post
more
about
certain
issues
and
for
some
candidates
it's
actually
an
anomaly
when
they
do
talk
about
certain
issues.
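That topic filter could be sketched as follows, assuming fingerprints are sets of active bit positions. The threshold and toy data are invented; the real application compares full Retina fingerprints:

```python
# Hedged sketch of the topic filter: keep only the posts whose fingerprint
# overlaps a topic fingerprint above a cutoff, then train a separate model
# per candidate on that filtered subset.
def filter_by_topic(posts, topic_fp, min_overlap=2):
    """posts: list of (text, fingerprint) pairs."""
    return [(text, fp) for text, fp in posts if len(fp & topic_fp) >= min_overlap]

topic_economy = {5, 17, 42, 99}          # invented topic fingerprint
posts = [
    ("tax plan announced",    {5, 17, 200}),
    ("rally photos",          {300, 301}),
    ("jobs report reaction",  {17, 42, 99, 400}),
]
print([text for text, _ in filter_by_topic(posts, topic_economy)])
# ['tax plan announced', 'jobs report reaction']
```

Because the filter works on fingerprint overlap rather than literal keywords, a post about "jobs" can match an "economy" topic even if the word "economy" never appears in it.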
B: And the entire application is available live right now at this URL; I will put the link in the video description. We encourage people to take a look at it, draw their own conclusions about the anomalies detected, and hopefully use it as a way to get a clearer picture of how politicians speak in the media. So that's it: semantic anomaly detection with the Cortical.io Retina API and the HTM. We at Cortical.io are big fans of the HTM, and we're very much inspired by the work that Numenta does.
D: I hope I might be jumping in here. What's not clear to me is how much the temporal memory is actually adding here versus just using the fingerprints. Because, you know, if a candidate is very consistent every day and they're putting out the same basic topics, then there wouldn't be any pattern there. It would be kind of flatlining; it's just, you know, the same, same, same, and then you would be able to detect a change, and the HTM temporal memory would see that.
B: I think that's true, and we could try to just pick out what topics are happening when, and then see: okay, is this a new topic that hadn't happened before? But I think, yeah, I don't know, I think by having the anomaly score, you really see exactly, you know, how predictable was this? How different was this? But...
D: If you're going to be doing a sequence of day-to-day compilations, you're going to need to see a flow or change day to day, right? And, you know, there might be something, like as current events occur, or the approach to an election, or I don't know what it is. It just wasn't clear to me that the temporal memory is going to add a lot over just doing a distance or overlap score. I don't know.
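The static baseline D is describing could be sketched like this, assuming fingerprints are sets of active bits: each day is scored only against the previous day, with no learned sequence model.

```python
# Sketch of the static baseline under discussion: score each day purely by
# its fingerprint distance to the previous day, with no sequence learning.
def static_novelty(prev_fp, today_fp):
    """1.0 = no overlap with yesterday's bits, 0.0 = identical active bits."""
    if not today_fp:
        return 0.0
    return 1.0 - len(prev_fp & today_fp) / len(today_fp)

print(static_novelty({1, 2, 3}, {1, 2, 3}))  # 0.0: same topics as yesterday
print(static_novelty({1, 2, 3}, {7, 8, 9}))  # 1.0: entirely new topics
```

One limitation of this baseline: a candidate who regularly alternates between two topics would look novel every single day, even though the alternation itself is a perfectly predictable sequence that a temporal model could learn.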
G: I think it actually contributes a lot, because if you would do that in a static fashion, the whole detection of something new would be directly dependent on what the person is talking about overall. The only way of doing this without the temporal memory would basically be to spot for specific features or pixels in the fingerprint to appear, but it would not tell you whether, for those pixels to appear at this very moment in time, this would be something new.
G: It's basically, it's somehow smarter in figuring out how they are related. So if you just take the bag of words, the only thing you can do is match the keywords, and if someone expresses a certain concept by using other keywords, there is no way of directly figuring out that this is actually similar, but there is by using the arrangement in the fingerprint.
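G's contrast can be made concrete with a toy example. The fingerprint bit-sets below are invented for illustration (the real Retina learns them from text): two synonyms share zero keywords yet still overlap in fingerprint bits.

```python
# Toy illustration: keyword matching sees nothing in common between two
# synonyms, while their (invented) fingerprint bit-sets still overlap.
toy_fingerprints = {
    "physician": {1, 2, 3, 8},
    "doctor":    {2, 3, 4, 9},
}

def keyword_overlap(text_a, text_b):
    """Bag-of-words comparison: count shared literal words."""
    return len(set(text_a.split()) & set(text_b.split()))

def fingerprint_overlap(word_a, word_b):
    """Semantic comparison: count shared fingerprint bits."""
    return len(toy_fingerprints[word_a] & toy_fingerprints[word_b])

print(keyword_overlap("physician", "doctor"))      # 0: no shared words
print(fingerprint_overlap("physician", "doctor"))  # 2: shared semantic bits
```

This is the sense in which the fingerprint arrangement is "smarter" than keyword matching: relatedness survives a complete change of vocabulary.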
G: Yeah, so it's a, yeah, exactly, it's a language model, I would call it. It's a semantic model, even, because there are certain aspects of language that we don't consider, for example, the actual sequence of the words. But as far as the aboutness of the whole thing is concerned, that's what is basically modeled.
G: So with the tweets, the problem is basically to actually find something meaningful. I mean, an improvement would be to actually train the system purely on tweets, which would then allow you to also take into consideration all the smileys and all these shortcuts that they use. And, as I said already in a couple of conversations, I could imagine that by taking all the smileys into account, you would get a better sentiment analysis on the tweets than we see in current systems, which try to do this by dictionary, basically.
B: That part was done by some of my colleagues, but they did a very similar hack at the last hackathon, the breaking-news demo, and we started basically with the parameters that we had for that. So we didn't do a whole lot of tuning this time around, because I guess the last time they already did a lot of it.