From YouTube: Infrastructure 2017 - Alejandro Guirao - From 0 to anomaly detection in your metrics in 15 mins
Description
Using the NuPIC framework (https://github.com/numenta/nupic/tree/master/src/nupic), we will show the basics of Anomaly Detection using HTM ( Hierarchical Temporal Memory), and perform a demo trying to detect an anomaly in an infrastructure metric (such as a host CPU).
Thank you very much. Thanks for coming to this talk. I will try to awaken your curiosity and show how, without doing rocket science, we can improve our anomaly detection techniques, for example anomaly detection in our infrastructure metrics, using some mathematical techniques.
First of all, I will start with a quote from this incredible book, The Art of Monitoring by James Turnbull. If you haven't read it, I recommend it. Basically, he says that using static thresholds for alerting and monitoring is not a good idea. It's an idea from the past: there are more subtle patterns that are not easily detected with a static threshold, and you have to search for other alternatives to detect those anomalies. For example, he proposes a tool called Riemann.
Riemann is an event router written by Kyle Kingsbury. It's a fantastic tool with many capabilities, like aggregation and mathematical operations on data, such as percentiles and the median. So it's a much smarter way to detect anomalies, but I think we can go further and complement it using some tools from the data science field that enable us to improve the accuracy. So, what am I going to present? It's not my idea; it's a mathematical theory.
It's called Hierarchical Temporal Memory, and basically it's a biologically constrained theory of machine intelligence. That means it has some resemblance to the biology of our cortex and to the way in which our neurons learn new patterns. It all started in 2004 with the publication of On Intelligence by Jeff Hawkins. Hawkins was a computer scientist and neuroscience researcher, and also the founder of Palm, the PDA company; later on he went on to found Numenta, which has been the main company behind HTM.
Basically, I will go quickly through three concepts that are important in HTM. First are the encoders: an encoder takes a signal from the real world and translates it into a binary map of zeros and ones with some useful properties. Then the spatial poolers take those maps of zeros and ones and convert them into another map of zeros and ones.
That output has a lot of zeros and a small number of ones, which is called a sparse distributed representation. It has some mathematical properties regarding error correction and robustness, and it's easier to learn patterns from it. Finally, the temporal pooler takes those sparse distributed representations and performs the learning mechanism, so that it is able to learn the patterns and then make predictions. It can give you an estimate of the anomaly score, and it can even correct the signal: in case part of it is missing, it can reconstruct the signal.
Now I will go quickly through some demonstrations. This is a typical scalar encoder, in which we have a quantity, in this case 41, and it is encoded as a matrix of zeros and ones: these here are zeros, and these are ones, so that when we move the slider, the value is encoded in a different way. As you may see, there's a lot of redundancy.
We could also manage with just one active bit per value, but this encoding is much less error-prone under noisy conditions. You can also see that there is some overlap between one value and the next. This is important because it creates a semantic resemblance between values that are close to each other.
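A minimal sketch of such a scalar encoder (the bit counts here are illustrative, not NuPIC's actual parameters): a value is mapped to a contiguous run of active bits, so nearby values share bits and distant values do not.

```python
def encode_scalar(value, min_val=0, max_val=100, n_bits=100, w=21):
    """Encode a scalar as n_bits with a contiguous run of w active bits.

    Nearby values share active bits, so the encoding preserves semantic
    similarity; w > 1 gives the redundancy that makes it noise-tolerant.
    """
    # Clamp the value and map it onto the range of possible start positions.
    value = max(min_val, min(max_val, value))
    n_buckets = n_bits - w + 1
    bucket = int(round((value - min_val) / (max_val - min_val) * (n_buckets - 1)))
    bits = [0] * n_bits
    for i in range(bucket, bucket + w):
        bits[i] = 1
    return bits

def overlap(a, b):
    """Number of positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))
```

Encoding 41 and 42 yields encodings that overlap in almost all of their active bits, while 41 and 90 share none, which is exactly the semantic-resemblance property described above.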
This part of the signal is the one that is moving, and if we change the hour, another part changes. We can encode a much more complex signal, for example using the timestamp together with a value, such as energy consumption.
Here we can see a much more complex pattern, and we can turn it into a single representation over these cells. Then we have our input, and we perform what is called the spatial pooling. We take this input; we can show, for example, one of those input bits.
Each input bit has a relationship with every column, with a numeric value that ranges from zero upward; when that value is higher than a threshold, a connection is formed. In this case, for example, we can see that this column is connected with this cell, and this cell is active in the input, so this counts as an overlap. The overlap is important, because overlapping means that this column in the output really is representative of this input.
For example, for this input we could have this output. How is it calculated? Basically, we define the number of active bits that we are going to use, let's say 22, and then we calculate the overlap of each one of the columns: we rank them and just take the first 22, and that is the output. And it is not static.
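The overlap-and-rank step can be sketched as follows. The sizes, the random initial permanences, and the 0.5 connection threshold are illustrative assumptions; NuPIC's spatial pooler also adds boosting and local inhibition, which this sketch omits.

```python
import random

random.seed(0)

N_INPUTS, N_COLUMNS, N_ACTIVE = 100, 64, 22
CONNECT_THRESHOLD = 0.5

# Each column holds a permanence value toward every input bit; a synapse
# counts as connected only where the permanence exceeds the threshold.
permanences = [[random.random() for _ in range(N_INPUTS)]
               for _ in range(N_COLUMNS)]

def spatial_pool(input_bits):
    """Return the N_ACTIVE columns with the highest overlap.

    Overlap = number of connected synapses whose input bit is active.
    """
    overlaps = []
    for col in range(N_COLUMNS):
        score = sum(bit for bit, perm in zip(input_bits, permanences[col])
                    if perm >= CONNECT_THRESHOLD)
        overlaps.append((score, col))
    # Rank the columns by overlap and keep the top N_ACTIVE, as in the talk.
    overlaps.sort(reverse=True)
    return {col for _, col in overlaps[:N_ACTIVE]}
```

Whatever the input, exactly 22 of the 64 columns come out active, which is what makes the output a sparse distributed representation.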
If we enable learning, then for each of the columns that has been selected, we tend to promote the input cells where there has been an overlap: we dynamically increment a scalar value, the permanence, that allows the connection to be made. Conversely, for connections that have not been stimulated, we decrement it, so that those connections may be lost in a later iteration.
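A sketch of that permanence update, with illustrative increment and decrement sizes (NuPIC's defaults differ):

```python
P_INC, P_DEC = 0.05, 0.02

def learn(permanences, input_bits, active_columns):
    """For each winning column, raise the permanence toward input bits
    that were active and lower it toward those that were not, so that
    unstimulated connections eventually fall below the connection
    threshold and are lost."""
    for col in active_columns:
        for i, bit in enumerate(input_bits):
            if bit:
                permanences[col][i] = min(1.0, permanences[col][i] + P_INC)
            else:
                permanences[col][i] = max(0.0, permanences[col][i] - P_DEC)
```

Only the winning columns are updated, which is why the pooler gradually specializes columns toward the input patterns they already represent best.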
Basically, to see it working: this is an input space, a signal that has been encoded with the timestamp. This is random spatial pooling without learning, and these are the output columns that we get; the learning pooler tends to produce spatial patterns that are more easily learnable.
Here we can see the difference between the two of them; right now they are quite similar because it is just starting to learn. Basically, for each moment on this red line, we can see how similar it is to the previous observations.
The green ones are the most similar and the red ones are the least similar. If we let it run over the whole data set, we can learn some interesting things. For example, the patterns of the weekends, the smaller ones, are different from the others. And if we encoded a weekend boolean into the input space, we could recognize that they are the same pattern and match them more easily.
Finally, the piece that is left is the temporal pooler, which takes the sparse distributed representations we have seen and performs the learning between them. I have told you that the results of the spatial pooler, the sparse distributed representations, are called columns. This is because each one is comprised not of a single neuron but of a column of them, and each neuron represents a moment in time, so that the column can encode several transitions between the SDRs.
The idea is that a cell in a column can be activated not only by the inputs that we have seen on the input channel, but also in correlation with the previously activated neurons that are related to it. This learning also tries to promote the temporal correlation between neurons and to penalize the ones that do not correlate, so that a cell can get into a predictive state, saying:
"I have seen this pattern in the past, and it usually means that I'm going to be activated." The cell is then put into predictive mode, and if it does become active, it learns from that pattern.
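As a deliberately simplified sketch of that temporal learning, here is a first-order transition memory: it only remembers which SDR followed which, whereas real HTM keeps longer context through the columns of cells described above.

```python
class TransitionMemory:
    """Remembers which set of columns followed which, and predicts
    from that. A first-order simplification of HTM temporal learning."""

    def __init__(self):
        self.transitions = {}   # previous SDR -> columns seen right after it
        self.prev = None

    def step(self, active_columns):
        active = frozenset(active_columns)
        # Columns that usually follow the current SDR: the "predictive state".
        predicted = self.transitions.get(active, set())
        if self.prev is not None:
            # Learn: promote the transition we just observed.
            self.transitions.setdefault(self.prev, set()).update(active)
        self.prev = active
        return predicted
```

After seeing the sequence {1,2} then {3,4}, presenting {1,2} again puts the columns {3,4} into the predictive state.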
The idea with the hierarchical part of the theory is that you can stack levels, using the output of one level as the input of the next, to make decisions and learn more subtle or complex patterns at the higher levels. At least in theory.
Numenta has made an open-source framework called NuPIC, the Numenta Platform for Intelligent Computing, which has bindings in C++, Python, Java and Clojure. There is, for example, a Riemann module, so you could plug it into your Riemann configuration and enrich your streams with this kind of metric, for example the anomaly score. There's a desktop application for anomaly detection, and there's a software-as-a-service appliance, not open source, called Grok, that takes your AWS credentials, inspects your CloudWatch metrics and tries to derive anomalies. But for the proof of concept,
I have basically used the NuPIC Python bindings, and what I am doing is just calculating the percentage of virtual memory usage at each moment. You can see that basically I import a ModelFactory from the framework, calculate the metric each time, and run it through the model so that it learns, and I can get the prediction.
I get the next prediction and the anomaly score; the rest is boilerplate in order to be able to plot it here with Bokeh. I started it at the beginning of the presentation so as to give it some time to learn, and you can see it here: this is the history, the blue line, and this is the prediction. The prediction is not very good at this point, because it takes quite a lot of time to stabilize, and right now we are seeing something like a sawtooth pattern.
So it's not easy to predict, but somehow we can see that the anomaly score is quite low: it doesn't represent any disturbance.
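The anomaly score itself has a simple reading: it is essentially the fraction of currently active columns that were not in a predictive state, so 0 means the input was fully anticipated and 1 a complete surprise. A standalone sketch (NuPIC additionally smooths this raw score into an anomaly likelihood):

```python
def anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted: 0.0 means
    a fully anticipated input, 1.0 a complete surprise."""
    active = set(active_columns)
    if not active:
        return 0.0
    return len(active - set(predicted_columns)) / len(active)
```

This is why the score spikes when the stress test starts and decays again once the model has learned the new pattern.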
However, I can try to stress it... oh, there's been an anomaly here. Now, here is clearly an anomaly, but we can see that if it continues, the model tries to learn the new pattern, and as soon as it learns it, or something similar to it, the anomaly score goes down. And the other way around: when we finish the stress, it detects another anomaly and then comes back. That is the idea of this.
This is very raw, it's a proof of concept, but the idea is that you can use it and it's not difficult. The downside is that you have to create a model with some parameters that are sometimes not easy to set. Some are strictly related to the algorithm, and the others are related, for example, to how you encode the metric. There's a tool called swarming that tries to find values for those parameters that may be a good fit, according to some data set.