From YouTube: Infrastructure 2017 - Alejandro Guirao - From 0 to anomaly detection in your metrics in 15 mins
Description
Using the NuPIC framework (https://github.com/numenta/nupic/tree/master/src/nupic), we will show the basics of Anomaly Detection using HTM ( Hierarchical Temporal Memory), and perform a demo trying to detect an anomaly in an infrastructure metric (such as a host CPU).
Thank you very much. Thanks for coming to this talk. I will try to awaken your curiosity and show how, without doing rocket science, we can improve our anomaly detection techniques, for example anomaly detection in our infrastructure metrics, using some mathematical techniques.
First of all, I will start with a quote from this incredible book, The Art of Monitoring by James Turnbull. If you haven't read it, I recommend it. Basically, he says that using static thresholds for alerting and monitoring is not a good idea. It's an idea from the past: there are more subtle patterns that are not easily detected with a static threshold, and you have to search for other alternatives to detect those anomalies. For example, he proposes a tool called Riemann.
Riemann is an event router written by Kyle Kingsbury. It's a fantastic tool with many capabilities, like aggregation and mathematical operations on data, such as percentiles and the median. So it's a much smarter way to detect anomalies, but I think we can go further and complement it using some tools from the data science field that enable us to improve the accuracy. So, what am I going to present? It's not my idea; it's a mathematical theory.
It's called Hierarchical Temporal Memory, and basically it's a biologically constrained theory of machine intelligence. That means it has some resemblance to the biology of our cortex and to the way in which our neurons learn new patterns. It all started in 2004 with the publication of On Intelligence by Jeff Hawkins. Hawkins was a computer scientist and neuroscience researcher, and also the founder of Palm, the PDA company; later on he went on to found Numenta, which has been the main company behind HTM.
Basically, I will go quickly through three concepts that are important in HTM. First are the encoders: an encoder takes a signal from the real world and translates it into a binary map of zeros and ones with some useful properties. Then the spatial poolers take those maps of zeros and ones and convert them into another map of zeros and ones.
That output has a lot of zeros and a small number of ones, which is called a sparse distributed representation. It has some mathematical properties regarding error correction and robustness, and it's easier to learn patterns from it. Finally, the temporal pooler takes those sparse distributed representations and performs the learning mechanism, so that it is able to learn the patterns and then make predictions. It can give you an estimate of the anomaly score, and it can even correct the signal: in case part of it is missing, it can reconstruct the signal.
Now I will go quickly through some demonstrations. This is a typical scalar encoder, in which we have a quantity, in this case 41, and it is encoded as a matrix of zeros and ones: these here are zeros, and these are ones, so that when we move the slider, the value is encoded in a different way. As you may see, there's a lot of redundancy.
We could also manage with just one active bit per value, but this encoding is much less error-prone under noisy conditions. You can also see that there is some overlap between one value and the next. This is important because it creates a semantic resemblance between values that are close to each other.
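A minimal sketch of such a scalar encoder (the bit counts here are illustrative, not NuPIC's actual parameters): a value is mapped to a contiguous run of active bits, so nearby values share bits and distant values do not.

```python
def encode_scalar(value, min_val=0, max_val=100, n_bits=100, w=21):
    """Encode a scalar as n_bits with a contiguous run of w active bits.

    Nearby values share active bits, so the encoding preserves semantic
    similarity; w > 1 gives the redundancy that makes it noise-tolerant.
    """
    # Clamp the value and map it onto the range of possible start positions.
    value = max(min_val, min(max_val, value))
    n_buckets = n_bits - w + 1
    bucket = int(round((value - min_val) / (max_val - min_val) * (n_buckets - 1)))
    bits = [0] * n_bits
    for i in range(bucket, bucket + w):
        bits[i] = 1
    return bits

def overlap(a, b):
    """Number of positions where both encodings are active."""
    return sum(x & y for x, y in zip(a, b))
```

Encoding 41 and 42 yields encodings that overlap in almost all of their active bits, while 41 and 90 share none, which is exactly the semantic-resemblance property described above.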
This part of the signal is the one that is moving, and if we change the hour, another part changes. We can encode a much more complex signal, for example using the timestamp together with a value, such as energy consumption.
Here we can see a much more complex pattern, and we can turn it into a single representation over these cells. Then we have our input, and we perform what is called the spatial pooling. We take this input; we can show, for example, one of those input bits.
Each input bit has a relationship with every column, with a numeric value that ranges from zero upward; when that value is higher than a threshold, a connection is formed. In this case, for example, we can see that this column is connected with this cell, and this cell is active in the input, so this counts as an overlap. The overlap is important, because overlapping means that this column in the output really is representative of this input.
For example, for this input we could have this output. How is it calculated? Basically, we define the number of active bits that we are going to use, let's say 22, and then we calculate the overlap of each one of the columns: we rank them and just take the first 22, and that is the output. And it is not static.
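The overlap-and-rank step can be sketched as follows. The sizes, the random initial permanences, and the 0.5 connection threshold are illustrative assumptions; NuPIC's spatial pooler also adds boosting and local inhibition, which this sketch omits.

```python
import random

random.seed(0)

N_INPUTS, N_COLUMNS, N_ACTIVE = 100, 64, 22
CONNECT_THRESHOLD = 0.5

# Each column holds a permanence value toward every input bit; a synapse
# counts as connected only where the permanence exceeds the threshold.
permanences = [[random.random() for _ in range(N_INPUTS)]
               for _ in range(N_COLUMNS)]

def spatial_pool(input_bits):
    """Return the N_ACTIVE columns with the highest overlap.

    Overlap = number of connected synapses whose input bit is active.
    """
    overlaps = []
    for col in range(N_COLUMNS):
        score = sum(bit for bit, perm in zip(input_bits, permanences[col])
                    if perm >= CONNECT_THRESHOLD)
        overlaps.append((score, col))
    # Rank the columns by overlap and keep the top N_ACTIVE, as in the talk.
    overlaps.sort(reverse=True)
    return {col for _, col in overlaps[:N_ACTIVE]}
```

Whatever the input, exactly 22 of the 64 columns come out active, which is what makes the output a sparse distributed representation.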
If we enable learning, then for each of the columns that has been selected, we tend to promote the input cells where there has been an overlap: we dynamically increment a scalar value, the permanence, that allows the connection to be made. Conversely, for connections that have not been stimulated, we decrement it, so that those connections may be lost in a later iteration.
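A sketch of that permanence update, with illustrative increment and decrement sizes (NuPIC's defaults differ):

```python
P_INC, P_DEC = 0.05, 0.02

def learn(permanences, input_bits, active_columns):
    """For each winning column, raise the permanence toward input bits
    that were active and lower it toward those that were not, so that
    unstimulated connections eventually fall below the connection
    threshold and are lost."""
    for col in active_columns:
        for i, bit in enumerate(input_bits):
            if bit:
                permanences[col][i] = min(1.0, permanences[col][i] + P_INC)
            else:
                permanences[col][i] = max(0.0, permanences[col][i] - P_DEC)
```

Only the winning columns are updated, which is why the pooler gradually specializes columns toward the input patterns they already represent best.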
Basically, to see it working: this is an input space, a signal that has been encoded with the timestamp. This is random spatial pooling without learning, and these are the output columns that we get; the learning pooler tends to produce spatial patterns that are more easily learnable.
Here we can see the difference between the two of them; right now they are quite similar because it is just starting to learn. Basically, for each moment on this red line, we can see how similar it is to the previous observations.
The green ones are the most similar and the red ones are the least similar. If we let it run over the whole data set, we can learn some interesting things. For example, the patterns of the weekends, the smaller ones, are different from the others. And if we encoded a weekend boolean into the input space, we could recognize that they are the same pattern and match them more easily.
Finally, the piece that is left is the temporal pooler, which takes the sparse distributed representations we have seen and performs the learning between them. I have told you that the results of the spatial pooler, the sparse distributed representations, are called columns. This is because each one is comprised not of a single neuron but of a column of them, and each neuron represents a moment in time, so that the column can encode several transitions between the SDRs.
The idea is that a cell in a column can be activated not only by the inputs that we have seen on the input channel, but also in correlation with the previously activated neurons that are related to it. This learning also tries to promote the temporal correlation between neurons and to penalize the ones that do not correlate, so that a cell can get into a predictive state, saying:
"I have seen this pattern in the past, and it usually means that I'm going to be activated." The cell is then put into predictive mode, and if it does become active, it learns from that pattern.
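As a deliberately simplified sketch of that temporal learning, here is a first-order transition memory: it only remembers which SDR followed which, whereas real HTM keeps longer context through the columns of cells described above.

```python
class TransitionMemory:
    """Remembers which set of columns followed which, and predicts
    from that. A first-order simplification of HTM temporal learning."""

    def __init__(self):
        self.transitions = {}   # previous SDR -> columns seen right after it
        self.prev = None

    def step(self, active_columns):
        active = frozenset(active_columns)
        # Columns that usually follow the current SDR: the "predictive state".
        predicted = self.transitions.get(active, set())
        if self.prev is not None:
            # Learn: promote the transition we just observed.
            self.transitions.setdefault(self.prev, set()).update(active)
        self.prev = active
        return predicted
```

After seeing the sequence {1,2} then {3,4}, presenting {1,2} again puts the columns {3,4} into the predictive state.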
The idea with the hierarchical part of the theory is that you can stack levels, using the output of one level as the input of the next, to make decisions and learn more subtle or complex patterns at the higher levels. At least in theory.
Numenta has made an open-source framework called NuPIC, the Numenta Platform for Intelligent Computing, which has bindings in C++, Python, Java and Clojure. There is, for example, a Riemann module, so you could plug it into your Riemann configuration and enrich your streams with this kind of metric, for example the anomaly score. There's a desktop application for anomaly detection, and there's a software-as-a-service appliance, not open source, called Grok, that takes your AWS credentials, inspects your CloudWatch metrics and tries to derive anomalies. But for the proof of concept,
I have basically used the NuPIC Python bindings, and what I am doing is just calculating the percentage of virtual memory usage at each moment. You can see that basically I import a ModelFactory from the framework, calculate the metric each time, and run it through the model so that it learns, and I can get the prediction.
I get the next prediction and the anomaly score; the rest is boilerplate in order to be able to plot it here with Bokeh. I started it at the beginning of the presentation so as to give it some time to learn, and you can see it here: this is the history, the blue line, and this is the prediction. The prediction is not very good at this point, because it takes quite a lot of time to stabilize, and right now we are seeing something like a sawtooth pattern.
So it's not easy to predict, but somehow we can see that the anomaly score is quite low: it doesn't represent any disturbance.
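The anomaly score itself has a simple reading: it is essentially the fraction of currently active columns that were not in a predictive state, so 0 means the input was fully anticipated and 1 a complete surprise. A standalone sketch (NuPIC additionally smooths this raw score into an anomaly likelihood):

```python
def anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted: 0.0 means
    a fully anticipated input, 1.0 a complete surprise."""
    active = set(active_columns)
    if not active:
        return 0.0
    return len(active - set(predicted_columns)) / len(active)
```

This is why the score spikes when the stress test starts and decays again once the model has learned the new pattern.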
However, I can try to stress it... oh, there's been an anomaly here. Now, here is clearly an anomaly, but we can see that if it continues, the model tries to learn the new pattern, and as soon as it learns it, or something similar to it, the anomaly score goes down. And the other way around: when we finish the stress, it detects another anomaly and then comes back. That is the idea of this.
This is very raw, it's a proof of concept, but the idea is that you can use it and it's not difficult. The downside is that you have to create a model with some parameters that are sometimes not easy to set. Some are strictly related to the algorithm, and the others are related, for example, to how you encode the metric. There's a tool called swarming that tries to find values for those parameters that may be a good fit, according to some data set.