Description
Today we'll be taking questions about anomaly detection and HTM. Specifically: https://discourse.numenta.org/t/live-q-a-session-on-anomaly-detection/2674
A
Hey everybody, welcome to HTM Hackers' Hangout. It is Friday, September 1st, 3 p.m. on the west coast of the United States, and today we're going to be talking about anomaly detection, with NuPIC specifically but also in general with HTM systems. In the Numenta offices we have Subutai Ahmad and Scott Purdy, who are research engineers at Numenta, and I'm going to let Subutai run a good portion of this talk, because he's got a lot of good resources about anomaly detection.
B
Yeah, so thanks, Matt. And Scott Purdy is here as well; he knows a lot about this stuff. What we thought we'd do for the first part of this talk: there are a couple of sort of generic, general questions that seem to come up quite a bit, so I thought we'd walk through some of those first, and then maybe go to some specific questions. I know Jacob has listed a bunch of them that we could try to go through, and maybe others will have questions as well.
B
Okay, all right. So in the beginning, what I'll do is walk through a couple of things that are in a paper that we actually published: Scott, myself, Alex Lavin, and Zuha Agha, who was an intern here, published a paper.
B
A few things in here first. One question that maybe we should start with is just: what is anomaly detection, and what is streaming anomaly detection, which is the thing that we tend to focus on? There's an image here in the paper that shows an example, a stream of data over time, and this is a typical kind of sensor stream that you might see in many industrial applications, like the Internet of Things and so on.
B
So
what
we
define
anomalies,
as
is
something
unusual,
some
unusual
pattern
in
the
data
given
what
you've
seen
so
far
in
the
past
and
the
three
black
dots
here
are
three
anomalies
that
were
actually
marked
by
a
human
expert
and
the
engineer
who
works
on
this
machine,
and
it
highlights
you
know
some
easy
examples
of
anomalies
and
a
really
tricky
one.
So
you
can
see
this
dot
on
the
left.
Here
is
a
very
easy
example
of
an
anomaly.
The
reading
suddenly
dropped,
went
to
zero
and
then
went
back
up
again.
So
that's
clearly
an
anomaly.
B
It's
unusual
in
this
data
and
if
you
look
at
their
very
last
anomaly,
on
the
right
hand,
side
this
is
a
case
where
the
the
reading
dropped.
It
didn't,
go
all
the
way
to
zero,
but
this
was
actually
turned
out
a
catastrophic
failure
in
the
machine.
There's
the
machine
basically
just
broke
down
and
the
temperature
didn't
go
all
the
way
back
down
to
zero,
but
it
was
significantly
lower
than
the
operating
normal
operating
temperature.
B
And,
what's
interesting
is
this
middle
anomaly,
which
is
not
so
obvious,
and
most
traditionalist
techniques
actually
would
not
pick
up
on
this.
But
this
anomaly
acts
preceded
the
catastrophic
failure.
So
if
you
look
closely,
you
can
see,
there's
some
unusual
fluctuations
in
the
data
and
that
you
know
you
kind
of
don't
see
before
I
mean
you
have
to
be
a
bit
of
an
expert
to
really
notice
this.
But
there
are
unusual
fluctuations
here
and
this
kind
of
highlights
what
we
tend
to
focus
on,
which
is
temporal
anomalies.
B
So
it's
not
just
that
the
value
at
a
given
point
in
time
is
unusual
is
just
that
the
temporal
pattern
of
the
data
is
highly
unusual
and
if
you
can
pick
up
on
temporal
patterns,
you
can
pick
up
on
much
more
subtle
anomalies
than
if
you
just
look
at
the
instantaneous
value,
and
this
anomaly
isn't
a
good
example
of
that,
because
there's
no
there's
no
one
reading
here,
that's
unusual
at
all.
It's
really
the
temple
pattern
of
reading
sets
that
are
unusual
so
with
HTM
on
these
kinds
of
anomalies,
at
least
in
this.
B
Where
look
we're
focusing
on
something
we
call
streaming
analytics
or
streaming
anomaly
detection
and
the
paper
has
a
definition
of
what
we
mean
and
basically
it's
just
a
continuous
stream
of
readings
that
is
evolving
in
time
and
you
there's
no
training
set
or
test
set
you're
just
constantly
getting
data,
and
you
just
have
to
tell
at
every
point
in
time.
Is
it
unusual
or
not
given?
B
What's
happened
in
the
past,
so
this
means
the
system
has
to
be
continually
learning,
it
should
be
unsupervised
and
one
of
the
things
that's
quite
interesting
as
as
sort
of
noted
in
this
chart
here.
Is
you
really
want
to
find
anomalies
as
early
as
possible,
it's
great
to
detect
this
last
anomaly
on
the
right,
but
this
is
actually
after
the
machine
has
already
broken
down.
What
you
ideally
want
to
do
is
detect
the
second
anomaly
here,
which
is
well
before
the
system
actually
breaks
down
so
detecting
anomalies
early
is
a
really
big
deal?
Okay.
B
There's a measure we call the anomaly likelihood, and the anomaly likelihood is what you really should look at to determine whether the system is anomalous or not. This is a probabilistic measure; it's roughly the probability that the system is in a normal state, or one minus the probability that it's signaling an anomaly. And if you look inside the HTM block, if you're familiar with NuPIC, you'll see some of our familiar components: we have encoders, we have the spatial pooler, and we have the sequence memory, or the temporal memory.
B
Those
are
all
inside
this
block
here.
So
this
is
sort
of
how
we
you
know
use
do
anomaly:
detection
with
HTM,
so
the
stream
of
data
goes
into
HTM
will
model
the
sequences
of
the
temporal
characteristics,
and
these
two
other
measures
here
will
try
to
determine
given
what
the
HTM
is
predicting
and
modeling
right
now.
What
is
the
chance
that
there
actually
is
an
anomaly?
Ok,
so
let
me
switch
to
some
code
I
think.
B
Probably
most
of
you
have
seen
it,
but
if
you
go
to
github.com,
slash
Numenta,
slash
nab.
This
is
what
we
call
the
new
mint
anomaly
benchmark
and
it's
got
results
from
lots
of
different
algorithms,
running
anomaly,
detection
and
I'm,
going
to
focus
in
on
new
mint
HTM
here,
which
does
the
best
in
this
data
set.
There's
a
ton
of
data
in
here
under
the
data
folder
that
you
can
take
a
look
at,
and
our
paper
describes
all
of
this
in
detail,
but
I'm
going
to
focus
in
on
some
of
the
code.
B
If
you
go
into
the
nab
directory
here,
there's
code
for
of
different
detectors
and
there's
code
for
the
momenta
HTM
detector.
So
if
you
click
on
momenta
detector
up
high,
it's
got
pretty
short.
This
is
a
pretty
small
file.
This
shows
you
how
to
use
new
pic
to
do
anomaly.
Detection
using
all
of
the
kind
of
techniques
and
stuff
that
we
know
how
to
and
some
of
the
best
practice
for
for
doing
it.
So
let
me
walk
through
that
code
here,
so
I'm
going
to
switch
to
my
editor.
B
So in here we assume that you basically know the range of values that your data is going to be in; that's the input max and input min. If you're doing percentages, you'll be between 0 and 100; if you're doing temperature, you might be between whatever range your system's temperature is at. But we assume you know what that is, and then NuPIC has this convenient method where, if you give it the min and the max, it will give you back the best HTM model parameters for anomaly detection.
B
This
assumes
you
have
time
stamp
associated
with
your
system,
so
you
know
time
stamp
and
a
value,
but
I
would
highly
recommend.
Starting
with
this.
It
actually
took
us
a
quite
a
long
time
to
figure
out
the
best
set
of
model
parameters
that
works
well
for
anomaly
detection
and
we've
tested
this
on
literally
hundreds
of
maybe
thousands
of
data
files,
and
this
is
them
the
set
of
parameters
that
works
best
across.
B
You
know
the
vast
majority
of
data
files
that
we've
tried-
and
this
is
also
the
best
set
of
parameters
that
works
well
with
them,
so
I
strongly
recommend,
starting
with
this
set
of
parameters,
even
if
what
you're
doing
is
something
slightly
different
from
the
way
we're
doing
it.
If
you
start
with
this
and
then
not
five,
you
have
much
more
likely
to
get
good
results
than
if
you
start
from
scratch.
B
Let's
see,
then
there's
a
method
here
to
set
up
the
encoders.
It
basically
says
how
to
map
the
names
of
the
fields
that
are
in
your
data
file,
so
that's
fairly
straightforward
and
then
this
line
will
actually
create
your
HTM
model.
So
this
is
what
goes
into
that
HTM
block
I
showed
earlier,
and
this
bit
of
code
sets
up
the
anomaly
likelihood
class.
B
Okay,
well,
you
have
to
enable
inference
from
well.
This
is
is
basically
it
because
you
have
multiple
fields
in
here.
You
have
the
timestamp
fields
and
the
value
field
you
have
to
tell
the
system
which
field
is
the
key,
is
the
actual
value
field
and
I'm,
not
hundred
percent
sure?
Why
we
need
this
either,
because
the
anomaly
score
doesn't
really
use
this.
So
it's
a
valid
point
of
confusion,
but
for
whatever
you
think
it's.
B
Yeah
I
guess
we're
repurposing
a
prediction
model
for
this,
but
yeah
in
in
in
in
theory,
this
is
not
really
needed,
but
in
right
now,
with
our
code,
it
is
it's
a
valid
valid
confusion
with
the
anomaly
likely
that
a
couple
of
parameters
you
need
to
pass
in
you
typically
you
want
to
be.
You
want
to
have
the
system
learn
on
a
certain
amount
of
data.
Before
you
start
trusting,
it's
anomaly,
you
know
likelihood
outputs.
So
this
kind
of
this
parameter
kind
of
tells
you
what
that
period
should
be,
and
usually.
B
I'm going to skip this for a second; I'll get back to it in a little bit. Then you take that anomaly score, or prediction error, and pass it to the anomaly likelihood class, and you get back an anomaly probability, which, confusingly, we call an anomaly score here as well. And then, since this is a probabilistic measure and anomalies are extremely unlikely...
B
...it's actually very convenient to work in a logarithmic domain. So we convert the probability into a log likelihood, which is this log score, and then we use that everywhere else in our system. Okay, so that's the basic flow. We found in practice that there's another exception that's very useful to use. Scott, do you want to explain this one? You know what it's about.
D
Yeah, so basically the idea is that occasionally you have cases where the data is really noisy, so one point by itself, no matter how far out of the norm, isn't enough to move the likelihood, because the likelihood uses some window. That is not very intuitive, and a lot of people would come to us and say, hey, why isn't this detecting this very obvious anomaly? It's a clear spatial anomaly, a value way outside the normal range.
D
We've never seen anything like this before; this should be flagged as an anomaly. And so we put this in, and it basically just looks for values some amount outside of the range of data that's been seen so far. It's not the most elegant way to address this, but it would just take anything outside, in this case, five percent of the range we've seen so far and say that that's an anomaly, period, no matter what the likelihood comes out to.
B
So
it's
actually
pretty
straightforward.
You
know,
there's
an
initialization
step
and
a
convenient
function
to
get
the
best
model
parameters
and
then
what
you
have
to
do
is
run
the
model
and
then
send
the
results
through
the
anomaly
likelihood
and
then
use
the
log
version
of
that
to
actually
do
the
threshold
them.
One
question
we
see
in
the
forum's
quite
a
bit.
Is
you
know
what
value
should
I
use
to
to
actually
then
detect
the
anomaly?
B
Sometimes
they
say
well,
I.
The
only
likelihood
is
giving
me
a
value
like
0.99
and
I,
don't
see
an
anomaly
and
that
is
actually
expected
again.
The
anomaly
likelihood
is
a
very,
very
it's
a
probabilistic
measure
and
0.99
means
is
a
there's
a
one
in
a
hundred
chance
that
it's
an
anomaly
that
it's
an
unusual
data
point
and
usually
that's
actually
not
sufficiently
rare
to
call
an
anomaly
and
what
we
use
is
actually
five
minutes.
So
zero
point,
nine,
nine,
nine,
nine,
nine
and
it's
a
threshold.
And
so
that's
why.
B
If you're plotting, plot the log value; it's much more intuitive to use that. So one question: there's the anomaly score and the anomaly likelihood, and some people often ask why they are getting high or low anomaly scores. I see that question quite a bit, and my basic answer is: don't even look at the anomaly score, just look at the anomaly likelihood. The anomaly score can spike up, or it can be low for a short period of time; that does not necessarily mean there's an anomaly.
B
So those are the basic things I wanted to cover. One other thing I wanted to point out is that we have a sample app on our website called HTM Studio. You might actually want to start with this even before you start coding anything, because it contains the same code I showed earlier, embedded inside a UI.
B
You
can
actually
upload
or
use
open
up
one
of
your
data
files
and
it
will
run
through
the
whole
process
with
you
and
show
you
a
nice
UI
where
the
different
anomalies
are,
and
you
can
read
through
it
here.
It's
free
to
download
and
I
think
the
source
code
for
this
whole
app
is
available
as
well,
but
this
is
this
app
underneath
it
does.
This
exact
same
thing,
I
showed
earlier.
A
Ok,
Matt
anything.
What
else
should
I?
Yes,
some
other
teams
of
questions
I
think
we
already
covered
this,
but
I
want
to
emphasize
that
you
don't
need
to
you
shouldn't
need
to
swarm
to
get
anomaly
model
parameters
like
Zubat
I
said
we
already
have
a
good
set
of
them.
However,
I
think
some
people
are
trying
to
do
some
other
things
like
they
have
multiple
scalar
values
that
they're
they're
trying
to
do
multiple
fields
with
time
stamps
and
addition
to
other
scalar
values.
B
That's a great question. So, swarming: we have not found a good way to do swarming for anomaly detection. If you're just doing basic prediction, swarming works quite well, but not for anomaly detection. What swarming does is find the set of parameters that optimizes a particular metric. For prediction error, you can optimize that, and it will give you a good prediction system, but we don't have a good metric for anomaly detection.
D
Specifically, you can't use the anomaly score or likelihood for this; that won't work well. Yeah, I actually think that if your problem is set up so that you can frame it as a prediction problem, and there is a variable that is indicative of your problem that you can make the predicted variable and swarm on, then I think that is a good way to approach it.
B
If
so,
you
could
be,
you
could
do
a
normal
prediction
swarm
and
then
use
the
parameters
for
that
and
then,
instead
of
in
that
code,
I
showed
you
you
can
use
substitute
model
params
for
under
the
swarm,
but
but
you
have
to
be
careful
because
if
you
have
a
data,
that's
very
very
new
care
data,
that's
very
noisy
and
inherently
hard
to
predict.
Then
swarming
is
on
that
on
prediction.
Error
is
not
going
to
give
you
a
good
result,
I
think
so.
The
latency
thing,
for
example,
you.
D
Know
it's
true.
You
have
to
be
calm,
and
the
other
thing
to
keep
in
mind
is
that
the
prediction
case
you're
the
way
that
works
is
its
optimizing
for
a
specific
field.
So
if
you
have
multiple
fields,
it's
optimizing
to
predict
one
of
them
and
it's
gonna
wait.
It's
decision
based
on
the
internal
values
that
they
actually
help
it,
which
might
only
correspond
to
some
of
the
fields
not
all
of
them,
and
so
your
anomaly
score
is,
is
different
from
that.
D
Your
naama
score
is
based
on
the
entire
internal
state
and
it
doesn't
know
what
parts
are
which
feels.
So,
that's
where
you
have
to
be
a
little
careful
where
you
might
get
a
good
predictive
model.
But
if
you
look
at
the
entire
internal
state,
it
might
not
actually
be
a
good
metric
for
for
anomaly
section
for
your
application.
B
Each of these: this is the prediction, the set of cells that were predicted from the previous time step, and then this is the current set of active cells in the temporal memory, and this is just a normalizing term. Actually, sorry, it's the number of active columns; this is all in terms of columns.
A
Okay,
so
some
other
common
questions
that
I
think
I
think
we've
hit
on
most
of
these,
we
talked
about
how
anomaly
detection
is
related
to
prediction.
We've
talked
about
multiple
input
streams
and
how
it
affects
anomaly.
Detection
I
was
just
a
little
bit
confused
by
what
you
just
said,
because
you
said,
if
you're,
if
you're,
calculating
different
predictions
that
that
could
affect
the
the
prediction
error.
B
Yeah, so HTM can predict multiple things into the future. That's one of the nice things about the way the system represents things using sparse vectors. Say you are flipping a coin: heads and tails are both reasonable next steps, so the HTM will be predicting both heads and tails. But if something completely different happens that wasn't predicted, that will be an anomaly. If you're flipping a coin, there's no single prediction that's going to be a hundred percent accurate.
B
The first question, though, is something different: there are parts of the code that are no longer used, the anomaly likelihood region and then the auto classifier. Yeah, they are not used in the NAB example today. The anomaly likelihood is kind of computed after the fact, but ultimately it could be a region that's included in the network, and then you wouldn't have to do that extra calculation. I think that is still in process and not complete, though.
D
Yeah, it could be, and I think he was trying to add different types of computations here, whereas I think we want to just put in what we've sort of proven works, and then people are welcome to create their own regions to do whatever they want. Again, to use this you would be working at the Network API level, as opposed to using the OPF model; the OPF model handles this in a different way.
B
Do you use that one, or do you not use it? No, we haven't used it; I think there are some challenges with it. One big thing is that it's very rare that you're going to see the exact same sequence again that you want to classify. If you think about the example I was showing earlier, this here might be a pattern that you want to classify again, but it's unlikely to look exactly like this.
B
It
might
be
quite
different,
and
so
you
know
how
you
classify
a
sequence
is
actually
quite
a
tricky
problem
and
I.
Think
in
general
it's
an
unsolved
problem
in
machine
learning.
You
know
you
want
to
classify
maybe
things
that
are
quote-unquote
similar
to
this
pattern,
but
not
exactly
Mike
doesn't
have
to
be
exactly
the
same.
So
you.
B
And in general we have not done too much, we've done a little bit of work, but not too much, on sequence classification with HTM. So I think within the HTM community this is an unsolved problem: how do you take a sequence of patterns and, even with multiple training sets, be able to classify it robustly? I would say it's a good research area.
B
One of them is simply to build a completely separate anomaly model for every single field, and then if any of them, or two of them, report an anomaly, you say it's an anomaly. That's one possibility. You could also feed all the data into a single spatial pooler and then do our normal anomaly detection. I've found that that works okay for a few fields, but the amount of training data you need to train the spatial pooler and the temporal memory will grow pretty fast.
B
It's dependent on the underlying dimensionality of the data, so it's hard to give a rule of thumb. The space increases exponentially as you add in more and more fields, so it could be that the amount of data you need grows exponentially, but usually real-world data will fall on some lower-dimensional manifold in that space, so it's a function of that underlying dimensionality.
B
With HTM, what we're doing is making predictions into the future, so data that's correlated at a particular point in time is not going to help too much; you can correct me if I'm wrong, but the spatial pooler should handle that. What you want is data that's correlated in time.
B
Yeah, so let's say you have five variables in there and one of them is slightly off. The spatial pooler, I think, is somewhat resistant to noise, so you'll see just a few bits of difference in the spatial pooler output, and the temporal memory is looking at the spatial pooler output, so it might not detect a huge difference there.
D
So question number four here is about the absence of correlated signals, specifically when you start a new sequence. Basically, I think this is a statement more than a question: you can create your own start signal that is a unit step function in the scalar encoder, and I think what this is giving...
D
In
this
case
it's
not
random,
but
it's
basically
making
sure
it's
some
element
in
between
the
sequences
that
will
break
up
and
kind
of
break
the
model
out
of
its
predictive
state
from
the
previous
sequence.
And
so
what
another
way
to
do?
This
is
just
to
read
to
give
a
reset
to
the
model
so
model
that
reset
will
basically
get
rid
of
the
current
state
and
just
start
from
scratch,
and
so
that
should
do
the
same
thing
as
putting
a
unit
step
function
in
yeah.
D
Interesting. So it shouldn't be too much of a problem, because, assuming that the sequences are things that recur, that happen multiple times, that the model is going to see multiple times, then it will learn the sequences, and things that are noise in between occurrences of predictable sequences will just be considered noise. So I'm not sure.
E
There are three different types of start signals. There's the implicit start, which is at t equals zero; then there's the generated start, where you use some kind of detector to figure out where the event you're interested in actually occurs; and then there's the third one, where your natural data has some sort of natural start signal that comes in on a different channel. I've tried all the different approaches, and the generated one is not always accurate, but it's useful.
E
I did spend a lot of time on not having the start signal, and I actually got very good at detecting anomalies on things that start at random times. Because when you just have a data set and the sequence starts at some time, it's always going to be anomalous at the beginning.
E
I checked, yes, that's the statement, but I actually figured out that it should be non-interleaving, because I tried to interleave the training and it doesn't seem to pick up on the sequences very well. That's weird; it shouldn't matter. Maybe I'll try it again; maybe I was using bad parameters at the time. Okay.
B
How high-order is the sequence? That's the technically correct answer; let me try to unpack it. There are high-order sequences, which means sequences that share common elements, and if you have common elements in there, let's say at most three shared elements in sequences that are ten elements long, then you'll have to do about six repeats of the whole sequence to learn it.
B
So it depends on the resolution of the encoder, but you should be able to do either one. If you have a very high-resolution encoder, 0.4 and 0.5 might be completely different; in that case it won't predict anything. But if you have a coarse enough encoder they'll be very similar, and in the limit it's going to predict both 0.9 and 0.1.
B
It's in NuPIC as well; there's the temporal memory implementation in NuPIC and the temporal memory in our research code. We also have another version which is very similar, where we're doing things like feedback and sensorimotor stuff with the temporal memory, so we have a slightly different version of the code in the research repository.
D
That would be good; we just don't have time to really go into it. But the simple version of what these things mean: maxInfBacktrack is for inference and maxLrnBacktrack is for learning, and each is the maximum number of steps that you can backtrack. This is an optimization in this implementation of temporal memory where, when its predictions are not correct, it will basically go back multiple steps until it can pick up a sequence that it could have followed that would have been correct. Is that correct? Yeah.
B
So yeah, I mentioned earlier that with high-order sequences you have to do multiple repeats. Having a higher pamLength kind of avoids some of those repeats; it sort of really tries to learn high-order sequences. And so you're right that a pamLength of 100 would mean it thinks sequences are very, very high-order, and it's just going to keep predicting along them.
B
Yeah, so this stuff, these are optimizations that were put in to get it to learn quicker in some scenarios, and they're not necessarily biologically accurate, although I think Jeff thinks there could be a biological analog to pamLength as well, for when you're trying to memorize something.
D
Just for the sake of time: we've talked about how encoders are important; you have to capture the semantics correctly in your encoding and get the right proportions of bits and whatnot. Delta encoders: definitely try them out and see if they help or not; they might work in some cases. You might only need a delta encoder and not even need a scalar encoder. Normally we find the scalar encoder works; we start with that and then see whether the delta might help, but you should experiment. Yeah.
B
In NAB we found the scalar encoder works best across all the data sets, all the kind of industrial data sets that we've tried. For the delta encoder, the one case where I found it useful is when you have data that's continuously increasing or continuously decreasing, and it's really the changes that are more important than the magnitude of the values. If it's continuously increasing, the system will never really predict it, because it's always a new data point.
B
I think it's like combining sequences, but yeah, if you're giving it the same value over and over again, it does not actually know where it is in the sequence.
A
That's it. It was the one called C-L-A something; it says something about CLA. Okay, but he was talking about the backtracking TM, and just to be clear, in NuPIC, in the code, the production version of the TM is what we are calling BacktrackingTM, and the one we use in research is just TemporalMemory.
A
Okay, so we're way over time, and we got through everything; I think everyone had a chance to ask questions. So that's another HTM Hackers' Hangout in the bag. We didn't really talk about anything except anomaly detection, so the next hangout will just be a standard one. I really appreciate Subutai and Scott taking their time to answer these questions, and you guys on the forums for laying them out so we could walk through them.
A
I
hope
we
can
refer
to
this
video
and
other
anomaly
detection
questions
that
come
up
that
when
others
have
similar
questions
and
that's
about
it.
Here's
a
treat
for
those
of
you
stuck
around
ol
time.
Here's
my
new
brain
model
really
happy
with
my
new
brain
model,
the
old
one
that
Jeff
gave
me
was
just
old
and
dirty
and
then
had
glue
in
it.
A
Sure
so
I
bought
a
cheap
one
from
the
same
company
for
like
35
dollars,
but
it
was
just
really
bad
and
I
sent
it
back.
So
this
one's
not
so
cheap,
but
it's
a
good
model,
but
anyway
that's
it
for
HT
Macker
hangout.
Thank
you
for
joining
us
one
more
time.
Anybody
has
anything.
Here's
your
chance
to
speak
up.