Numenta 2014 Spring Hackathon Talks, 8 May 2014

From YouTube: Anomaly Detection in CLA

Description

By Subutai Ahmad, presented at 2014 Spring NuPIC Hackathon.

Source code referenced is at https://github.com/subutai/nupic.subutai/tree/master/run_anomaly

A

Okay, you good? Okay, let's get started. I'm going to talk about anomaly detection using the Cortical Learning Algorithm. This is something we at Numenta really started doing in earnest sort of last year, when we started looking at a lot of data sets and applications for it. In the beginning we were interested in it because I thought anomaly detection was kind of cool, detecting novel things in data. But as we got more and more into it,

A

I really started realizing how fundamental anomaly detection actually is. We're living in a world where the number of sensors and streams of data is just exploding exponentially. Everyone here has seen those Internet of Things graphs where the number of sensors is increasing exponentially, and with so much data there's no way to really sift through it all. One of the most efficient tactics you can use is to look for what's different: what is unusual that's going on?

A

That's a very efficient way to parse through lots and lots of data.

A

When you hear about the Internet of Things, people talk a lot about the data backbone behind it, but they don't really know what sort of analysis you should do on the data. I've become very convinced that anomaly detection is one of the most fundamental things you want to do when you have tons and tons of streaming data, if not the most fundamental thing. So I've become really convinced that doing anomaly detection well in this new world of lots and lots of data is going to be really critical, and hopefully with this talk I'll explain a little bit of what we've done.

A

Just to give you a sense of the types of things that you could be using it for, one of the funnest Internet of Things devices I've seen is this one.

A

This is the Mimo baby. How many of you here have kids, or have had babies, or are parents? Okay, how many of you have been a baby at one point? Okay, so I think most of you would be interested in this. There's a company, I think it's called Rest Devices or something, a startup in Boston. They make this baby onesie with a number of sensors in it, so it senses the baby's respiration,

A

it senses the baby's temperature, it senses the baby's movement, the heartbeat and so on, and it streams this to your phone. So you constantly have access to your baby's data, and what could be more important to a parent than doing anomaly detection on this data stream? You have your baby and you want to know if something unusual is going on. So here's an example where anomaly detection could really play a very big role.

A

On the other end of the scale, I did a little hack at Numenta last week where I looked at data from Bitly. Bitly has an API where you can start clicking on things, and I started thinking: what is the web talking about? You can search for all sorts of terms and see what's unusual on the web today. So I put it through our product, Grok, and this is showing the trace of people clicking on Lindsay Lohan web pages. Sure enough,

A

our product detected an anomaly one day, shown in red at the bottom there. When I started looking at what URLs led to that, she had, I think, declared that she had a miscarriage or something, and the web had gone crazy over that. So whether you want to save babies or follow the paparazzi, anomaly detection has lots of use cases. Anyway, what I'm going to do is dive into some of the details of how we do it.

A

There are basically three topics I want to cover today. One: what is anomaly detection? I'll give you some more examples of anomaly detection.

A

Then I want to go into a little bit of detail on how an anomaly score is computed in NuPIC and the CLA.

A

Today the NuPIC code base fundamentally does prediction and anomaly detection, which Scott was talking about, so I'll describe at an algorithmic level a little bit of what's going on there. The third thing: as most of you probably know, we released a product called Grok, which does anomaly detection on IT data streams. I'll explain how this anomaly score is used in Grok, and what we have learned from actually trying to put out a commercial product that does anomaly detection.

A

Okay, so those are the three things I'm going to try to cover today. I also have a little bit of sample code that I'll release as part of this, and maybe it'll make your life a little easier if you want to do anomaly detection today. Okay, so what is anomaly detection? This is a definition from the dictionary: an anomaly is something that deviates from what is standard, normal, or expected. In our case that something is a data stream and patterns. So this thing is... speaking of anomalies.

A

It works. Can I just put it here? No, okay, I can hold it yeah. I think my ears are not said: yeah, okay, anyway, so yeah so um yeah every time. I think, every time I turn here. It hits something.

A

I'll be stuck, you'll have to hear from me for the... okay. Well, we'll figure it out. Okay, so in our case this something is patterns: patterns that are coming in over time, and you want to detect when the patterns have deviated from normal behavior. But there's a lot of subtlety behind that statement. One example of the type of anomaly you might want to detect is a simple spatial or static anomaly, and this is the most common type of anomaly detection you'll find in the literature.

A

In fact, this is the most common type of anomaly detection used in the industry as well. Here I'm plotting two different sensor values, x and y, and at any point in time one of those dots is the reading of those two sensors.

A

Spatial anomalies. So here's an example of the data, and it's falling into mostly two clusters. You can see that there are two points there, O1 and O2, that are definitely outside the clusters, so those will definitely be statically defined anomalies. Most of the people in the industry actually use something even simpler than this, which is thresholding. They'll say

A

the value of x has to be less than this or greater than this, or the value of y has to be within this range. Essentially these are like rectangles in this space, and that's what we see most commonly in the industry. So O1 and O2 would be anomalies. O3 is kind of interesting. It's got a few dots in it. Is that an anomaly or not? Well, maybe the first couple of times it is, but after you see it enough times

A

it's not really an anomaly. It's rare, but it's not really novel as such. Nevertheless, this is one type of anomaly detection that you might want to do. Then there are temporal anomalies. This is a data stream, I think it's an electrocardiogram, so it's basically heartbeats over time. You can see that in the middle there the heart didn't beat, and apparently this is indicative of arrhythmia or some condition of

A

this type. This is interesting because the previous technique of looking at static values would not detect this anomaly. In fact, the values in red there are completely normal. They occur quite often; it's just that they don't occur in this particular temporal sequence.

A

Okay, what's novel here is the temporal sequence. So that's another type of anomaly you might want to detect, a little more complicated than the previous one. Here's a real-world example of this. This is data that we saw last year: the gear bearing temperature from a windmill.

A

You have all these windmills in a wind farm, and they have tons and tons of sensors spitting out data continuously. What you can see on the extreme right is when the windmill actually broke down: the temperature goes way up. But right there, the temperature actually starts behaving weirdly. It's a little subtle, but it's actually very indicative of the failure that happens later.

A

So if you have a good temporal anomaly detector, you can actually detect the failure before it happens, just because you know the measurements are behaving in a weird way. Again, any individual point in there is totally normal. It's the pattern over time that's abnormal. Okay.

A

Then there are a couple of other types that we've seen. You can have very random behavior, and you can have anomalies in that random behavior. We're working a lot with IT data, so this is data that's representative of latencies in your load balancer. If you have a website and you hit it, how long does it take before you get that page out? Most of the time it's really low, near zero. Some of the time it's near half a second. Well,

A

that was another anomaly. So most of the time it's there, and every once in a while, a few times a day, it jumps up to the two-and-a-half to three second range. This is completely unpredictable. You don't know when it's going to happen; it's random. Yet you can clearly tell that right there something really unusual is going on. Your website is going a lot slower than it should be

A

otherwise. Maybe a developer had pushed some code to it, or you changed the configuration of your system, or whatever. You don't know what it is, but suddenly the latencies have gone way up. And this is really hard: there's no real detectable pattern as such, it's just that the nature of the randomness has changed.

A

Okay, so that's another type of anomaly that we actually see quite often in IT data and other types. Then there's another anomaly which I'll call a temporary anomaly.

A

So this is CPU usage, and you can see that on the left it's hovering between six and seven. It's noisy, but it's sort of staying in that range, and then there's a point in time when it suddenly spikes up. So that's an anomaly; something unusual happens right there. Then it stabilizes at a higher level. Now, if you were using a pure threshold or some other system, you would definitely detect this anomaly, but then you would

A

keep triggering an alert. However, in this case, and you can see the axis is on the order of days, once it's there for a little while it's the new normal. It's not anomalous anymore. It's only anomalous at the point where it shifted from one value to another, and once it stays there, it's no longer anomalous. So the regime here is now normal, even though the point here is abnormal.

A

Okay, does that make sense?
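The "keeps triggering an alert" behavior of a pure threshold can be sketched as follows. This is only an illustration: the function name `threshold_alerts`, the band limits, and the CPU readings are made up for the example and do not come from the talk's data.

```python
def threshold_alerts(values, low, high):
    """Return the indices whose value falls outside the fixed [low, high] band."""
    return [i for i, v in enumerate(values) if v < low or v > high]


# Hypothetical CPU readings: hovering between 6 and 7, then settling near 12.
cpu = [6.2, 6.8, 6.5, 6.9, 6.4, 12.1, 11.8, 12.3, 12.0, 11.9]
alerts = threshold_alerts(cpu, low=5.0, high=8.0)
print(alerts)
```

Every reading after the shift is flagged, not just the transition, because a static band has no way to learn that the higher level has become the new normal.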

A

Okay. So there are a number of different subtleties that you have to handle when you do anomaly detection. Just to recap: we're defining anomalies as any significant deviation from normal behavior. Anomaly detection is extremely valuable in this area, and it's hard to do well. There are many flavors of anomaly detection, as I went over: spatial anomalies, temporal anomalies, changes in randomness, and temporary stuff where you have to adjust to a new normal.

A

All of these kinds of things have to be handled, and one of the hard things we also found is that there aren't great data sets in the industry that capture all of these things. Maybe there are some we haven't found. There are some that capture individual things, maybe, but it's hard to find a good data set that really captures all of the things we want to have.

A

So that made it even harder to evaluate how well you're doing, and to try to compare against other algorithms and so on.

A

Okay, so that's anomaly detection. Let me talk a little bit about how the anomaly score is computed in NuPIC today. NuPIC implements an anomaly score for streaming data sets, and Scott showed you programmatically how to do it. We have example code for that.

A

It's a core feature of the OPF, the Online Prediction Framework. Basically you just set the inference type to temporal anomaly, and then it will output an anomaly score between 0 and 1 for every data point. NuPIC as it's implemented today will detect a lot of the spatial and temporal anomalies that I showed you, and it does a pretty decent job at it.

A

Also, because of the nature of the CLA, it's a continuously learning online system. So with those temporary anomalies, it'll detect the anomaly and then quickly learn that that's the new normal. One of the really unusual things about the anomaly detection in NuPIC is that it can work for both numerical and categorical data.

A

In fact, you can actually mix the two in the same data stream and it'll still work just fine. The reason is that we translate everything into this common representation, the sparse distributed representation. Then everything works on SDRs, and as long as you can convert it to an SDR, you can do anomaly detection on it with NuPIC. So this is a really nice property of the anomaly detection algorithm.

A

Okay, so how do we actually compute the anomaly score? Here's a typical OPF model.

A

You have your data stream coming in, and you have a number of encoders that take that data and convert it into ones and zeros; again, Scott showed you how to do that. That gets fed to a spatial pooler, then a temporal pooler, and then out come predictions.

A

Okay, and as you know, the CLA is constantly learning common spatial patterns in the spatial pooler, and it's constantly learning the common temporal sequences in the stream of inputs, which is done in the temporal pooler. Then at each time step the temporal pooler is actually making multiple predictions about what might come next. So at every time step it has an expectation of what's going to happen in the next time step, and it's not just a single expectation.

A

It can actually branch; it can have multiple predictions of what might happen next. And the anomaly score is really simple. What we do is look at the current value that's come in and see: was it predicted at the previous time step, and how well was it predicted? If it was predicted really well, the anomaly score is zero.

A

If it wasn't predicted at all, the anomaly score is one, and if the prediction is somewhere in between, then it's between zero and one. The way this works, if you're familiar with SDRs, is that it works in column space. The current input is represented in the OPF typically as a set of 40 columns that are on, and the temporal pooler is going to be predicting some number of columns to come on. We just look at how many of the ones that were predicted,

A

actually, I'm sorry, how many of the ones that are currently active overlap with the predicted columns, and that's how we compute the score. So it'll be a number that goes from zero to one in increments of 1/40th, basically.
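The overlap computation described here can be sketched in a few lines. This is a minimal illustration rather than NuPIC's own code: the function name `raw_anomaly_score` and the use of plain Python collections of column indices are assumptions made for the example.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of the currently active columns that were NOT predicted.

    Both arguments are iterables of column indices (hypothetical inputs
    standing in for the temporal pooler's state at one time step).
    """
    active = set(active_columns)
    if not active:
        # Nothing is active, so there is nothing to be surprised about.
        return 0.0
    overlap = len(active & set(predicted_columns))
    return 1.0 - float(overlap) / len(active)


# 40 active columns, 30 of which are among the predicted columns.
active = range(40)
predicted = range(10, 60)
score = raw_anomaly_score(active, predicted)
print(score)
```

With 40 active columns, the score can only take values that are multiples of 1/40, which is where the increments mentioned in the talk come from.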

A

Okay, any questions about that?

A

Okay, let me give you a simple artificial example, and let's see if you get this. Here's a simple model of a sequence: you start at A, and with 80% chance you go to B, 15% chance C, 5% chance D, and then no matter where you are, you always go back to A. So here's an example temporal sequence; it's coming. Yes, the PowerPoint does have this animation.

A

I have to have at least one gratuitous animation in there. So let's say you have this sequence and you've been learning on these statistics, and you're at A. Now you have some prediction about the next input that's coming in, and let's see what actually happens. Suppose B, C, or D occurs.

A

Okay, what would the anomaly score be?

A

Either B, C, or D: if any of these occur here, what do you guys think the anomaly score would be? Less than one? It'll be zero here. If it's exactly B, C, or D, and assuming everything's learned well, it'll be zero, because the temporal pooler will have predicted the union of these three to occur. Okay, so the anomaly score will be zero.

A

What if E occurs?

A

Okay, one. Any other guesses?

A

So, exactly. It's generally going to be one, but really it depends on how similar it is to B, C, and D. If it's completely different from B, C, or D, it's going to be one, and that's the most likely, particularly since this is kind of a trick question.

A

I did it as letters instead of numbers. Yeah, very close to one. But if these have semantic components, if they're similar to B, C, or D, then it'll be between zero and one, depending on the similarity, and the similarity is in the column space. Okay.

A

Okay, and then the next thing: suppose A to E repeats quite often, multiple times, in this scenario. Well, then the anomaly score will drop to zero again, regardless of how different it was from B, C, or D. So this is the continuous learning part of it. A simple example, but hopefully it gives you a flavor of what the thinking is at each time step.
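The A, B, C, D, E walk-through can be mimicked with a toy model. This is a deliberately simplified stand-in, not the CLA: the class `ToyTransitionModel` is hypothetical, it only remembers which symbol followed which, it learns a transition after a single exposure, and its symbols share no semantic overlap, so the score is always exactly 0 or 1.

```python
from collections import defaultdict


class ToyTransitionModel(object):
    """Remembers which symbols have followed which (first-order only)."""

    def __init__(self):
        self.successors = defaultdict(set)
        self.previous = None

    def step(self, symbol):
        """Score `symbol` against the predicted set, then learn from it."""
        if self.previous is None:
            score = 1.0  # nothing could have been predicted yet
        else:
            predicted = self.successors[self.previous]
            score = 0.0 if symbol in predicted else 1.0
            predicted.add(symbol)  # continuous learning
        self.previous = symbol
        return score


model = ToyTransitionModel()
for s in "ABACADABACADA":  # learn A -> B/C/D -> A, ending back at A
    model.step(s)

score_b = model.step("B")        # B is in the union predicted after A
model.step("A")
score_e_first = model.step("E")  # E was never seen after A
model.step("A")
score_e_again = model.step("E")  # A -> E has now been learned
print(score_b, score_e_first, score_e_again)
```

The three scores follow the pattern from the talk: no surprise for a predicted successor, full surprise for the first E, and no surprise once A to E has been seen before.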

A

So here's an example. This is the same data set I showed earlier. This is the value, and this is the anomaly score over time. It's a continuously learning system: you start from the very beginning and you just go. In the very beginning, the CLA does not really know how to predict.

A

It doesn't know anything about the data, so the anomaly score is going to be high in the beginning, because everything is new at this point in time. Then it kind of settles down. Every once in a while it'll go high because there's some random thing here that it didn't predict; remember, this is still a noisy data set. Then when it jumps up, the anomaly score again goes high because it doesn't know anything about...

A

Of course it didn't predict this transition, but even here it's not predicting the step-by-step transitions because it hasn't seen enough of that. But over time it's going to go down again, and again, every once in a while some random event will occur. You can see it picked up these two here with a high anomaly score, but after that, even that is kind of expected, and so the anomaly score will go down.

C

Yeah, okay. What if, say, there's a jump in the data, but it's not necessarily a change in patterns, just a change in the level of the data? If the same patterns continue just at a higher level, will the anomaly score stay high after the transition, or will it understand that it's the same pattern just at a higher level?

A

So it depends a little bit on how you encode the data. If you just encode it normally, the way we do it, it's going to treat this as completely new patterns, so once it jumps up it's going to start learning in this regime again. That's the most flexible thing, because the chance that it's going to be exactly the same here is pretty low. Generally speaking, it's going to be different.

A

You could have a differential encoder, so you don't encode the exact value but the differences from one step to another. In that case the anomaly score will jump up here, but then it'll say: okay, the differences are the same.
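The idea of encoding differences instead of raw values can be illustrated with plain preprocessing. This sketch is not a NuPIC encoder; the helper name `deltas` and the sample values are assumptions chosen to show a pattern repeating at a higher level.

```python
def deltas(values):
    """Return the step-to-step differences of a sequence of numbers."""
    return [b - a for a, b in zip(values, values[1:])]


# The same repeating pattern, first at a low level and then at a high level.
low = [1, 2, 3, 1, 2, 3]
high = [11, 12, 13, 11, 12, 13]
d = deltas(low + high)
print(d)
```

Only the single step across the level change produces an unfamiliar difference; before and after it, the differences are identical, which is why a model fed differences would settle down again right after the jump.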

B

Does this work with hierarchical regions? So if you had a melody, for example,

D

Yeah, or a sound wave.

B

and it was very quiet and then the volume went up, but the melody stayed the same, how would it react to that?

A

Yeah, so NuPIC today doesn't implement the hierarchy in the OPF, but in general it should work exactly the same at every level of the hierarchy. And if you have multiple modalities, you could have an anomaly in one of the modalities and not in the other. So if you have melody and volume and they're encoded separately, you could actually detect the fact that it's the same in one area and changing in the other.

A

Okay, here's another example. This is the temperature of a heater in an office building. This is just part of the data set, not the whole thing. You can see there's sort of a regular behavior. It's a little hard to see here, but there's an unusual temporal behavior here and the anomaly score jumped up. And here the temperature goes down and became unusually low, and in this case the anomaly score again jumps up.

A

It sort of spikes low here and the anomaly score spikes up there, because it hasn't seen temperatures that low.

A

Okay, and I can't see exactly here, but there are some other anomalies; when you zoom into some of these, there are some unusual things that it finds. Okay, here's one of the toughest cases. This is the random data I showed earlier. The anomaly score is much higher here than average, but it's kind of hard to tease that apart. So this is an example where I'm not sure the current scheme really adds any value over the original data stream.

A

Here, in fact, it's a little more bushy. I'll show you a little bit later how we deal with this kind of thing in Grok, but this is a case where the data set is just inherently unpredictable, so a step-by-step anomaly score of the type we're doing in NuPIC doesn't really help you too much, although you can visually see that it's different.

A

I'm going to explain it, yeah. But in NuPIC today, this is what you would get, and I'll explain exactly how we solve this problem.

E

Can I ask a quick question? So when you were talking about web server latency, can you just basically do, I guess it'd be like a low-pass filter? Basically filter out all the high-frequency stuff and just look at the low-frequency stuff, and that'll make it a lot clearer what's going on.

A

Yeah, I think, as a general comment, with any of these data sets, if you as a human were looking at it, you can craft something that will find that anomaly. You can definitely do that as a data scientist. But the thing is, you don't know what it's going to be ahead of time.

A

You want to build a system that's going to do well on a very wide range of data types, and as the number of sensors increases so dramatically, you can't be looking at each single one, seeing what the characteristics are, and then coming up ahead of time with the exact set of features, if you will, that's going to capture that. So I think you're right.

A

If you were going to capture this particular anomaly, you could definitely do that, but it probably wouldn't capture some of the other ones.

E

Yeah, but if you have a particular use case, you know what time scale is relevant, right? So if you have a particular type of sensor,

D

You know which.

E

time scales are relevant, so you can optimize this in advance for the time scale that's appropriate for that sensor.

A

Yes, if you know a lot about that particular sensor and this domain, you can definitely do that.
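The filtering suggested from the audience can be sketched, in its simplest form, as a trailing moving average. This is an illustration of that suggestion only, not something Grok or NuPIC is stated to do; the function name `moving_average` and the sample latencies are hypothetical.

```python
def moving_average(values, window):
    """Trailing moving average; early points average over what is available."""
    if window < 1:
        raise ValueError("window must be at least 1")
    smoothed = []
    running = 0.0
    for i, v in enumerate(values):
        running += v
        if i >= window:
            running -= values[i - window]  # drop the value leaving the window
        smoothed.append(running / min(i + 1, window))
    return smoothed


# Hypothetical latencies in seconds with one isolated spike.
latencies = [0, 0, 3, 0, 0, 0]
smoothed = moving_average(latencies, window=3)
print(smoothed)
```

The `window` argument is exactly the time scale that has to be chosen in advance for each sensor, which is the trade-off raised in this exchange.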

A

Okay, so I put up some sample code in my GitHub repository. It's called run_anomaly, and I put in some of the data sets that I've been showing you, along with a version of the hot gym anomaly script that Scott had created, changed so that it's easier to run multiple data sets and look at the outputs and so on. So you can take a look at that; these data sets are actually in there.

A

In NuPIC we don't really have a great anomaly detection data set today. We're just using hot gym, which is not really an anomaly detection data set. There's nothing truly anomalous going on in there. But in these data sets there's something a little more interesting going on, so you can take a look at that.

A

Okay, so let me switch topics and talk a little bit about Grok: how we have built Grok on top of NuPIC and how we solve some of the issues that we've seen. A quick one-slide description of Grok, if you haven't seen this. Grok is the product that we put out that's built with NuPIC, and essentially what it does is anomaly detection for data from Amazon,

A

data that's coming from Amazon instances. Amazon has a CloudWatch service which is constantly monitoring all of your services, and there's a data feed that Grok can take input from. From there we automatically allow you to monitor all the metrics and see what's unusual in your IT infrastructure at any point in time. The nice thing about this is that it's completely automated. You don't have to tell it what time scale, and you don't have to tell it

A

whether it's load balancer or database data, or anything like that. You just feed in hundreds of streams of data, and for each one it does automatic anomaly detection. Then there's a really nice mobile interface, which we can show you later if you're interested, which sorts all of your metrics according to what is most unusual at any point in time. So even though you might have hundreds of machines, you can very quickly and efficiently look at what's unusual right now, so you can

A

just focus on what's going on. We also have an API to feed in custom metrics, so you can feed in your own data that's not from CloudWatch, and that's how I did the Lindsay Lohan thing that I showed earlier. So it's really kind of a general purpose...

A

I wouldn't say general purpose, but it's a fairly flexible anomaly detection platform that you can use to try out some stuff. Because it's using NuPIC, it has all of the advantages of the CLA: it continuously learns and adapts to changes, it deals with temporal anomalies, and so on. Okay, so let me show you a couple of examples of what we can detect here. This is essentially the same kind of data that I showed earlier.

A

If you have a sudden change in your data, we can automatically detect that. Typically in an IT scenario you would have to manually set thresholds and say: if the CPU goes over a certain percentage, raise an alarm, or if my memory usage is below a certain level, raise an alarm. With this,

A

you don't have to set any thresholds. It's automatically gauging the common spatial patterns, and if there's a sequence or a pattern that it hasn't seen before, it can flag it with a high anomaly score. And I'm going to explain exactly how we compute this color here.

A

But this is a simple case where we detect it, and as you can see, once it detects the jump it will automatically adapt, and this is now the new normal. But at least you get an alarm when this happens. Here's another case where the CPU usage is extremely noisy and it's gradually drifting up. After a while, when it gets beyond the noise, you see that the anomaly score goes way up, because now it's

A

finally anomalous at this point. If you were to just look at this point, well, it could still be part of the noise, but once it's here it's definitely anomalous and you want to know about it. This kind of thing is quite hard to catch and often important to catch. If you have a memory leak or some process leak or something like that, you want to be able to catch that kind of stuff. But it can also catch some pretty subtle changes.

A

So here's an example of an instance, and let me just explain the display a little bit. This is one server that it's monitoring, and these are different metrics associated with this server: disk I/O, network input, and CPU. It's detected an anomaly in the network bandwidth for this instance. You can click on this and see the data that actually led to it, and you can see this very bizarre, dense usage here, and there's an anomaly right in the middle.

A

Oh, the reason it's dense is that we're seeing one week of data here. The data is sampled every five minutes and we're seeing a full week's worth. You can see that there are these big spikes up that it does not catch as anomalies. In this case you can actually go back in time and see that these occur regularly.

A

These are like regular cron jobs that do a backup or something at regular intervals. But it caught an anomaly somewhere in here, so you can drill in and see what's going on. What happened is that when you go to the daily level, you can see that there's an extremely regular pattern that happens very predictably, and at one point in time that thing happened twice, and it caught that. Given how regular the pattern is, this particular thing is extremely unusual.

A

If this had been very noisy, then this would not have been unusual and it wouldn't catch it, but because it was so regular, it caught it. Quite often we find that when this kind of thing happens, it's usually one of our engineers who has logged into the system and is doing something that's unusual for that server. So detecting these kinds of behaviors is a really nice feature of Grok.

A

So how do we do this? What have we learned in the process of using NuPIC for anomaly detection? What have we learned from Grok? First of all, again, anomaly detection is extremely useful. As I've been showing you, real-world data is really, really noisy, and it may be that for some of it we can never build a perfect predictive model.

A

It's just inherently noisy, inherently random, and you just have to deal with that. But still, humans can detect changes in that noisy behavior.

A

So one of the things we learned is that you can't actually set a threshold on the anomaly score and I think you saw that in some of the other examples you can't just say: oh, if an anomaly score is 0.8 or higher, then it's you know raise an alarm, because that would raise too many false positives.

A

You know, a high anomaly score is not always bad. A lot of times random stuff just happens normally, and if you can't predict it, it's going to be a high anomaly score. So what do we do at that point?

A

The interesting thing is that visually you can see a qualitative change in the anomaly score, so it is picking up on something; you just need to be able to pick up on that. So what we decided to do in Grok is detect the change in the anomaly score itself.

A

Okay, so what we do is compute a probability model of the anomaly score. We see what the values of the anomaly score are likely to be and compute a model of that, and then, given a new set of recent anomaly scores, we see how likely it is that the model would have predicted them. Actually, let me just show a slide on that. So what we do is, for every new data point, we compute the anomaly score using the OPF, exactly like I showed you before, and then we estimate the probability distribution of the historical anomaly scores. If you look at all the anomaly scores that have happened up to this point in time, what is the distribution of those? We model it as a Gaussian today. Then, given that past probability distribution, we look at the recent anomaly scores and ask: how likely is it that that previous distribution generated these recent anomaly scores? So we can compute a likelihood, a probability, for that.

F

Just to be really clear, in case it's not: we're looking at a recent distribution, so we look at multiple scores, say in the last hour.

A

Right, right.

F

I know you said this, but I'm not sure everyone will catch it: you can't do anything with a single anomaly score. You have to look at a recent average of anomaly scores compared to a longer historical average.

A

Yeah, so this is just stressing that you cannot threshold or use a single anomaly score by itself when you have inherently noisy data. You look at the distribution of anomaly scores, and we are comparing the recent distribution against the long-term historical distribution.
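
A minimal Python sketch of the likelihood idea described here, assuming a plain Gaussian fit over the historical raw scores and a short window of recent scores. The function name, the window length of 10 and the standard-deviation floor are illustrative assumptions, not the code from the repository mentioned in the talk.

```python
import math
from statistics import mean, pstdev


def anomaly_likelihood(scores, recent_window=10, min_std=0.01):
    """Return how unusual the recent anomaly scores are, in [0, 1].

    scores: raw anomaly scores in time order, oldest first.
    recent_window: number of latest scores treated as "recent" (assumed value).
    min_std: floor on the standard deviation (assumed value).
    """
    if len(scores) <= recent_window:
        return 0.5  # not enough history to say anything either way

    history = scores[:-recent_window]
    recent = scores[-recent_window:]

    # Model the historical anomaly scores as a Gaussian.
    mu = mean(history)
    sigma = max(pstdev(history), min_std)  # floor avoids division by zero

    # Compare the recent average against that historical model.
    z = (mean(recent) - mu) / sigma

    # Gaussian upper-tail probability: the chance that the historical
    # model produces a value at least as high as the recent average.
    tail = 0.5 * math.erfc(z / math.sqrt(2))
    return 1.0 - tail
```

With a noisy history hovering around 0.3, ten recent scores of 0.9 push the result very close to 1, while recent scores that look like the history give about 0.5.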

A

Okay, we're not doing it on the data, we're doing it on the anomaly score itself, and that trick actually helped tremendously in some of the data sets that I showed you. I think in a practical system, at least with the state of NuPIC today, you'll have to do something like this. So let me give you a couple of examples.

A

Here's an example of CPU usage. You can see that here's one type of behavior, and then it starts behaving differently, and then it spikes up again. And here's the raw anomaly score. You can see it's definitely different here than elsewhere, but if you were just to do a threshold on the anomaly score, you'd pick up on a lot of extra stuff. When you run it through the likelihood model, you can see that it really teases this apart.

A

So here you can see that that recent distribution of anomaly scores is extremely unlikely given the past distribution, so it very cleanly picks up this spike. But what's also interesting is that there are a couple of other places where it's also high, and you can see that it picked up on this square behavior here. The right way, I think, to look at this is to say: suppose you're here at this point in time and you haven't seen the future. You're right here, you've seen this, and you see this square step up.

A

Is that an anomaly or not? I think you'd say it is, and it manages to pick that up. It also picked up this little behavior here as anomalous, when it changed from this kind of behavior to this more cyclical behavior. It picked up on that change, but then that's normal now and it's fine, and then it picked up on this drastic change.

A

Okay, so it's not perfect, but it really helps to tease apart what we were able to see visually in the anomaly scores before. Is that clear? Yeah, go ahead.

D

So, are you advocating using a threshold on the likelihood score, as distinct from the anomaly score, because you're trying to somehow encapsulate the human supervision, or those visual cues that you would get by just looking at the score itself?

A

Yeah, exactly. So the question is: do we want to threshold the likelihood score instead of the anomaly score? And that's exactly right. I think by thresholding here you'll get far more informative notifications or alerts, or whatever you want to do with the system, than if you just look at the anomaly score itself. What's also kind of nice is you can actually put a probability measure on it, so you can say, "Notify me if there's a one in a thousand chance of this happening." So you can control the frequency of your notifications that way, naturally. Okay, here's the other example.
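
As a rough sketch of the "one in a thousand" rule, assume the likelihood is a number in [0, 1] where values near 1 mean the recent anomaly scores were very improbable under the historical model. The function name and the `one_in` parameter are hypothetical, chosen only for illustration.

```python
def should_notify(likelihood, one_in=1000):
    """Return True when the recent scores had less than a 1-in-`one_in`
    chance under the historical model.

    likelihood: value in [0, 1]; near 1 means highly anomalous (assumed convention).
    """
    chance = 1.0 - likelihood
    return chance < 1.0 / one_in
```

Raising `one_in` makes notifications rarer; lowering it makes them more frequent, which is the control knob the talk describes.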

A

This is the one I showed you where you have extremely random data, and when you run it through Grok, you can see that the likelihood picks up that regime very nicely. It picks it up as an anomaly, and you can zoom in and see that it's able to do it pretty quickly, as soon as these latencies start slowing down a lot more often. Let's see, this is the windmill data that I had shown earlier. You can run that through Grok, and you can see that it picks up the anomaly right where the wiggle starts to happen. This is, I think, a stretched-out version of this. It's a little hard to see, but the point where it picks it up is exactly where it starts wiggling more.

A

Okay, any questions about that?

A

Okay, so the anomaly likelihood code. The anomaly likelihood scheme has proven to be pretty critical in terms of converting where we are today with NuPIC into a practical application, and so we're actually making the code available. It's on that same GitHub account. Right now I have a copy of the code there that you can take a look at. It's just a self-contained function: it takes in a list of anomaly scores and outputs a likelihood for them. I think it might be useful to look at.

A

I wouldn't say it's in a super easy-to-use form yet. It's in kind of the way we use it in Grok, but one of the things I think we could do is create better sample code around this and then maybe integrate it directly into the OPF, so the OPF itself is outputting not just the anomaly score but also the likelihood score. Then you can decide which one you want to use.

C

Okay, Jeff, yep.

B

Is the likelihood code based on the CLA? Are you basically looking for anomalies of anomalies? I mean, is it the same algorithm or is it a completely new algorithm?

A

No, the algorithm for the likelihood is very simple: it's just fitting the anomaly scores to a Gaussian and then using that to compute the probability. But ultimately it might be nice to see if you can do it in a hierarchy, or compute anomalies of anomalies or anything, but...

B

Right, that's, one of these results, feed it back, right? Yeah, yeah, yeah.

A

Because, I mean, the brain is not doing the anomaly likelihood, but we're able to see these anomalies, so there's got to be, eventually, a better way to do it. But I think in terms of building a practical product with today's code, this is a pretty good scheme. I wouldn't say it's the be-all and end-all of everything.

A

Okay, so that's one. Probably the biggest thing we've learned in building the practical system is that data sets are really noisy and you can't threshold the anomaly score, but building this kind of probabilistic model really helps you tease apart where the changes actually are. And by the way, the simplest stuff also works with this.

A

So if you have a very regular data set and you have an anomaly in there, just a single anomaly score going up, well, that's also going to be caught by this, as you saw in one of the examples there. So the simple stuff is covered by this, as well as the more complex, noisy areas.

A

Okay, so the second thing we learned is a little bit about swarming. We talk a lot about swarming. Swarming is an automated parameter selection mechanism in NuPIC. It basically allows you to run hundreds of models with many different permutations of the encoder parameters, the spatial pooler parameters and so on, and it picks the best model and the combination of fields that gives you the best predictability.
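
To make the "try many permutations, keep the best" idea concrete, here is a toy sketch. It is an exhaustive search over a small parameter grid for an exponential-smoothing predictor, used as a stand-in; it is not NuPIC's swarming code, and the predictor, the function names and the `alpha` parameter are all assumptions made for illustration.

```python
from itertools import product


def prediction_error(series, alpha):
    """Mean absolute one-step-ahead error of an exponential-smoothing predictor."""
    estimate = series[0]
    total = 0.0
    for value in series[1:]:
        total += abs(value - estimate)          # error of the prediction just made
        estimate = alpha * value + (1 - alpha) * estimate
    return total / (len(series) - 1)


def pick_best_params(series, search_space):
    """Try every combination in search_space; return (best_params, best_error)."""
    names = sorted(search_space)
    best = None
    for combo in product(*(search_space[name] for name in names)):
        params = dict(zip(names, combo))
        error = prediction_error(series, **params)
        if best is None or error < best[1]:
            best = (params, error)
    return best
```

Even this tiny version shows why the process is slow: the number of models to run grows multiplicatively with every parameter added to the search space.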

A

Swarming, if you run it, is a pretty time-consuming process, and it's really impractical to use in a product if you want to get results very quickly. So what we do in Grok is use a single pre-swarmed parameter set, and we can do this because we have a very constrained setup within Grok. We have a very fixed set of fields: the timestamp and the metric value.

A

We have data that's fed in every five minutes, and what we found is that the single pre-swarmed parameter set works really well across different data sets as long as they have these two characteristics. If you can fit your data into this format, it seems to work pretty well. You can still do better if you swarm, but the improvement is incrementally pretty small. So this was a really huge step.

A

One of our interns actually figured this out last year, and it just made a whole bunch of problems go away in doing a production system. So, at least within this constrained setup, you can use a single pre-swarmed parameter set. I think in general you will still need to swarm or do some sort of model selection. I think there's a great set of tutorials online, put together by Matt, and there are lots of people here who can help you. If your data can't fit into this format, you'll probably want to do some swarming, but the good thing is, what this really taught us is that the system is pretty insensitive to small parameter changes. If the structure of your problem is not changing much, then you may not need to swarm too often. You can swarm once or twice and just use those model parameters. So this was a really nice learning that we had last year.

F

Yeah, I'm just going to add something on top of that, on the second bullet up there where it says "fixed set of fields." It was interesting: the intern who did this determined it by looking at data that was unrelated to what Grok looks at. So I want to make it clear that the pre-swarmed model is not pre-swarmed on the data we actually use in Grok for servers. It basically seems to be independent of what the data represents.

F

So the point of this is that if you have a single metric plus a timestamp every five minutes, it should work. That's been our observation so far. It doesn't have to be pre-swarmed on your data.
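
One way to check whether a data set fits the format described here (a timestamp plus a single metric value, arriving every five minutes) is sketched below. The function name and the strict exact-interval test are assumptions for illustration; nothing in the talk says Grok validates its input this way.

```python
from datetime import datetime, timedelta


def fits_single_metric_format(rows, interval=timedelta(minutes=5)):
    """True if rows are (timestamp, numeric value) pairs spaced exactly `interval` apart."""
    previous = None
    for row in rows:
        if len(row) != 2:
            return False  # more or fewer than two fields
        timestamp, value = row
        if not isinstance(timestamp, datetime):
            return False
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return False
        if previous is not None and timestamp - previous != interval:
            return False  # gap or irregular sampling
        previous = timestamp
    return previous is not None  # an empty stream does not qualify
```

Note that the check says nothing about the range of the metric, which matches the observation that values from 0 to 100 and values in the hundreds of millions were handled by the same parameter set.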

C

Right, right, it's not...

F

Specific to the data. So we feed data into Grok that goes between zero and 100, like CPU utilization, or between zero and six hundred million for, I don't know, whatever that is, networking.

C

Or something like that.

D

The patterns look all...

F

Completely different. So I really just want to make it clear that it's just the number of fields that matters. It might work with two or three fields. We don't know, but...

F

All he's saying is: if you have a single field of data and a timestamp every five minutes, it doesn't really matter what it is. It seems to work well.

A

Yeah, and you could see that with the windmill data that I passed in. It worked on that, and it was never swarmed on that. It wasn't swarmed on the IT data. It didn't know anything about Lindsay Lohan or anything, but as long as the data fits into this format, it seems to work pretty well. And that might be important for the hackathon: if you think about anomaly detection projects, if you can fit your data into this format, you can just avoid swarming.

A

You can use the parameter set that we have today.

A

Okay, so that's Grok. So where do we go next? I think we've seen that the CLA is proving to be really good at detecting anomalies in a large range of data sets that we've tried so far, and within the context of Grok it's fully automated. There's no parameter tuning whatsoever.

A

We've learned a lot in the process of creating the product, and with talks like this we want to spread the ideas out into the NuPIC community as we learn more, and hopefully others will do the same thing, so as a community we can get better and better at this process. But at the same time it's clear, I think, that we're just scratching the surface. There are so many other types of data, and multivariate anomalies and all sorts of stuff that we haven't really started to capture, let alone anomaly detection on text streams, say using the CEPT API or something. Anomaly detection, I think, again, is a very fundamental capability, so I think we've just started scratching the surface of this.

A

So I think we still have a long way to go. One of the things that we're just starting to do: I mentioned earlier on that it's really hard to find good data sets that cover the full span of things that we want to look at.

A

So what we've decided to do, well, here it says it's hard. What we decided to do is actually create a benchmark data set and a testing methodology that's focused on streaming data and anomaly detection.

A

So what we want to do is create a benchmark that models real-time, online, continuous streaming data sources, with an emphasis on temporal data, automation, continuous learning, all the things that we've talked about so far, along with a well-defined methodology for evaluating these algorithms, and maybe some baseline results with the current state of the CLA. We're hoping that creating this data set will help us improve the technology down the road, and we can expand on this data set over time as we find new categories or new types of data that we might want to do anomaly detection on.

A

So this is something that we're just getting started on. We're hoping it'll be useful to the NuPIC community as well, and hoping you guys can contribute to it. Please see Ian Danforth, who actually stepped out for a couple of hours but will be back again. He's the main person putting this together for us. Or see me or any one of us if you're interested in contributing to this. Okay, and then last slide, a couple of resources. Jeff wrote a white paper, "The Science of Anomaly Detection," which is on our website and covers a lot of the topics I spoke about, so you should definitely download and read that. There's the GitHub repository again. And here's kind of a random thing: if you're interested in other machine learning techniques for anomaly detection, this is a pretty nice survey of that.

A

It's a pretty big paper, but if you're interested in that, take a look at it. I think it's useful to read through it sometime. Okay, thank you.

G

Just one question: you can generate an SDR of the anomaly-scoring columns, is that correct?

A

You can. You mean which columns led to the anomaly?

A

Yeah, you can compute that. That's, I think, done in clamodel.py. Actually, Scott, I think, moved it. There's a function in algorithms/anomaly.py or something that computes that.

G

You can find that in... it comes out of the TP, yeah?

A

Exactly, you can.

G

An SDR, because we are working on using the CEPT stuff to do. Okay on these libya.

G

So we want to generate an SDR of the anomalous columns, yep, give that to CEPT, and find out what word is causing the anomaly.
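
A small sketch of what "the anomalous columns" could mean in code, assuming the raw anomaly score is the fraction of currently active columns that were not predicted at the previous step. The function names are hypothetical and the sketch works on plain lists of column indices; it does not call NuPIC.

```python
def anomalous_columns(active_columns, predicted_columns):
    """Indices of columns that are active now but were not predicted."""
    return sorted(set(active_columns) - set(predicted_columns))


def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that were not predicted, in [0, 1]."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active, so nothing was unexpected
    unpredicted = active - set(predicted_columns)
    return len(unpredicted) / len(active)
```

The sorted list of unpredicted indices is the sparse representation that could then be handed to another system for interpretation.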

A

Ah, okay, okay, okay.

C

Any other questions?

D

When you run your models, or when you run your Grok systems, so it...

G

Learns anomalies?

A

Yeah, that's right. So the question is: we don't turn learning off at all, so wouldn't it learn the anomalies? And it does. If a particular pattern happens more than a couple of times, it's not going to be flagged as an anomaly, and in most cases that's what you want. Like you saw with that step-up function, you don't want to keep giving alarms continuously.

A

That's one of the biggest problems the IT people have today with the existing anomaly stuff: they just get way too many alerts in those situations.

A

It'll give an alert when the anomaly happens, and if that's the new normal, it'll learn that and it won't be an anomaly. If you want, there's actually a mechanism in NuPIC to store the pattern that caused the anomaly, similar to what Fergal was just saying. You can store that in a classifier, and if you want to, you can try to detect it again if it happens. So that's a separate mechanism.

A

It might be that there are some specific situations that do occur repeatedly and you do want to detect them specifically, so you can store those in a classifier and detect those. Okay.
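
The idea of storing an anomalous pattern and recognizing it later can be sketched with a simple overlap match between sets of column indices. This is an illustrative stand-in, not the classifier mechanism in NuPIC; the class name, the labels and the 0.8 overlap threshold are all assumptions.

```python
class AnomalyPatternStore:
    """Remembers labelled anomalous patterns and recognizes close repeats."""

    def __init__(self, min_overlap=0.8):
        self.min_overlap = min_overlap  # fraction of a stored pattern that must reappear
        self.patterns = []              # list of (label, frozenset of column indices)

    def store(self, label, columns):
        self.patterns.append((label, frozenset(columns)))

    def match(self, columns):
        """Return the label of the best-matching stored pattern, or None."""
        current = set(columns)
        best_label, best_overlap = None, 0.0
        for label, stored in self.patterns:
            if not stored:
                continue
            overlap = len(stored & current) / len(stored)
            if overlap >= self.min_overlap and overlap > best_overlap:
                best_label, best_overlap = label, overlap
        return best_label
```

Because this store is separate from the continuously learning model, a pattern can stop being flagged as anomalous and still be recognized by name when it recurs.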

G