From YouTube: NAB Python 3 Part 1
Description
Broadcast live on Twitch -- Watch live at https://www.twitch.tv/rhyolight_
So the problem is, with a benchmark like this, you want to be able to run it in any environment. I might make an R solution for it, or a Python solution, a C++ solution, Lisp, I don't know, maybe Elixir, and you basically need a way to get your contribution, your detector, into the system.
Let's do a quick review of this. The Numenta Anomaly Benchmark contains data and scripts. This is a new kind of benchmark; the difference between this and most machine learning benchmarks is that it's a time-based benchmark, and it's unsupervised, assuming your detector is unsupervised. We're testing how well learning algorithms can detect anomalies in unlabeled, streaming, sequential data, which is of course what HTM is pretty good at, and there were no benchmarks we could use to compare ourselves with anyone else, so we created this one. There are over 50 labeled real-world and artificial time-series datasets, all scalar values over time, and this repository contains the tools that allow you to easily run the benchmark using your own anomaly detection algorithms. What we're going to do is separate this, because it's written in Python 2, and I think we made some assumptions that there would be a Python 2 runtime for running these detectors.
So if you want to create an entry in the benchmark, you create a detector. I haven't done this myself, but there's a research paper on this here.
Oh, and here's the link to the NAB white paper. I don't know what this one is exactly; I thought that was the NAB paper. No, no, this is different. It's probably very similar, but I think it specifically tells you how NAB works. I don't want to get too deep into understanding that; I have a general idea of how it works. What I want to do is review the work that Yoon has done to split this up into two repositories.
If someone submits a new entry, a new detector, then they essentially have to run it in their own environment and create the files that represent the scores, the anomalies they found in all of these datasets, and then run NAB, which has scoring code that will score all of those results and apply its logic.
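Those per-detector results files are just CSVs with one row of output per input record. Here's a minimal sketch of producing one; the function name, the `detect` callback, and the exact column layout are my assumptions for illustration (NAB's real schema is defined in its repository):

```python
import csv

def write_results(rows, out_path, detect):
    """Write a NAB-style results CSV: one anomaly score per input record.

    `rows` is an iterable of (timestamp, value) pairs; `detect` is a
    hypothetical scoring callback returning a value in [0, 1]. The real
    column layout used by NAB lives in the NAB repository.
    """
    with open(out_path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value", "anomaly_score"])
        for timestamp, value in rows:
            writer.writerow([timestamp, value, detect(value)])
```

NAB's scoring step then reads these files back and applies its scoring logic to every dataset.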
So NAB has specific scoring rules: false positives can be treated differently than false negatives. There's the standard profile, which weights positives and negatives conservatively, and then the other profiles all weight false positives and false negatives differently: correct predictions (true positives and true negatives) versus missed anomalies. So there's a "reward low false positives" version and a "reward low false negatives" version of these, and this is the scoreboard.
So, okay, this is the expected update to the readme. Okay, so there's the main paper: that's the unsupervised one on ScienceDirect, the one main paper covering NAB and Numenta's HTM-based anomaly detection algorithm. Then they have a white paper, and then the original publication. How many publications are there here? One, two, three, four. There's this one, and this one, which is the readme that points to these. This link actually goes to "Evaluating Real-Time Anomaly Detection Algorithms", and this paper is the unsupervised one. Okay, is this another one? No, this is the same one; this is the ScienceDirect one, yeah.
Okay: "We encourage you to publish your results on NAB and share them with us. Please cite the following publication." I guess published results sort of need this, so I guess that's why the extra paper link is there. But maybe this would be better if it were under, I don't know, scoreboard caveats. Look at all these!
"Please see the wiki section on contributing algorithms. For us to consider adding your algorithm to the NAB repo, it must meet the following criteria: open source, works with streaming data, processes data in real time." So that's the important thing: in real time. You can't batch it. It's one bit of the data at a time: you get the next time step, and then you give us an anomaly indication right then, before the next time step, not an indication only afterward. So that's different from most deep learning systems.
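That one-record-at-a-time contract can be sketched as a class like the following. The names and the toy z-score logic are my illustration, not NAB's actual detector base class; the point is only the shape of the interface: see each record once, emit a score immediately, never peek ahead.

```python
class StreamingDetector:
    """Sketch of the one-record-at-a-time contract described above.

    Illustrative only, not NAB's base class: the detector consumes each
    record exactly once, must return an anomaly score before seeing the
    next time step, and keeps only incremental state (here, a running
    mean and variance via Welford's algorithm).
    """

    def __init__(self, threshold=3.0):
        self.threshold = threshold  # deviations considered fully anomalous
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def handle_record(self, value):
        """Consume one value, return an anomaly score in [0, 1]."""
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        if self.count < 3:
            return 0.0  # not enough history to judge yet
        std = (self.m2 / (self.count - 1)) ** 0.5
        if std == 0.0:
            return 0.0
        z = abs(value - self.mean) / std
        return min(z / self.threshold, 1.0)
```

A batch pipeline, by contrast, would need the whole dataset up front before producing any score, which is exactly what the criteria rule out.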
So if a system decides to apply an aggregation and update its model for its next prediction, it's going to have to go back, keep access to that data, and reprocess it in a batch format. It would be a pipeline that produces the answer after the pipeline processes it. Well, I guess you could say it can go back and reapply a certain aggregation and relearn.
I don't think there's anything that keeps any of these algorithms from storing data and then running back through it, just to update the model as it goes along. I think that's essentially what LSTM does: it runs batches, then adjusts in real time and reruns the batches so it can update its model in real time.
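That "store the stream and rerun it in batch" pattern can be sketched generically. This is an illustration of the pattern the transcript describes, not how any specific NAB entry or LSTM implementation works; the class name and the trivial mean "model" are made up:

```python
class BatchRefitDetector:
    """Illustrative 'store and rerun' pattern: keep the whole stream
    seen so far and periodically refit a model on it in batch, while
    still emitting a score for every record as it arrives.

    Hypothetical sketch; the 'model' here is just a batch-fitted mean.
    """

    def __init__(self, refit_every=50):
        self.refit_every = refit_every
        self.history = []   # the stored stream
        self.mean = 0.0     # the batch-fitted "model"

    def handle_record(self, value):
        self.history.append(value)
        # Periodically rerun over the stored stream in batch.
        if len(self.history) % self.refit_every == 0:
            self.mean = sum(self.history) / len(self.history)
        # Still answer immediately for this time step.
        return abs(value - self.mean)
```

Note the detector still answers at every time step, which is what keeps this inside the rules; only the internal model update happens in batch.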
It just can't run in batch; that's the necessary thing. "The algorithms are computationally efficient enough to process streaming data." Oh, and then: "the following algorithms have been tested on NAB and do not meet these criteria." Oh wow, I have a stand-up meeting, I almost forgot about it. I can't believe it's only 10 o'clock. I have a stand-up meeting in 10 minutes, so bear with me.