From YouTube: Hard Drive Lifeguard - Manpreet Singh
Description
2015 HTM Challenge Application submission
B
Hi, my name is Manpreet, and I'm working on the Hard Drive Lifeguard project for the hackathon. I will give you a brief background on the project. While working with the developer community, I see that most of the quality time is spent on operational burden while solving problems on both the application and infrastructure sides, and I strongly believe that this can be solved using artificial intelligence techniques.
B
Traditional machine learning techniques require a lot of labeled data to perform classification. On the other hand, NuPIC, which can perform online learning, predictions, and anomaly detection, is a much better choice here.
To brief you on the architecture: data ingestion takes place through different collectors that run on HBase clusters. SMART attributes are collected into a common database, from where they are fed to the algorithms to perform anomaly detection.
B
I will take you through the data set and the demo. The data set consists of 64 SMART attributes, and this data was collected from multiple drives at one of the universities. Each line in the data section contains data from one SMART read, temporal in nature, with the last column being 1 or 0, a class attribute that defines failed or good drives.
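The data format described above can be sketched as a small parser. This is a minimal sketch under assumptions: the delimiter, the two-attribute sample rows, and the field layout (attributes first, 1/0 label last) are hypothetical; real rows would carry 64 SMART attributes.

```python
# Minimal sketch of parsing the SMART data set described above: each line
# holds the SMART attribute readings followed by a 1/0 class label
# (1 = failed drive, 0 = good drive). Delimiter and sample rows are assumptions.

def parse_line(line):
    """Split one CSV line into (attributes, label)."""
    fields = line.strip().split(",")
    attributes = [float(f) for f in fields[:-1]]
    label = int(fields[-1])  # 1 = failed drive, 0 = good drive
    return attributes, label

# Hypothetical two-attribute example rows (real rows would have 64 attributes).
sample = ["35.0,120.0,0", "58.0,990.0,1"]
rows = [parse_line(line) for line in sample]

# Split into good and bad drives, as the presenter describes later.
good = [attrs for attrs, label in rows if label == 0]
bad = [attrs for attrs, label in rows if label == 1]
```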
B
If I go to the Lifeguard runner, the first thing I do is select the feature vector using the z-score technique. I split the data into good and bad. Then NuPIC good and bad models are created. The idea is that when new data comes in, it goes through the good model and the bad model, and anomaly scores are calculated. Based on the anomaly scores, the classification is then done.
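The two-model scheme above can be sketched as follows. This is a stand-in, not the actual NuPIC HTM models: a simple per-feature z-score deviation plays the role of the anomaly score, and a new record is labeled by whichever model (good or bad) finds it less anomalous. All data values are hypothetical.

```python
import math

class SimpleAnomalyModel:
    """Stand-in for a NuPIC model: scores how far a record deviates
    from the per-feature mean of its training data (z-score based)."""

    def fit(self, records):
        n = len(records)
        self.means = [sum(col) / n for col in zip(*records)]
        self.stds = [
            max(math.sqrt(sum((x - m) ** 2 for x in col) / n), 1e-9)
            for col, m in zip(zip(*records), self.means)
        ]
        return self

    def anomaly_score(self, record):
        # Sum of absolute z-scores across features: higher = more anomalous.
        return sum(abs(x - m) / s
                   for x, m, s in zip(record, self.means, self.stds))

# Train one model on good drives and one on failed drives (hypothetical data).
good_records = [[35.0, 100.0], [36.0, 110.0], [34.0, 105.0]]
bad_records = [[55.0, 900.0], [60.0, 950.0], [58.0, 920.0]]
good_model = SimpleAnomalyModel().fit(good_records)
bad_model = SimpleAnomalyModel().fit(bad_records)

def classify(record):
    # A record is labeled by whichever model finds it less anomalous.
    if bad_model.anomaly_score(record) < good_model.anomaly_score(record):
        return "failed"
    return "good"
```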
B
From the data that we printed for the sample data, it can very clearly be seen that it started learning after a while, and the anomaly score becomes zero here. This is what I could achieve, and I would like to thank the Numenta community for helping me out to achieve this. I hope that from the alpha and beta stage, I'm able to get this onto gamma and prod someday. Thank you very much.
A
But the charting still confuses me: I don't really understand what he's charting there. I tried to work it out with them on the post, and this could probably be worked out with some communication here, but I'm still sort of confused. He seems to think he got good results, but from the charts I don't really see that. So that's my takeaway, yeah.
A
[inaudible] I believe I'd advise him to retry: pick one or a small handful of those values that were changing the most, or at least most applicable to the problem, look through it with his own eyes, and just create models with that small number of fields, yeah.
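The suggestion above (keep only the few fields that change the most) can be sketched as a variance ranking. The field names and readings are hypothetical, purely for illustration.

```python
# Hedged sketch of the suggestion above: rank SMART fields by how much
# they vary across readings and keep only the top few for modeling.
# Field names and readings below are made up.

def top_varying_fields(records, field_names, k=2):
    """Return the k field names with the largest variance across records."""
    n = len(records)
    variances = []
    for i, name in enumerate(field_names):
        col = [r[i] for r in records]
        mean = sum(col) / n
        variances.append((sum((x - mean) ** 2 for x in col) / n, name))
    # Sort descending by variance and keep the top k names.
    return [name for _, name in sorted(variances, reverse=True)[:k]]

readings = [
    [30.0, 5.0, 100.0],
    [55.0, 5.0, 140.0],
    [42.0, 6.0, 900.0],
]
fields = ["temperature", "spin_retry_count", "reallocated_sectors"]
selected = top_varying_fields(readings, fields, k=2)
```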
F
I think in general these are kind of great problems, but often you need to spend as much engineering time verifying that you're getting valid results, because it's often very hard to tell, and you need to really expect engineering time on that. So he might have done that, but I couldn't tell that he had.
D
I think this is a nice application idea, because hard drives are really mission-critical devices, obviously. And you could imagine there would be fluctuations in some measurements before a catastrophic failure, and I could imagine they would be temporal in nature, not spikes or anything like that. So to the extent that you can detect those fluctuations before a hard drive failed, that could be pretty important. Agreed, there's potential here, I think.
E
Yeah, right, interesting. So, being familiar with what SMART collects: some things are failure rates, and if you see a discontinuity in failure rates, then it's worth learning about. Other things, though, do have temporal drift, like ambient temperature, and in fact there you probably do want to use a hard threshold, mm-hmm, and not... yeah.
F
I think in general you could build one model, and if you built the model to do classification, you'd have to have labeled data, and we don't... we don't. We've done this internally ourselves. There's not a lot of good tools, I think, right now available to make it easy for other people to do that. So it's totally capable of doing that, the algorithms are totally capable of doing that, but we haven't really exposed it. And, well...
F
We found that, since we focus on anomaly detection, people were being clever and they were using anomaly detection for classification like this, and it seems to work pretty well. So is this really the beautiful way of doing it? No. Does it work? Yeah. And now a lot of people have done it. So, but ultimately we can... we can do it; I don't know, we can do a better job.