Numenta NuPIC Tutorials, 21 Mar 2014

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: Predicting Sine Waves with NuPIC

Description

This is a good tutorial for complete NuPIC newbies. It walks through setting up sine wave input data from scratch, swarming over the data to generate the best model parameters, and running the resulting model through NuPIC. Continues to show how to programmatically swarm against data and create a model based on the swarm result.

Source code: https://github.com/rhyolight/nupic.examples/blob/master/sine-prediction/sine_experiment.py

UPDATE: The interface for programmatic swarming has changed. Please see https://github.com/numenta/nupic/wiki/Running-Swarms#running-a-swarm-programmatically for updated instructions.

A

Hello new pic this is Matt Taylor I am coming to you with a tutorial on how to set up new pic to do predictions on a sine wave, so I'm going to start out by generating a data file with sine wave data, we're going to run a swarm against that data from the command line. And then we were going to run new pick through the OPF run.

A

Experiment command line script to see the new pic output, then we're going to learn how to programmatically, run swarms and run new pic as well just with a Python script.

A

So let's get started I'm pretty much gonna start from scratch here and all this code is online. I'll show you the URL soon, so the only thing I have in my directory right now is a new pic output file which we'll get to in a moment. But the first thing that I'm going to do is create a little script.

A

We'll call it generate data, and this is just going to write out sign data to a CSV file in the right format, for the OPF client to warm against it and then use it as input for you back so I know I'm going to need to import a couple of Python modules, CSV and math, so that I can do the sine.

A

Let's say we're going to create a thousand rows to start with of data and I'm going to create a function I'm just going to call it run for simplicity sake at this point, and we will generate our data here. The first thing we'll do is we'll create a file handle and open up a new file. It's just call it sign dot CSV. This will be our in the file we're creating for input for new pic. That is a writable I'm, going to create a CSV writer like this.

A

With the file handle there, we go and I need to write out a few header lines. So we'll do that writer right row and you give it an array I'm going to have two fields in this file angle and sign.

A

So we'll have an angle in radians and the sine value of that angle, so that is the first header. That's just the labels for the columns. I'll just copy this, so I'm going to write three headers out. So the first is the label. The second is the type which is float and the third is a flag which is going to be empty. So we use this in nupoc to specify special types of columns like a timestamp.

A

We won't be using those in this example, so I'm going to skip the explanation so for I in a range of the amount of rows I have. This is for every day a row that I want to produce I'm going to create an angle and what I'm going to do is I'm going to take a sample of like 100 samples per cycle in a sine wave.

A

So one Radian is or there's a two pi Radian and a cycle, so I can say for the in my integer count: math dot pi divided by 50, which will give me the Radian angle, so 50 samples per PI or 100 samples per two pi. So 100 samples per sign cycle, you sign the value is going to be just math dot, sine of that Radian angle and I'm, going to write that writer right row and the angle and these sign value. Okay, after that, it's just whoops.

A

File handle closed and that's pretty much it I do want to run this from the command line, so I will add the Liggett Ori, if name equals main.

A

This will allow me to run for the command line. I want to run the run function. So, let's see, if I did that right, Python generated data there we go now we have a data file at sign CSV, so that's pretty much exactly what I wanted. So we have a data file generator. We have a data, so the the next step would be to run a swarm over this input data. So I'm going to refer back to the wiki page on running swarms, there's also a YouTube video that describes how to run swarms.

A

The first thing I need to do is create a search definition, which is a JSON file and I'll use that when I call run swarm, so I've got kind of a can search definition here that I will plop down I'll just call it search depth on Jason and here are the fields. So what this means is in in my swarm, I want to include the values, only sign I'm.

A

Only swarming over the sine value, so I'm looking for a model with parameters that best predicts the sine which is a float and I've, got a min and Max value, negative, 1 and 1, so that will help bound the swarm.

A

The stream definition defines the input so I'm telling it the name of the stream and then the source is a file in this directory called sine notes, dot, CSV and that's the one we just created and the swarm will be able to use all the columns within that file as it's making doing its particle swarm. The inference, type I'm saying temporal anomaly, because I would like to include anomaly scores in my output.

A

If I didn't want to I would just call this multi step, but I said I wanted all my store scores, I'm, calling a temporal anomaly I'm only doing one prediction step. One step into the future. I could add more steps here. If I wanted, though we'll just start with one, the predicted field is sign, so that corresponds exactly with the the field name sign in my input data here: okay and the swarm sizes medium, which is usually your best bet large, you can put large small and medium here. Small is really just for debugging.

A

Medium is usually what you want to do large can take a long time, because it adds a lot of extra dimensions to the search. So it's a pretty simple search definition and go back to the wiki file. Here. All I need to do is call run swarm and give it the search def and tell it how many workers I want, which means how many threads so run swarm PI one thing I didn't check is nupoc in my Python path.

A

So if I run Python and the repple, you should be able to import new pic without an error. So that's good, so that works and if it goes in my Python path, I also have run swarm in my Python or in my my path. So I can say one swarm, search, def, I'm, going to say max.

A

Workers is eight because I have an eight-course system, and here it goes in generating experiment files, and this usually takes a little while, depending on how much data you have I, think we specified thousand rows and our generate data.

A

So it's going to swarm over all 1,000 rows multiple times, create a whole bunch of models with different model parameters and throw out the ones that are doing as well until it finds a model that does the best in the amount of times allocated for the swarm process, and once it's done with that, it will give us description. Pi, you can see in this wiki page.

A

You can read about what it creates: a description that PI permutations so we'll use a description, duck pie in the next step tutorial and then, when you will talk about the model programs after that, so swarming is a pretty CPU intensive process, so you can see here should be usually takes up quite a bit, but few resources- oh, but we're all done now, so it didn't take very long. So what happened during that swarm?

A

As you can see, there's a bunch of extra files that just show up here, I'm not going to talk about most of them, but we will talk about the description, PI and the stuff, that's in model 0, so back to the wiki page running models discovered by the swarm. So it's really easy just from the command line to run this model, that's a swarm uncovered.

A

If you look into model 0 you'll see, there's description, PI model parameter params, so this description, dot PI in here is really what's got the stuff that you need there and the model crams as well, which is recently added and I'll, show you how to use that in a minute. So what we need what we want to do one way we can do this. The way I'll do it first is to run it through in the OPF run, experiment.

A

So I'm going to do that Python and then I'm running the OPF, our experiment directly, I'm gonna, give it the path to the directory that this swarm just created and I should not call it. Python I always seem to do that. So this is now pushing that data using the best model parameters that the swarm found. It's now pushing that data into nupoc, so it made it through a thousand rows and if you look now in model zero inference you'll see a CSV file in there.

A

So, let's open that up and see if it looks like model zero inference default, whatever we're just gonna open that up in a spreadsheet, it's a little bigger okay. So what we have here, the interesting things are the sign. So this is the the ground truth essentially for sign. We've got some multi step prediction fields here, one is actual zero. This is not. This is just kind of aggregated data. The one I'm interested in is multi-step best predictions dot one, and that is one step ahead.

A

Best prediction for the one step ahead, so just to make this easy to plot I'm, just going to cut this and pin it right next to the sign, and we can just do a quick plot, so you can see what it looks like and there we go. So this is a thousand steps through and, as you can see, can't get at the very much detail here, but we will in a minute.

A

So that's what it looks like overall, after a thousand iterations through the data, we can, let's take a little bit of a closer look through just a couple hundred of the last rows. So at the very end, once it has learned a little bit more about the data it ends up, it seems to get a bit better, so it sticks to the line better. You still have some bumps, but there you go. That is new, pick predicting a sine wave there.

A

Okay, but I'm going to take this further. Don't care about the CSV file and the next thing we're gonna do is we're going to create a script that will automatically run a swarm? So, let's, let's do that. So the only thing the only script we have right now is this generate data, so I'm gonna blow away some of the stuff I'm gonna get rid of model. Zero start out, I see star dot pickle.

A

Let's see, I'm gonna remove this search. Def report I'm not interested in that stuff, so we're left with. We can also get rid of this description. Dot, PI, that'll that'll be regenerated by the swarm so and the permutations.

A

So we've got our search stuff. We've got our input file to sign dat, CSV I'm, going to create another script, just called X, mirror experiment PI, and what I'm going to do in this in this file? This would be my main script at this point as I want to I want to generate the data again like regenerate. It is a file this would. This would be the only script that I want to run so I want to generate the data.

A

I want to run a swarm and then I want to pull pull push that data through new pic and then do something with the output data. So let's get let's do this: let's create a function, we'll just call whoops def, run experiment for simplicity, sake, I'll just pass right now and I won't do any Leggett or E of name equals.

A

Oops main then run experiment. Okay, with that out of the way- and now we want to do something. Well, we can generate data, so here's our generate data function, I've already got it set up, so I gotta run it from the command line, but I can also import it and just run this function. So let's do that, let's put pass by in there, so import generate data and the first thing that I'll do and my my experiment is: generate data run okay, so this should give us a file called signed at CSV.

A

So let me just test this out to make sure it works. Real, quick, let's remove signups CSV, run experiment and then make sure that, yes, the signed up CSV is still there or it's been recreated after I've removed it okay. So that's the first thing we're generating the data we also want to create. Another function called swarm over the pass for now so I'm going to generate Madinah and I'm going to swarm over.

A

So the swarming you don't have to do from the command line. I recently added this section to the running swarms, wiki page. That shows you how you can programmatically run a swarm. You still have to create your search, def search definition that Chasen's file that I talked about, but let's try and do this programmatically, so I'm just going to copy and paste on this stuff from this wiki page.

A

So I know that I want that I'm going to import the permutations runner and we should be able to call run permutations on that permutations runner like so so. The search stuff is still fine. That's a relative path to the search definition that already exists, telling it aid, workers and override just means it will overwrite. Whatever files are currently there I've been asked to if it finds them. Okay, so now I'm generating the data and then swarming over the data. If I run this at this point, I should get there.

A

We got this swarm started so now, I'm regenerated all the data, and now we are swarming back over the data. I'm, not gonna. Let this finish, I'm just going to kill it, but you can see, let's see yeah, so it already started. Creating all these files that I was creating before that. Don't care for don't really care about at the moment, see in a model zero so after that at least so we're swarming over the data.

A

What this is one of the things that's going to get us and actually I, probably should let it run so I'm going to I'm gonna run this and let that swarm run because one of the things that its output by the swarm is a model params file, if you've ever manually, created a model frames or like. If you look at the Hatchin example, there is a model tramps file, you don't run it through a run up.

A

You have experiment, you write a little script, create your own model using model trams, and then you manually, send it and put into the model by calling the run function and get a result back you step-by-step, so the swarm process now returns a model trams file that you can use. So you do not have to use this okie affront experiment. Script from the command line, so their swarm is done. You look into model. Zero. You've got a lot of friends. Let me let's get on to that model. Zero model frags.

A

So this is simply a Python module and all it has is one value called model. Params and all of these this huge dick, then it came to me- means basically the swarm parameters that were found to be the best for the data input that you grant us one against. So this this basically Tunes all the parameters and gives you like it considers to be the best set of parameters for your model for that info. So that's great! There is so what we want to get access to that within our experiment, script.

A

So one of the things we can do it is important to another. Python module called show until and I'm just going to just a little happy that I'm just going to copy that file from the model. Zoo model frames time to so I'm, basically just moving it or copying it from in that models. Your directory into my current working directory, because then I can use it because I I want to create a model and in order to create a model, I need those model parameters.

A

Okay, so so now that I've after I've run this swarm over data function and I've moved that model parameters file that Python module into my working directory I can import it. So I'm importing the model parameters here and now. I have access to it. I need it, because I also need to create a model using the model Factory in the new pickup. Yes, client, so I'm also going to import that from new pick frameworks, opf model factory import factory.

A

So this is just a model Factory. In order to create a model, you want to use the model factory as a create function, and we want to send it the model grams that I just imported in the line of both it has this constant model params value.

A

So it's basically the the value that the swarm decided was best for this. This input data, and so there we have our model. We also need to tell the model to enable.

A

Inference on a predicted field, which is sine and again that corresponds with the field in the input data, the column sign.

A

Okay, so we have our model, we have our input data we've we've swarmed over it. We've used the model parameters from the swarm to create the model.

A

So now it's time, I think to actually read that input file and we're gonna need the CSB module again, because it's a CSV file and, let's say with we're, going to open that file sine dot CSV as a readable file and we'll call it sign, input, okay and we'll create a new CSV reader, the reader around side and put handle- and if you remember correctly, this CSV file has three header rows.

A

We don't want to pass those into new pics, so click under any way to get rid of those is just going to skip three rows and I'll make a comment here: headers and.

A

The real data is here: okay for row in CSB reader. That's how you read a line in CSB reader. We want to get the angle out. This is going to be the first field. It is a float and at this row, zero, and we want to get these sine value, which is also a flute.

A

It's going to be the second field in that line and to push this through the model we just say model run we're going to tell if we're sending you the sign- and this is the side value okay, so that will push one input into new pick into this model that we just created and it will return us a results. Okay, so at this point we could just print this result and we'd be able to see something at least so I think if I didn't make any typos. This should just work.

A

One thing I'm going to do before I do that is we've already generated the data, so I just skipped that part and we've already swarmed on the dais. Let's get that part, we already have a model parameters. That's really all that we need right. There's a model parameters: oh no I, never actually did that so I guess I probably should leave it in.

A

For now, we'll just have to sit for cool minutes, and at least we can test the whole thing out, Python experiment: here we go so it wrote out the thousand lines of data if it isn't running the swarm again, it should once the swarm is complete should copy that model. Params file from within the sub model zero subdirectory into our current working directory.

A

It should import it and then it should use those model programs to create a new model through the model Factory and then a read over the input file that contains all of the sign values and then line by line pass in the the line, the angle and the sign and get back a result, and there we go so, as you can see here. This is what a result object looks like it's, not very look, so it contains all the stuff.

A

The thing that's most interesting are the inferences, so you can get a lot of data out of this, but you can see how look back into this we're printing the results. It might be a little bit nicer to print in braces instead of other stuff, so I will vomit. This out see the inferences coming through to it.

A

So this is a one row around one result essentially of inferences, so it's got multi-step predictions, and so this is the one step ahead prediction and it's got a dict of floats two floats, which is a little bit complicated, but in case it's got the best prediction. First, it says it's got a 95, it's 95% sure the next value is going to be 0.03 840. It is.

A

Yeah I think: that's that's what it means.

A

But this is a little bit hard to decipher. So I showed you. There was this new click output file.

A

This little there's a couple classes in here: I'm actually going to use this to translate the the model results into something more readable. So I've got a little little superclass in a couple subclasses here. One of them is called nitpick file output and, let's just use that I won't go over this whole script, but the nupoc file output will essentially write this to a CSV file that you can open and take a look at.

A

So let's go back into our experiment file. Let's import from pic output import, a pic file output, that's the class we want, and after I've created my model, I'll just create an output object. You pick file output, it just needs a name, we'll just call it sign out.

A

Yeah just call it sign out and now, instead of just printing the inferences like this, which isn't very useful I'm, going to tell the output to write it so I'm going to give it the angle, the sine value and the result. Okay, but now that I've whoops now that I've got that.

A

Result now that I've got that I need to close it as well, because this is an I/o class okay. So after I've worked through all the input, I'm going to close the output file, so now I still got generate data and swarm commented out at this point, but it should be able to run this experiment and the nupoc output file output. Saying preparing the output to sign out, and it's done it says it wrote a thousand lines to sign out. So let's look at sign out here sign out and there we can go there.

A

We go. You can see angle, sign and prediction and we can take a look at this in our spreadsheet once again and then do a little plot. So the sign and prediction.

A

Well, let's chart this roughly, so it looks similar right so we've at least programmatically generated this instead of having to run through all the command line stuff, but I still would rather, this be a little bit more hands-off. If I just am trying to evaluate the the data coming out, I don't want to have to open a spreadsheet and plot it all the time.

A

So luckily, I've set up this new pic output to not only have a file output, but they put so instead of using the file output, so change it to plot, and you have to have matplotlib available for this to run but oops wrong command. But at least that's what we're already going now you can see a plot of the sign so as the values are being passed into new pic and the predictions are being passed out, they get immediately plotted onto the screen onto this little Mui.

A

So, as you can see, first off, you know hasn't seen a lot of data, yet the predictions are trailing behind, but but it's starting to kind of get the gist of the pattern, starting to align itself a little bit better and as it gets as it sees more and more and more data, it will get better and better. That's at it's prediction!

A

So there we go. We push. We just pushed through a thousand input lines and got, as you can see me pic got better as it went along so I'm going to now.

A

Want to generate more dia instead of a thousand rows I'm going to generate 3000 and in this experiment, file I'm going to have to regenerate the data and Reis warmer the data, so this worm will swarm over all of the data you give it unless you specify to stop at a certain row and then, when I see when I create the new pic plot. I can also tell it to show anomaly score.

A

So then we'll get an additional plot of the anomaly score. So I thought experiment, okay, so we're starting from scratch here, regenerating all of the input 3000 rows.

A

So, as you can see, we're now a lot further along the sign. Csv is much bigger, because I was three thousand rows worth of data and it is swarming now over all of that data trying to find the best model parameters possible with that data set, so I hope right back up, I think after this gets done, there we go so here it is after it's swarmed over 3,000 rows of data and I've updated the output to also show the anomaly score and another subplot down here at the bottom.

A

So the top is the actual and predicted sine values. The predicted is everything, that's the one, that's screaming behind a little bit. The bottom is the anomaly score from zero to one so should see as it as it sees more and more data less and less spikes in the anomalies for and better alignment between, the critical national values. So just gonna let this go on a little while, so you can kind of see their value and it's it's better more day. That's these or to do well on a sine wave.

A

Nuclear generally has to see a lot of different values, because it's not certainty because the pattern is is so regularly cyclic, it's not certain whether it's recognizing a sequence or a subsequence, a longer sequence, because all the sequences are so similar. So the anomalous for me is I, basically flatlined at this point. It's really comfortable with the data that is seeing the predictions are not perfect, but I've come to. It's been pretty well against this time with.

A

So that's pretty much everybody. That is how you can programmatically each swarm over set of input, data and kick off the good model after the swarm has been completed and and I hope. That has been helpful. So, okay.