From YouTube: Beginner's Guide to NuPIC
Description
With your host, Scott Purdy.
You can find Scott's IPython Notebook here http://fer.io/~scott/nupic_overview.ipynb (JSON) or here http://nbviewer.ipython.org/github/numenta/nupic/blob/master/examples/NuPIC%20Walkthrough.ipynb (static HTML).
Another one here: https://github.com/numenta/nupic/blob/master/examples/NuPIC%20Walkthrough.ipynb
But it's just good to know about, and then I'll talk about some of the different things and what we call the OPF, which is sort of a framework for streaming data models, and there are a few different entry points into that. So I'm going to go through them in roughly the order I mentioned them. First there's encoders. If you don't know what encoders are: all the algorithms work on this concept of sparse distributed representations, which are just ones and zeroes, and encoders are our way of taking different types of data and turning it into ones and zeroes.

Why it's 2.5 and 97.5 instead of 0 and 100 is just an implementation detail for this encoder that I'm not going to go into, but it's going to create a set of ones and zeros that represents that number. The idea is that it buckets numbers over that range and then selects a set of ones equal to the W — that's the width of the representation for a value — and then n is the total number of bits.

So again, I'm not really going to go into the theory of SDRs and what they mean; I'm just going to say this is how you get SDRs from values. So here, for the scalar encoder, I've encoded a few different values with the encoder that I created, and then there's the output. You can see that for values that are close to each other, values that fall into the same bucket,

their representations are going to be the same — you can see that with three and four — and then when we move into the next bucket above that, with the value five, it shifts the bits over, and there are still overlapping bits because the values are close to each other. It retains the semantics of the data, but if you had a much larger value, then obviously it would not have overlap as those values move over. This is a really simple encoder — it's just turning on bits right next to each other — and, I'll show you, if a value falls below or above the min/max range, you can see you get the same representation.

So we have another encoder, the random distributed scalar encoder, and the idea of that is that, instead of picking bits that slide across the range, it randomly selects the bits for each bucket, and that way it can represent a much larger number of buckets. You run the risk of collisions — obviously, if you create enough buckets, eventually you'll have one that's exactly the same as a previous one.

But for the purposes of this demo I'm going to run this code. Here you can see that, with 3 and 4, they have the same bit representation, just like in the previous one, and 5 is different, but it's not adjacent bits anymore. And then again with the one hundred and the thousand: a thousand gets a new representation — it's not the same as 100, even though it's outside the range. So that's the random distributed scalar encoder.
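For reference, the calls being demonstrated look roughly like this — a minimal sketch, assuming a standard (Python 2-era) NuPIC install; module paths, defaults, and the tiny n/w values here are only illustrative and may differ from the notebook:

    # Simple scalar encoder: w adjacent active bits slid across n total bits,
    # bucketing values between minval and maxval (out-of-range values are clipped).
    from nupic.encoders.scalar import ScalarEncoder
    from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder

    enc = ScalarEncoder(n=22, w=3, minval=0, maxval=100,
                        clipInput=True,   # clip 1000 down to the top bucket
                        forced=True)      # allow a deliberately tiny w for display
    for v in (3, 4, 5, 100, 1000):
        print("%6s -> %s" % (v, enc.encode(v)))   # 3 and 4 share a bucket

    # Random distributed scalar encoder: each bucket gets a random set of bits,
    # so values outside the original range still get new, distinct representations.
    rdse = RandomDistributedScalarEncoder(resolution=5.0)
    for v in (3, 4, 5, 100, 1000):
        print("%6s -> %s" % (v, rdse.encode(v)))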
Okay, so here's the takeaway from this. What I'm trying to show here is how to use these different encoders and just give you a flavor of the different encoders that ship with NuPIC. With the scalar encoder you're taking scalar values and turning them into ones and zeroes, and the way it does this is that it buckets the values across that range and then turns them into ones and zeroes, and within a bucket you're not going to get a different representation.

So I think that's what you're commenting on — that three and four have the same representation? Yes, and that's because obviously we can't represent everything uniquely with this encoder, because it has a limited output. So you have to pick the right n, W, min, and max values to make sure that your granularity makes sense for your data, and this actually should be a benefit in some cases, because it smooths out a little bit of the noise in the data, so the model doesn't have to learn that four and 4.001 are the same thing.

If you change that to a hundred and you set the window — the W — to one, then 1 would turn the first bit on, 2 the second bit, 3 the third bit. If you changed it to 100 and made the window ten, anything between one and ten would turn the first bit on, and anything in the twenties range would turn the second bit on. So you're just bucketing and encoding scalar values to fit within some of those buckets, however you define the range and the window.

Does that make sense? Basically that's a bucket, which is represented with these three bits, and then you shift it over one to get the next bucket. So you can fit 20 on this range, and the formula for that is n minus W plus one. With the random example, because we're randomly picking the bits to represent a bucket, you can theoretically create as many as you want.
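As a quick sanity check of that bucket-count formula, using the small numbers from the talk:

    # Number of buckets for the simple ScalarEncoder is n - w + 1.
    n, w = 22, 3
    num_buckets = n - w + 1
    print("%d total bits, %d active bits -> %d buckets" % (n, w, num_buckets))  # 20 buckets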
As you keep going, the problem you run into is collisions between values. Here I'm showing very small numbers, so an n of 21 and a W of 3 is not a very good representation in most cases, because you're going to have a higher risk of collisions — you only have 21 bits — and it's not going to deal with noise or collisions very well, because you only have three active bits.

This is a scalar encoder — it's not just for integers, so you can give it a floating-point value as well. You don't want it to represent every single thing it sees uniquely, because things that differ by, say, 0.0001 — you usually want those to be treated the same, but that varies by data set.

Okay, so those are the scalar encoders. I'm going to show you two other kinds. One is the date encoder, and the date encoder encodes things differently: you have options for how you want to encode a date, which you can see if you go through the documentation. In the example I have here, I'm going to encode the season of the date. When you create a date encoder, you can tell it which aspects of the datetime you want it to encode.

You can choose season, you can choose day of the week, you can choose weekend vs. weekday, which is just a boolean, and it's going to capture that aspect of the date in the output SDR. So in the example here with season, the value five is essentially the W that we had in the scalar encoder, but specifically encoding the season. I'm creating three different datetimes and encoding them.

One is right now, roughly — I guess it's a little bit later than that now, but I was guessing what time I'd get to this part of the talk — then next month, and then I just picked another time, Christmas. What you can see in the output here is that for each of these encodings you have five active bits, because I specified that in the construction of the encoder, and it's capturing the seasonality — it's selecting the total number of bits to represent the whole year.

I'm not actually sure how it picks the total number of bits — basically, I guess, making roughly 12 buckets — but you can see, if I take now versus next month, there's some overlap between them, these three bits, and some difference between them. Whereas when I encode Christmas, you get the three at the end here and the two at the beginning, and that has no overlap with now and next month. And so that's how it captures the semantics of seasonality — you know, it does it roughly.
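A minimal sketch of that date-encoder usage (season component with width 5), assuming a standard NuPIC install — the exact bits that come out will vary:

    import datetime
    from nupic.encoders.date import DateEncoder

    de = DateEncoder(season=5)          # 5 = width of the season sub-encoding
    now        = datetime.datetime.now()
    next_month = now + datetime.timedelta(days=30)
    christmas  = datetime.datetime(now.year, 12, 25)

    for label, d in (("now", now), ("next month", next_month), ("christmas", christmas)):
        print("%-10s %s" % (label, de.encode(d)))
    # Nearby dates share some active bits; dates in a different season share none.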
So that's the date encoder. Like I said, there are a number of different options for it — weekend vs. weekday, day of the week, things like that. I'm just doing season here, but you can do whatever makes sense for your data. And you might ask, how do you know what works well for your data? I'll talk about that towards the end with swarming, which is a way of selecting parameters for your data set.

Then you'll notice there are also these first three bits that aren't active for any of the four categories that I specified. You might think that's for a None value, but it's actually for when you get something that's not None but also not one of your categories. So that's the category encoder, and I'm going to actually use these four SDRs in a second to show how the spatial pooler works. So that's encoders. I'm going to move on to the spatial pooler, unless anyone has questions about them.
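A sketch of the category encoder being used here — the four animal categories mirror the ones in the walkthrough notebook, and the exact keyword set is an assumption about a standard NuPIC install:

    from nupic.encoders.category import CategoryEncoder

    categories = ("cat", "dog", "monkey", "slow loris")
    ce = CategoryEncoder(w=3, categoryList=categories, forced=True)

    cat = ce.encode("cat")
    dog = ce.encode("dog")
    print(len(cat))     # 15 bits: (4 categories + 1 "other" slot) * w
    print(cat)
    print(dog)          # a disjoint set of 3 bits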
I'm not sure I know exactly how to answer that, but I guess you can represent a much larger number of integers when you're bucketing them, because you can have the same number of buckets as you have categories here, or you can have more buckets because you have overlap between them. So yes — and it also brings up a good point.

Do you need to have totally exclusive bits for a category encoder? You don't need to — you could do something similar to the random distributed scalar encoder, where you randomly pick the bits and then just hope you don't have collisions very often. And if you picked a large enough n and W, maybe three or four hundred bits total and thirty or forty active, that probably would work just fine.

Cool, so next is the spatial pooler. Again, I'm not going to really talk theory here, but the spatial pooler takes ones and zeros as input and outputs a sparse set of ones and zeros that hopefully captures the spatial invariance of the input pattern. It's going to adjust itself over time to represent the input space well — it's sort of a transform from all possible inputs into the most common groupings.

That makes sense, and actually there was a paper that I was going to put a link to in here, but I forgot — someone had put a paper up on the mailing list comparing the spatial pooler to clustering algorithms. I thought that was an interesting use of it, an interesting comparison. But again, there's documentation, so if you guys pull this up you can go through this and run these examples, and I just put this line in so it's easy to get to the documentation.

If you want to look at that — or if I need to look at it based on a question from you guys. I don't know why I have all these. Okay, so what I'm going to do to show how to use the spatial pooler is have it learn the four categories that we created with the category encoder. First I'm just printing out the length of the output from the category encoder for cat.

So it's 15 bits long, and we're going to use that when we construct the spatial pooler, to tell it how big the input is. The input dimensions can be multi-dimensional — if you're doing a vision problem you can do two-dimensional data — but here I'm just doing one dimension, just to show how it works. I'm going to give it four columns, and the reason I'm going to do that is because we have four categories, and I'm going to try to have the spatial pooler learn to represent those four categories — well, exactly learn those four categories. In a real situation you'd probably want much bigger numbers than these, and you would not want a one-to-one mapping between columns and categories as I'm doing in this example, but I'll talk about that more after I show this. So there are a few other parameters here.
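A minimal sketch of how a spatial pooler like the one in this demo gets constructed — a flat, one-dimensional input of 15 bits (the category encoder output) and only 4 columns, with global inhibition; parameter values are illustrative, and in newer NuPIC versions the module lives under nupic.algorithms instead of nupic.research:

    from nupic.research.spatial_pooler import SpatialPooler

    sp = SpatialPooler(inputDimensions=(15,),
                       columnDimensions=(4,),
                       potentialRadius=15,            # every column can see the whole input
                       potentialPct=1.0,
                       globalInhibition=True,
                       numActiveColumnsPerInhArea=1,  # one winning column per input
                       synPermActiveInc=0.03,
                       seed=1)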
I don't really know how to explain this without getting into a lot of detail, but let's say in a vision problem we were doing a 2D topology, so you have a two-dimensional input. What you can do in that case is lay out the columns in a similar way, so that the column in the upper left corner can only connect to the input bits in the upper left corner, and that way different columns can learn different local features.

And then, if you had temporal pooling, you could use a hierarchy and learn higher-level things as you move up, things like that. In this case I'm just going to do a totally flat space: all the columns can see all the input bits, and the inhibition is global, so columns compete with every other column rather than only inhibiting the columns next to them.

And I just realized — okay, I lost track of where I was. Before I feed any data into the spatial pooler, I'm just going to print out which input bits each of the columns is connected to, and that's randomly initialized. Each of these vectors — each of these bits — matches the input bits that come in, and a one means the column is currently connected to that input bit, which will make more sense in a second.

The point of showing this is that they're random initially, and when I feed data in, those are going to adjust. One of the columns is going to learn to represent one of the categories, another one is going to learn to represent another one, and you'll see what that looks like in a second. So here I'm going to feed in our category cat and look at the output from the spatial pooler. SP compute — the compute method is how you do that.
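Continuing the sketch above (reusing the sp object and the encoded cat value from the earlier snippets), feeding one input in looks roughly like this:

    import numpy

    # compute() takes the input bits, a learning flag, and an output array that
    # it fills with the winning columns.
    active_columns = numpy.zeros(4, dtype="uint8")   # one slot per column
    sp.compute(cat, learn=True, activeArray=active_columns)
    print(active_columns)    # e.g. [0 0 1 0] -- the column that won for "cat"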
So the columns that have the most active bits in their connected pool will win, and then, with learning set to true, it'll increment the connection weight to each input bit that is active, and for all the ones that are not active it will decrement them. The implication is that, over time, a column learns to represent the inputs it is active for better and better. So where we start out with random connections here, after I feed in the value for cat 20 times, if I print these out again you'll see that this third column — the one that was active for it, which is this one here — has adjusted its connected weights so that, rather than being connected to a random set of the input bits, it's now connected to the ones for the input it's active for most commonly. So again, when we look at these random connections and we feed the value for cat in, the active bits in cat are these three,

and all of these columns competed to become active, and this third one was the one that happened to win — because it had connections to all three of those bits. You can see the second one did too, so there's a little bit of randomness there, but the third one won. And so when I fed cat in a whole bunch of times, it adjusted its weights.

I run this loop to print out each of the columns again, and now you can see that the columns have adjusted their weights: rather than the random connections we had up here, the third one, like we saw before, is connected to all the bits for cat, the fourth one is connected to all the input bits for dog, and then we see something interesting here.
I guess it's not obvious which of these is which animal, but I just happen to remember from before. So why is that? Well, again, these columns are competing for these values, and what can happen is that one column may become active for — may win for — multiple animals, and then it's going to be incrementing its weights for the bits for monkey, or for cat when it's active for cat, and decrementing the other ones.

But then, when slow loris comes along and that column happens to win for it, it's going to increment the connections to the slow loris bits and decrement those for cat. So it just happened that this one learned to represent two different things, even though there was another column that maybe could have. We can keep running this a whole bunch and see if they end up evening out.

Yeah, there's always going to be some randomness — I was trying to explain how that could happen, where one column happens to win for two different animals, and then you have another column that doesn't really represent anything, or maybe partly represents one of the animals. What you'll see in a real system is — as I was trying to say, normally you won't want to do this one-to-one mapping. You just have a big pool of columns and a big pool of inputs, and some columns will learn to represent specific parts of an animal.

This is really simple data, but you might have one column that represents the first two bits for an animal and another one that represents the third. In real cases you'll allow multiple columns to be active at the same time, so you can get much more complex representations where you have some of these features and some of those features.
A specific example — yeah, that's a good point. So in this example, if we knew that this is what we were doing, we could set it up so that the columns could only potentially connect to three bits — we could have five columns, say their radius is three bits, and not have global inhibition. Okay, but —

yeah, we know that in this example, and I don't know if that's necessarily a property of global inhibition. Taking advantage of the topology here would be useful because we know how the bits are laid out, but in more complex cases, where the input isn't laid out evenly, that might not help. In a vision example, where you have 2D input, it's very obvious and very intuitive that you would lay the columns out in 2D as well and have a local field for each one to look at.

But you don't have to — thank you. Cool, so I'm not going to go too much into the properties of the spatial pooler; I'm mainly just trying to show how to use these different pieces. But I want to show one property, which is the spatial invariance that it learns. I'm going to create a new SDR that looks like cat but is a little bit different. So I'm taking cat's bits — which start at bit 3, counting from zero.
I'm going to take the first two of those and set them to one, so it represents most of cat, and then I'm going to pick another bit that isn't part of cat — one that happens to be part of dog. Let me print that out real quick. Normally, if I had an SDR for cat — if I used the category encoder and fed cat in — I'd get these three bits as ones and all the rest zeros, but I'm artificially creating an SDR that's a little bit different.

It mostly looks like cat, but it has one bit from dog. Now, if I feed this into the spatial pooler, what should happen? Well, the columns are going to compete to win. We can look at our connected synapses here, and we can see this third column is going to match two out of the three bits and the fourth column is going to match one, so I expect that the third one will win.

It will have the most overlap, so it'll win, and the output pattern will be the same as if we had fed cat in. This is essentially what the spatial pooler is doing: it takes noisy input and outputs a stable, invariant representation of it. So the output is going to look like this — basically there are the four columns and which one is active.

When we did cat before, it was this third column that was active, because that's the one that's connected most to it, and we expect we'll get the same thing here. So let's run this and — yeah, we do see that. The third column, which is the one that represents cat, is the one that's active. Does that make sense?
It's going to look at the sum of its connected bits that are active, and that's what it uses when it competes with the other columns. The biological analogy would be neurons: a neuron sums the inputs on a dendrite segment, it either fires or it doesn't, and whoever fires first can inhibit its neighbors. So even if you have way more segments and synapses, it doesn't matter — those don't matter, only the ones that are active matter.

And again, when the cat column is active, it's going to strengthen its connections to the input bits that are active and weaken the connections to the ones that are not active. So it's going to weaken its connection to the slow loris bits in that iteration, but then, if it sees slow loris again in the future and becomes active, it'll strengthen them. Generally the amount we increment the connections is higher than the amount we decrement them, and that usually works out well.

Okay, so that's all I have for the spatial pooler. I'm not really going to go into too many of its properties. There will be some other talks — I don't know if there'll be a talk on spatial pooler theory, but there's plenty of information online: videos, talks by Jeff, and other resources. So, any questions about using the spatial pooler?

Yes — okay, so with learning set to false, it's not going to change the weights. Yeah, I didn't realize I had set that to false — good point. You can enable and disable learning as you want. Generally you leave it enabled; I'm not sure why I decided to turn it to false here.
Yes, sure. Okay, so the last individual piece I'm going to talk about, before I get to some of the higher-level ways to interface with the code, is the sequence memory, which is called the temporal pooler in the code. The temporal pooler — the sequence memory — is where it learns the patterns in the data over time.

Generally you take the output from the spatial pooler and feed it into the temporal pooler. You can also use the temporal pooler by itself — in Chetan's linguist example earlier, I believe that's using the temporal pooler just by itself, feeding the word SDRs from CEPT's API directly into the temporal pooler, then looking at which columns are active and mapping the columns one-to-one with the bits from the word SDR, so it uses which columns are predicted to know which word is predicted.

There's a hello_tp example, so you can find it there. I'm going to walk through it real quick. Here I'm creating an instance of the TP. I'm not going to go through the different parameters, but you can say how many columns you want, how fast to learn — like how much you increase the permanences of the connections — different things like that.
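A sketch of creating the sequence-memory ("temporal pooler") instance, along the lines of the hello_tp example that ships with NuPIC; the parameter values here are illustrative only:

    from nupic.research.TP import TP

    tp = TP(numberOfCols=50,
            cellsPerColumn=2,        # tiny on purpose, so the output is easy to read
            initialPerm=0.5,
            connectedPerm=0.5,
            minThreshold=10,
            newSynapseCount=10,
            permanenceInc=0.1,       # how fast connections strengthen
            permanenceDec=0.0,       # how fast they weaken
            activationThreshold=8,
            globalDecay=0,
            burnIn=1,
            pamLength=10)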
Once you have an instance of the TP, you need some data, so this demo just creates fake data to feed in, and then to send the data in you call the compute function — that's actually the same as the SP. You feed it the input array, and again you can enable or disable learning.

If you're using the CLA model or the OPF client, you have a spatial pooler and a temporal pooler in your network. And yes, in this example I'm creating artificial data to feed into the temporal pooler — I'm not using the spatial pooler that I created before, I'm not using the output from that. Right, yes — the answer is yes.

So later on, when I do the CLA model, I'm going to use the hot gym example. In some of the hot gym datasets — it's just one of the data sets, ignore the name — it's a data set of electricity usage in gyms, and they have different locations. If we were training a model on that data, you'd want to put a reset in between the data points from one gym and the sequence from the next, so you're not learning over that boundary.
Okay, so now I'm going to send the same sequences in, and we're going to look at the predictions made by the temporal pooler. They should be accurate now, because I don't have any noise in this data — it's very clean and simple to learn — and I fed it in a whole bunch of times already, so it should have had time to learn the sequences. Then there's this function that just prints out the SDRs that are predicted and the ones that I'm feeding in, so you can see what it looks like.

If you're doing this yourself, you can see here there's this TP print-states helper. You call the compute function and pass the array in, and then, if you want the predicted cells, you can get them with getPredictedState, and then I'm just formatting those in a way that makes them easy to read. So I'm going to run this, and the output is basically, for each input vector, it's going to print it out.
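Continuing the sketch above (reusing the tp object), feeding a vector in and reading the prediction back looks roughly like this; the input pattern is a dummy value just for illustration:

    import numpy

    x = numpy.zeros(50, dtype="uint32")   # width must match numberOfCols
    x[0:10] = 1                           # an arbitrary input pattern

    tp.compute(x, enableLearn=True, computeInfOutput=True)

    predicted = tp.getPredictedState()    # shape: (numberOfCols, cellsPerColumn)
    print(predicted)
    # tp.reset() would be called between independent sequences, e.g. between
    # the different gyms mentioned earlier.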
It makes a prediction based on the connections of the cells and what it's learned in the past, and predicts that this next set of cells is going to be active. Then we feed the next record in, and since it's already learned this sequence, it is correct, and so then we see

that those columns are active, and then it predicts the next value. So this all matches — you guys can look at this later for more details. One important part of this is that the temporal pooler is learning these sequences within context. When we created this temporal pooler, we picked the number of cells per column and set it to two, and the cells inside a column all represent the same spatial pattern, but in different contexts.

So when we see the output of this, the top row is one cell and the bottom row is another cell — this is one cell in the first column and this is the second cell in the first column; they're just two cells. And so when we see a value like this B, it's in context — because it was predicted as coming from A to B.

Okay, so that's pretty much it for using the temporal pooler. Again, you can look at the example here and at the parameters. The parameters aren't always fully explained, so feel free to grab me if you're trying to use this and you don't understand what the parameters are. The best way to start with this is to find an existing example with data that's similar to yours and then just start with those parameters.
Yeah — and these are really simple examples with really small numbers, so it's easy to visualize; two cells per column is pretty small. Typically in the examples you'll see we use 32, which is probably more than we normally need, but it's big enough that we never have problems with it. The linguist example — or the Fluent example — is probably a good place to look for a more complex use of the temporal pooler, rather than a simple dummy version like this.

I'm going to switch back to this. So I've covered the algorithms — don't worry, the rest of this should go a little bit faster; there's not too much more. I'm not going to show examples of using the networks and regions API, but I wanted to mention it, to give you an idea of what it is.

What I'm trying to capture in this slide is that there are different levels at which you can access the code. All the stuff I just showed was using the algorithm implementations directly — just instantiating the SP, instantiating the TP, feeding data in and getting data out. There are some ways to use those, which I'll talk about in a little bit, that are a little higher level, where you can just specify the parameters and what your data is and then run it, and that can be really useful.
But it's sometimes a little hard to understand what's going on. The networks and regions API is intended for arbitrary topologies of the different pieces. So you may want to have — well, here I can show you an example. Here's an example of how you might set up a topology: you may have two different inputs, audio and images, and so at the bottom level you'd want each of those to go into its own temporal pooler to learn the sequences over time, and then you can have the outputs of those temporal poolers combined into a single spatial pooler, and then you can put another temporal pooler on top of that to learn transitions over the higher-level output you get from the spatial pooler of the combined inputs. And then you can put a classifier at the top to say what's happening in the images or in the audio, or whatever your problem is.

What the networks and regions API is really nice for is that once you set this up, you feed the data into it and it propagates it through all the different pieces, and then you can get the output. So you don't have to manually create all these different pieces — feed your audio into an audio encoder, take the output of that, feed it into the spatial pooler, take the output of that and feed it into the temporal pooler, and so on.

With the network you just call, I think, the run method on it, and it does the whole thing. It also formalizes serialization: each of the regions can decide how it wants to serialize itself, but the network will call into each of those functions to serialize them and store them with a file that knows the topology of the network.

One of the pieces is the CLA model, which is basically an encapsulation of all the pieces I showed before. Rather than creating those yourselves, or using the networks and regions API, the CLA model encapsulates the most common case we have, which is a single level of encoder, spatial pooler, an optional temporal pooler, and then a classifier that turns the output of the temporal pooler — the predicted cells — into a predicted value.

I also didn't talk about the CLA classifier or show examples of that, but maybe I'll send out a follow-up with information on it. Okay, so let's run through this example real quick. This is again just copying an example in NuPIC, so you can find it in here, and it's doing mostly the same thing as that file. The way that you create a model is with the model factory, and you pass in params.
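A minimal sketch of that call, along the lines of the hot gym OPF client; MODEL_PARAMS here is a stand-in for the parameter dictionary shown below and in the NuPIC examples, not reproduced in full:

    from nupic.frameworks.opf.modelfactory import ModelFactory

    model = ModelFactory.create(MODEL_PARAMS)
    model.enableInference({"predictedField": "consumption"})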
Here we haven't defined the params yet, so I'm going to show you what that looks like. I'm not going to go through all of these, but the model type tells it what kind of model it wants — again, the OPF has this pluggable model system, so it knows that CLA maps to the CLA model. There's some other information that we're not really using.

It supports aggregation, so here we're aggregating hourly. The model params are what actually go to the model; we're using a CLA model, so it has some different parameters, one of which is the inference type. In this case I'm going to do temporal multi-step. There's also a non-temporal multi-step, which is for when you don't want to use the temporal pooler, and that sometimes works better

if you have data that can be predicted with just a first-order model. Okay, so I'm not going to go through all these parameters, but I wanted to put them in here so you could see what they were. These control things like how fast the model learns and, for the encoders, what n and W to use, things like that.

Some of the tools in NuPIC make it easy to get datasets into experiments. There's this findDataset function, and it has a few different rules: you can set an environment variable to point at your own directory of data sets, and then you can use this function to get the full path for a file based on those rules.
So first I just wanted to show what this data looks like, so I'm opening it up and printing out the first few lines. This is a file format that our tooling understands — there's a file reader that understands it. It has three header lines and then the data, and it's a CSV format, but it takes those first lines and uses them to understand the data. The first one is just the names of the fields.

This S means that it's a sequence — I mentioned earlier how in the hot gym data set there are different gyms. This first field is the gym that the record belongs to, and this sequence flag — when you put an S in here, it's the sequence flag — basically inserts a reset when that value changes. So when we go from one gym to the next, it'll insert a reset, so it doesn't learn over that boundary.

Then we have an address, a datetime, and then the scalar value. We're not going to use all these fields in our model — we're just going to use the datetime and the value — and that's controlled by our encoders, which are set up here. Here we're doing the time of day and the consumption, which is the value we want to predict.

Cool. So here I'm using another piece of the tooling, the file record stream. It understands that file format, so when we open it and print out the data, it has already stripped off those first three lines and it interprets the data as the appropriate type — rather than a plain CSV reader, which would give us everything as strings.
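For reference, a sketch of the three-line header format the tooling expects, plus reading it back with FileRecordStream. The field names and sample rows here mirror the hot gym data set being described but are illustrative, not copied from the shipped file:

    from nupic.data.file_record_stream import FileRecordStream

    sample = """gym,address,timestamp,consumption
    string,string,datetime,float
    S,,T,
    Balgowlah Platinum,Shop 67,2010-07-02 00:00:00,5.3
    Balgowlah Platinum,Shop 67,2010-07-02 01:00:00,5.5
    """
    with open("sample.csv", "w") as f:
        f.write(sample)

    # Row 1: field names; row 2: types; row 3: flags (S = sequence, T = timestamp).
    stream = FileRecordStream("sample.csv")
    print(stream.getFieldNames())
    print(stream.getNextRecord())   # values parsed into str / datetime / float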
So we're going to tell it here that we want to predict consumption. Once we have that model, we can feed data into it. I'm going to feed 100 records into this, and I'm going to print out the input — which is the consumption field in the record that you feed in — and I'm also going to print out the inferences that I get out of the model.
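Roughly, the run loop looks like this (continuing the sketch above, reusing the model and stream objects and the same field names):

    for _ in range(100):
        record = dict(zip(stream.getFieldNames(), stream.getNextRecord()))
        result = model.run({"timestamp": record["timestamp"],
                            "consumption": record["consumption"]})
        predicted = result.inferences["multiStepBestPredictions"][1]   # one step ahead
        print("input=%s  predicted=%s" % (record["consumption"], predicted))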
Okay, so I'm going to feed a bunch of records into this and print the results so you guys can see what that looks like. Here I fed in the value 5.3. Initially the model doesn't know anything about the data, so it doesn't know what to predict — it just predicts the value it just saw. But if we run this enough times, the model will start to learn the data and get better at predicting it.

I'm just going to do this a few more times — it looks like, yeah, now I've run it through maybe six times over the hundred records, and you can start to see how it's making a prediction about the value dropping. During the day the consumption is higher and overnight it's lower, and you can see it's making this prediction of a value of one point something while the value is still pretty high.

Essentially what it does is look at the predicted and active cells in the temporal pooler, and the proportion of active cells that were not predicted — the fraction of active cells that were not predicted — is the anomaly score. So if most of the cells that are active were not predicted, the score is going to be above 0.5, and if most of them were predicted, it'll be under 0.5. Does that make sense?
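In other words, the raw anomaly score is just the fraction of the currently active columns that the previous step did not predict — a rough sketch of the idea, not NuPIC's exact implementation:

    def raw_anomaly_score(active_columns, predicted_columns):
        # fraction of active columns that were not predicted
        active = set(active_columns)
        if not active:
            return 0.0
        return len(active - set(predicted_columns)) / float(len(active))

    print(raw_anomaly_score([1, 2, 3, 4], [3, 4, 5]))   # 0.5 -- half the activity was unexpected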
So here I just put the model params again — these are almost identical. There are two differences, I think: one is that I've changed the inference type from temporal multi-step to temporal anomaly, and that tells it we want to get the anomaly scores out; and then down at the bottom there's this anomaly params section, although I don't think that's really doing anything — I don't think you need to include it. So I'm going to run through the example

I did before again. Same thing: create a model with the model parameters, specify the field that we're trying to predict, run some data through, and print it out. Here are the predictions, just like before — with an anomaly model you still get predictions, it's still doing that — but you also get this anomaly score. I'll show that first.

Okay, so here, rather than printing out the multi-step best predictions from the result inferences, I print out the anomaly score, and I get a value here. I recreated a new model to do temporal anomaly rather than temporal multi-step, and because of that I didn't have any learning from before — it's a completely new model, it doesn't know anything about the data. I fed five records through, and that's why the anomaly score is a little over 0.5:

most of the columns were not predicted, because it hasn't learned the data yet. So I mentioned I'd show you more of what's in the model result. Here I print out the whole result — this isn't a very pretty format, but I'll point out the important parts. The inferences section is what we've been using: we're using multiStepBestPredictions for the predictions, and the anomaly score is here. There's also multiStepPredictions, but we're using the best predictions.

You actually get multiple predictions. For one step, we have this dictionary that maps from the value being predicted to its likelihood — so there's a 23% chance of the next value, the value one step in the future, being 5.1, and a much higher chance, 76%, that it will be 5.34. So there's a lot of information in this result object, just in this inferences section — it includes multiple predictions for each step into the future that you've specified.
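Inspecting one result object from the run loop above, the inferences section looks roughly like this — the numeric values are illustrative, echoing the ones just mentioned:

    print(result.inferences["multiStepBestPredictions"])   # e.g. {1: 5.34}
    print(result.inferences["multiStepPredictions"])        # e.g. {1: {5.1: 0.23, 5.34: 0.76}}
    if "anomalyScore" in result.inferences:                  # only present for TemporalAnomaly models
        print(result.inferences["anomalyScore"])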
It also includes the anomaly score, if it's an anomaly model. And then, in addition to the inferences, this record includes the raw input that you fed into it, as well as some processed things like the data encoding — the raw input after it's been encoded with the encoders you specified — and some other things like sequence reset. So if we were using that sequence field with the gym name, you could see when we cross the boundary, because it would show sequence reset as one here. Cool.

So in both of the examples I specified temporal multi-step and then temporal anomaly, so both get that — and then in the model parameters you can also enable or disable the temporal pooler. But yeah, in both these examples we're using encoders: a scalar encoder for the consumption value, which is the predicted field, and the date encoder for the datetime, and I

think it was doing day of week — that's how it was encoding the date. So we're using encoders, then the output of that feeds into a spatial pooler, the output of that feeds into the temporal pooler, and there's also a CLA classifier on top. The classifier takes the output of the temporal pooler and tries to convert the predicted cells back into a value — so we're predicting a scalar field.

It knows that some cells are predicted, and its job is to figure out what scalar value to call that, and that's where it figures out the multiple possibilities and how likely it thinks they are. For the anomaly detection portion itself, the anomaly score is computed solely from the state inside the temporal pooler, but you're still using all those other pieces — with the exception of the classifier; you don't need the classifier.
The reason we still have the classifier, though, is that we do swarming a lot, which I'll describe in a little bit. Swarming is used to figure out which parameters are best for your data set, and without a classifier there's no way to evaluate how good a model is — you basically have to use the prediction to see how good the model is, and then we use that to pick the best parameters. But once you've picked the best parameters and created the model, if you just want the anomaly score, you don't need the classifier.

Okay, so that's it for this notebook. I'm just going to show a few command-line things. The CLA model, like I just showed, can be created programmatically — you can feed your data in, get the results out, and do whatever you want with them. We have a few tools to make experimenting with these a little bit easier.

One of them is the OPF run-experiment script. Basically, you specify the parameters in a description.py file, and then you call this script and give it a directory that has a description.py in it. It uses the parameters from that file to create the model; that file also includes some other information, like where to find the data, and it runs.
Rather than creating the model myself, I'm using the existing client and just specifying the parameters and where to get the data from. The output to the terminal here isn't really that important for the purposes of our talk — it's basically just periodically printing out the metrics as it goes, so you can follow the progress.

And this looks very similar to the model params I showed you before — it's actually pretty much the same exact thing. So "config" here is what I was calling model params when I was creating the CLA model myself, and then there are a few things at the end of this file, in addition to the model params, that control things like where to put the results — in this case I'm just specifying a file somewhere — and there's also the ability to specify which metrics you want it to compute.

The run-experiment script does what you'd expect, but the advantage of structuring things this way is that, because we've structured the experiment like this, we can run a swarm on it. The swarm tool basically lets you specify which of those parameters you want to permute over and what range you want to try, and then it runs a bunch of different models with different combinations of parameters and figures out which ones get the best results.

The swarm script is just in the NuPIC repository, in bin — run_swarm — and I'm going to pass the same path that I passed to the run-experiment script, but pointing to the permutations file, which kind of mirrors the description.py but just specifies which of those fields we want to permute over. I'll show that in a second, but first let me run this.
Again, there's going to be a lot of output here — as it's running you can kind of see where it is. For most cases you probably don't need to understand exactly what all of this means. When it finishes, it's going to write a directory that has the description.py file from the best model out of all the models it tried.

So it takes a base description.py, the one I ran before, changes some of the values, runs the model with the new values, checks the results, does that a whole bunch of times, and keeps track of which one worked the best. Then, at the end, it takes the one that worked the best and writes those parameters into a file.

Is that large enough — can you guys see that okay? Inside this model_0 directory it's going to have a description.py, and inside that file are just the differences between it and the base description.py it started with — you can see at the bottom it imports the base description and then updates the config with those values. So in the permutations.py script we specify — I should have shown that first; let me show it real quick.

So when you do swarming, as I mentioned, you specify which fields you want to permute over. Again, I would recommend taking an existing example and then adapting it to work with your data, but basically there are these different classes that you create instances of for different types of values. PermuteEncoder is a permute class that understands encoders, so it's going to permute specifically over this floating-point value — I think that's the radius. Yes, it's the radius of the encoder. So this is a random distributed
scalar encoder, and the radius is how big the buckets are, and the right value for that can change a lot based on your data. If your data has really large values you generally want a bigger radius, and if it has really small values you want a smaller radius. Fine-tuning that can have a pretty big impact, because it can take out some of the noise and take some of the burden off the algorithms.

But if it's too big, then the buckets are going to represent multiple values as the same thing when you might not want them to. So when the swarm runs, it looks for these permute objects, picks a value in the range, runs the algorithm, and then, for each of those permute objects in the permutations.py file, when it picks the best model and writes out the description.py, it writes out the values it had for each of them.

Just as an example, here for time-of-day these values were specified as something we wanted to permute over, and these are the values that it came up with as the best. So that's basically it for swarming — I can't really give too much advice; it's going to depend a lot on your data set, but I'd recommend finding an existing example that's similar to your data set and then adapting it. And that's why you guys are all here.
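For a rough idea of what such a file contains, here is a heavily abbreviated sketch of a permutations.py, adapted from the generated permutation files in the OPF examples; the import path, field names, and ranges are assumptions and may differ between NuPIC versions:

    # Invoked with something like:  python run_swarm.py path/to/permutations.py
    from nupic.swarming.permutationhelpers import PermuteEncoder, PermuteFloat

    permutations = {
        "modelParams": {
            "sensorParams": {
                "encoders": {
                    "consumption": PermuteEncoder(
                        fieldName="consumption",
                        encoderClass="RandomDistributedScalarEncoder",
                        resolution=PermuteFloat(0.1, 10.0),   # range the swarm explores
                        w=21),
                    "timestamp_timeOfDay": PermuteEncoder(
                        fieldName="timestamp",
                        encoderClass="DateEncoder.timeOfDay",
                        radius=PermuteFloat(1, 12),
                        w=21),
                },
            },
        },
    }

    # Metric the swarm minimizes when ranking models.
    minimize = "prediction:aae:window=1000:field=consumption"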
It's going to take whichever metric you specify first and use that as the metric that determines which model is considered best. When it runs the models, the output gets fed into the metric, and that metric gives you a value where the lower the value, the lower the error, and the better the model — so it takes the one with the lowest error. There's one kind of tricky thing about those: you can specify a window for them.

If you have 10,000 records and specify the window as a thousand, it's going to compute that metric over a moving window of a thousand records. So when you run the swarm, it takes the last thousand records, and the metric over those records is what gets used to determine whether a model is better than another one. So keep that in mind.

Know that when you're computing the anomaly score, it does not use the classifier at all — you can actually throw away the classifier, not use it at all, and still get anomaly scores out. It's only needed to get predictions. And if you want to swarm on an anomaly model — if you have an anomaly problem, you may still want to do swarming — then in order to do swarming you have to have a classifier, because you need the predictions to evaluate which model is better.
I was having some trouble distinguishing between the encoder and the spatial pooler, because it seems like they're performing similar functions. But is it right to say that the spatial pooler would take multiple variables which have been encoded and create a kind of group SDR from those multiple variables?

So the encoder takes a value and turns it into ones and zeros, and generally encoders don't learn at all — for a given input, the output from the encoder will always be the same. The spatial pooler, on the other hand, adjusts the weights of each column's connections to the input bits, so over time, as those change, the output for a given input will be different.

The spatial pooler also requires an array of ones and zeros as input. So if you're starting out with the value 5, you first have to put it through the encoder to get ones and zeros — the output of that encoder will always be the same for the value 5 — and then, when you put it into the spatial pooler, you don't really know what the output is going to be: it's going to be the spatial pooler's invariant representation of that value, which may change over time. Okay.