Numenta Sparse Distributed Representations, 20 May 2016

From YouTube: SDR Sets & Unions (Episode 4)

Description

Using SDR sets and unions to identify SDRs that have been seen in the past.

Help me decide what episode to do next, Encoders or Spatial Pooling! Comment below or vote here: https://discourse.numenta.org/t/htm-school-episode-4-sdr-sets-unions/455

Intro music: "Books" by Minden: https://minden.bandcamp.com/track/books-2

A

If you find yourself in a class all by yourself, you might be in HTM School.

A

Hello and welcome to HTM School. I am Matt Taylor from Numenta. In our last episode, we talked about why SDRs are fault tolerant and noise tolerant. In today's episode, we're going to talk about sets of SDRs and unions of SDRs. If you remember from episode 2, SDRs have semantic meaning in HTM systems. This means that SDRs with similar overlaps have similar semantic meanings. Imagine a stream of SDRs, and also imagine that we're collecting these SDRs and putting them into different sets as they're coming across the stream.

A

So we've got these different buckets that we're putting these SDRs into. Over time, those will collect into a significant number of SDRs in each set. Now, as each new SDR comes down the stream, we can compare it to all these different sets to see if we've seen it before and potentially understand what it might represent. So, let's see a visualization of SDRs and sets in action.

A

So here we have an SDR of 256 bits with four bits on. That is a 2% sparsity, and that's what is drawn right here. What's going to happen is I'm gonna click this button, "add SDR to stack", and it will drop that SDR down below the button bar here. Every time I click it, it will generate a new SDR and drop the last one down into the stack. So this is going to give us a big stack of randomly generated SDRs.
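Editor's sketch: the stack of random SDRs described above can be approximated in a few lines of Python. This is an illustration and not the visualization's actual code. The names `random_sdr` and `sparsity` are made up for this example, and an SDR is represented here as the set of indices of its on bits. Note that 4 on bits out of 256 is about 1.6%, which the episode rounds to 2%.

```python
import random

def random_sdr(n=256, w=4, rng=random):
    """Return a random SDR as a frozenset holding the indices of its w on bits."""
    return frozenset(rng.sample(range(n), w))

def sparsity(sdr, n):
    """Fraction of the n bits that are on."""
    return len(sdr) / n

rng = random.Random(42)  # seeded only so the example is repeatable
stack = [random_sdr(256, 4, rng) for _ in range(63)]

print(len(stack))                  # number of SDRs in the stack
print(sparsity(stack[0], 256))     # 4 / 256
```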

A

It's keeping track of how many SDRs are in the set, and I can also click this button to add, I think, 50 all at one time. So then we'll end up with 63 SDRs in this stack. So now we can take one of those SDRs. I'm gonna click this match button, and we'll click one of those SDRs randomly and sort of simulate that we're seeing this SDR again.

A

So again, imagine this scenario where we've had a stream of SDRs and we're taking some of them out and putting them in this set, because we want to compare the SDRs that are coming through this stream to that set to see if we've seen them before. So when I click this SDR, some random SDR in the stack, it is going to bring it up top here. This is the SDR that I clicked. It's going to highlight it down at the bottom. That's what this orangish highlight around it is.

A

It's also going to rearrange this stack and order it by the overlap scores of every single SDR versus the one that I selected. So as we can see here, it identified the one I selected as having the highest overlap score, and it ranked it up at the top, winning this competition. I have a noise slider up here, so I can add bits of noise to it.
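Editor's sketch: ranking a stack by overlap score, as described above, could look roughly like this. The helper names `overlap` and `rank_by_overlap` are hypothetical, and SDRs are again assumed to be sets of on-bit indices. The overlap score is simply the number of on bits two SDRs share.

```python
import random

def overlap(a, b):
    """Overlap score: how many on bits the two SDRs have in common."""
    return len(a & b)

def rank_by_overlap(query, stack):
    """Return the stack sorted from highest to lowest overlap with the query."""
    return sorted(stack, key=lambda s: overlap(query, s), reverse=True)

rng = random.Random(0)
stack = [frozenset(rng.sample(range(256), 4)) for _ in range(63)]

query = stack[17]                      # simulate seeing a stored SDR again
ranked = rank_by_overlap(query, stack)

print(overlap(query, ranked[0]))       # the top-ranked SDR shares all 4 bits
```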

A

If I slide it up and add one bit of noise, as you can see, the overlap score of that SDR that we matched against went down, because now I have one bit of noise in it, and there's only an overlap of three. So, another interesting thing that I want to show off here. Let me refresh this. I'm going to click this calculate false-positive button, and what this will do is, as I add more SDRs to the stack, it'll

A

do a calculation to tell us the probability that some random SDR will match against an SDR on the stack when it's not really a match. So initially I'm just going to add a couple of SDRs to the stack here. Here's our probability of false positive. Currently it is 2.3 times 10 to the negative 8, which is a significantly small number, but not too small. Watch what happens to this number as I continue to add SDRs to the stack: it goes higher and higher and higher. Let's add a bunch. I'm going to add 50.

A

That number goes up a lot. Add another 50. It goes up even more. So the more SDRs that we have in the stack, the higher the probability of getting a false positive, because there's more chance of random collisions between the data that we have and the data that's incoming.
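Editor's sketch: one standard way to estimate this false-positive probability is to count how many possible SDRs overlap a stored SDR in at least theta bits, divide by the total number of possible SDRs, and then scale by the number of stored SDRs (a union-bound estimate). Whether the visualization computes it exactly this way is an assumption. The function names are hypothetical. With 256 bits, 4 on bits, an exact match (theta of 4), and 4 stored SDRs, this estimate comes out close to the 2.3 times 10 to the negative 8 quoted above.

```python
from math import comb

def overlap_set_size(n, w, b):
    """Count the w-bit SDRs (n bits wide) sharing exactly b on bits with a fixed w-bit SDR."""
    return comb(w, b) * comb(n - w, w - b)

def false_positive_single(n, w, theta):
    """Chance that a random SDR overlaps one stored SDR in at least theta bits."""
    matching = sum(overlap_set_size(n, w, b) for b in range(theta, w + 1))
    return matching / comb(n, w)

def false_positive_set(n, w, theta, m):
    """Union-bound estimate for a set of m stored SDRs (capped at 1)."""
    return min(1.0, m * false_positive_single(n, w, theta))

print(false_positive_set(256, 4, 4, 4))    # about 2.3e-8
print(false_positive_set(256, 4, 4, 104))  # grows as the stack grows
```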

A

So all of this gets much more interesting when we're dealing with bigger SDRs. Now I have increased the size of the SDRs we're looking at from 256 to 2048 bits. Just so you know, I'm not displaying all of them, so these SDRs go onward, and I'm doing the math on the full SDRs, but I'm only visualizing a percentage of them so that they can all fit on our screen. So at this dimension, let's add a bunch of SDRs to the stack. Right now I have 53. Let's get it up to over 100.

A

So now we'll have 103 in the stack, and I'll click match. So I click a random SDR out of here, and again, we're trying to sort of simulate that we're seeing something and we want to figure out if we've seen it before or not at this dimensionality.

A

Look how steep this curve is here for the one that we actually identified as the matching SDR versus the others, which aren't even coming close. And you might notice that I'm adding 10 bits of noise to this, so 25% of these bits are actually noise. If I take the noise all the way down, then we'll get an overlap score of 40, or at least we should once the slider ticks all the way down. Yeah, so we get an overlap score of exactly 40. But you can see how resilient this set comparison method is to noise.

A

Even if we add 50% noise, I could still adjust my theta to 20-something, and we still have a significantly steep curve here and a very high chance that we're going to be identifying the proper SDR: yes, we have seen this SDR before. And we can also calculate the false positive rate for this as we add more SDRs to it.
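Editor's sketch: the noise and theta behavior described above can be simulated as follows. This is an illustration under assumptions, not the visualization's code. `add_noise` is a hypothetical helper that moves k on bits to positions that were off, so the bit count stays at 40, and a match is declared when the overlap reaches theta.

```python
import random

N, W = 2048, 40   # SDR size and number of on bits used in this part of the episode

def random_sdr(rng):
    return frozenset(rng.sample(range(N), W))

def add_noise(sdr, k, rng):
    """Move k on bits to positions that were off, keeping W bits on in total."""
    dropped = rng.sample(sorted(sdr), k)
    off_bits = [i for i in range(N) if i not in sdr]
    added = rng.sample(off_bits, k)
    return (sdr - set(dropped)) | set(added)

def matches(query, stack, theta):
    """Indices of stored SDRs whose overlap with the query is at least theta."""
    return [i for i, s in enumerate(stack) if len(query & s) >= theta]

rng = random.Random(1)
stack = [random_sdr(rng) for _ in range(103)]

noisy = add_noise(stack[5], 20, rng)    # 20 of 40 bits moved: 50% noise
print(len(noisy & stack[5]))            # 20 bits still overlap the original
print(matches(noisy, stack, theta=20))  # the original is still found
```

Other random SDRs in the stack are expected to share less than one bit with the query on average, which is why a theta around 20 still separates the true match so cleanly.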

A

So in a stack of 104 different SDRs of this dimensionality, the false positive chance, that just some random SDR will match something in there that we actually haven't seen, is 2.5 times 10 to the negative 24. So a significantly low number; not astronomically low, but pretty low. The problem with this type of classification is that it's a lot of calculations.

A

I've got to go and take that incoming SDR and compare it to every single one of the SDRs in the set to get an overlap score, and that is very expensive, as you might have noticed from the responsiveness of this visualization. But that's where we go and start talking about unions.

A

So, as you hopefully remember from previous episodes, a union is where you take one or more SDRs or bit arrays and you OR them together. So we're going to turn a bit on if any of the bits in that position are on in the ones we're comparing. So now we're looking at unions in this visualization. I have another SDR on top here.
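Editor's sketch: the OR operation described above, with each SDR packed into a Python integer so that the `|` operator is a literal bitwise OR. The helper names are hypothetical. A new SDR "matches" the union exactly when every one of its on bits is also on in the union.

```python
import random

def random_sdr_bits(n, w, rng):
    """Return a random SDR packed into an int (bit i set means position i is on)."""
    bits = 0
    for i in rng.sample(range(n), w):
        bits |= 1 << i
    return bits

def in_union(sdr, union):
    """True if every on bit of the SDR is also on in the union."""
    return (sdr & union) == sdr

rng = random.Random(7)
stack = [random_sdr_bits(256, 4, rng) for _ in range(20)]

union = 0
for sdr in stack:
    union |= sdr          # OR each stored SDR into the union

print(all(in_union(s, union) for s in stack))   # every stored SDR matches
print(bin(union).count("1") / 256)              # density of the union
```

The trade-off is that the union forgets which individual SDRs went into it, so a match only says "probably seen before".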

A

We're gonna start off dealing with another 256-bit array with four bits on; that's a 2% sparsity. And it's the same thing: when I click this button, it's gonna dump down here onto the stack. I'm keeping a stack down here at the bottom, but I'm also keeping track of the union of the stack, right? Every time I add an SDR to the stack, I'm also going to OR it into that union. So the more SDRs we add, like here, I'm adding 20 at a time now. I'm gonna make this set really big.

A

I've got 75 different SDRs in this stack right now. This union gets denser and denser and denser. So now, if we take a random SDR, when I click this match random button, it will tell us that, given this union, our chance of a false positive is about 23%, which is pretty bad. So every time I click this match random button, there's about a 23% chance that some random SDR will overlap with that union by the four bits that are required for an exact match, which makes sense, because this union is super populated.

A

So, you know, I would expect that 25% of these random SDRs would match. So the more you click, and you can see over here I'm showing the overlap score, about a quarter of them will actually match. You can see that match indicator here. So the more I add to the stack (now I have 95, now I've got 115), that probability of false positive goes up and up, and so now about 50% of them will just be random matches. You know, we don't want that, right?

A

We don't want random matches. So let's make our SDRs bigger. I'm clicking the go big button once again, and we're going back to a 2048-bit SDR with a w of 40, so 2% sparsity again. Keep in mind again that I'm only displaying the first 256 bits in the visualization, but I'm doing the math on the entire SDRs. So let's add some SDRs to the stack. One thing you notice immediately is that the chance of a false positive is much, much lower, because we're dealing with much bigger SDRs. So as I add to the stack,

A

yes, it continues to go higher and higher, a higher chance of a false positive. Let's add 20 at a time. So now, at this point, I have 49. Let's make this an even 50: 50 SDRs in my union. You can see them represented all down here in the stack of 50 SDRs in the set. Now take the probability of any new random SDR being compared to that union and getting an exact match of 40 bits.

A

The chance of that happening is 7.82 times 10 to the negative ninth power, so pretty low; not hugely low, but pretty low. And so we can see that happening when we start matching random. We're never gonna get a random match. I could sit here and click for probably weeks and never get a random match with a probability of false positive that low. We continue to add more and more SDRs.
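Editor's sketch: a common estimate for a union's false-positive probability treats the stored SDRs as independent and uniformly random. The expected density of the union after OR-ing m SDRs is then 1 - (1 - w/n)^m, and a random SDR matches exactly when all w of its on bits land on bits that are on in the union. That the visualization uses this formula is an assumption, but the estimate lands close to both figures quoted in the episode. The function names are hypothetical.

```python
def union_density(n, w, m):
    """Expected fraction of on bits after OR-ing m random SDRs with w of n bits on."""
    return 1.0 - (1.0 - w / n) ** m

def union_false_positive(n, w, m):
    """Estimated chance that all w on bits of a random SDR fall inside the union."""
    return union_density(n, w, m) ** w

print(union_false_positive(256, 4, 75))     # roughly 0.23 (the "about 23%" case)
print(union_false_positive(2048, 40, 50))   # roughly 7.8e-9
```

The exponent w is what makes the larger SDRs so much safer: the match requires 40 independent hits instead of 4.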

A

That probability of false positive will continue to go up until we've saturated that union to the point where we're starting to get a lot of false positives. So the more I add to the set, the denser and denser this union gets. You can see right here, the union is at this point at 93% density. It's really surprising: even with a union that has 93% of its bits on, the chance of a false positive is still only 4%, so that's still pretty low.

A

When you're just taking some random SDR and seeing if you've seen it before, there's a 4% chance you're not going to be right. It's going to be a false positive: you identify something you thought you'd seen, but you haven't. And the calculation for doing that comparison is just so much faster than it is if you're keeping the entire set and comparing against every SDR in the set. So, wow, this has been the third episode that we've done just on SDRs.

A

It's time to move on to something else, but I need your help to decide what. We could go in one direction.

A

We could talk about encoders. Encoders take real-world data and convert it in some way into representations that have semantic meaning. We'll talk about binary representations with semantic meaning and how we can encode meaning into those bits. That's more the sensory aspect of things as far as HTM systems go. Or we could go and talk about spatial pooling, which is a mechanism for normalizing SDRs in the spatial aspect. That's more of a cortical process, so that would dive right into the cortical theory. It's up to you.

A

I want to know what you think. Should we go with encoders? Should we go with spatial pooling? We're going to get to both eventually. What do you think? Let me know in the comments. Don't forget to like this video if you're enjoying the series, and subscribe to our YouTube channel, and I'll keep making them for you. Thanks again for watching HTM School.

A

[unclear] starting over, and when I click these buttons, some things happen.

A

So, let's look at... so I am just going to start over, and every time I add a new SDR to the stack...