From YouTube: SDR Capacity & Comparison (Episode 2)
Description
In this episode of HTM School, we formally introduce the Sparse Distributed Representation (SDR).
Properties of Sparse Distributed Representations and their Application to Hierarchical Temporal Memory: http://arxiv.org/abs/1503.07469
SDR Visualizations: https://github.com/nupic-community/sdr-viz
Intro music: "Books" by Minden: https://minden.bandcamp.com/track/books-2
Hi, I'm Matt Taylor from Numenta, and welcome to HTM School. I'm really excited for this episode, because this is the first time we're going to delve into the world of sparse distributed representations, or SDRs. So what's the big deal about SDRs? We sometimes refer to them as the data structure of the brain, and for good reason: it turns out that they're used pretty much everywhere.
Let's take a look at an example. Suppose you were playing a musical instrument. As you listen to the instrument, some of the neurons in your auditory cortex become active when they hear specific frequencies, but most of them stay silent. The same goes for your visual system. In every sensory region, there is a sparse pattern of activity that represents the perception of the world at any point in time.
At the other end of the scale, your frontal cortex is involved in planning the music that you're about to play. It creates SDRs that represent that plan and sends those SDRs down to lower regions of the cortex. Your motor cortex contains neurons that represent specific muscle movements. A small percentage of these neurons fire in response to the top-down SDR representing the plan, as they recognize their part in it. They fire in sequence, and the resulting SDR controls your finger movements over time.
A
Similarly,
through
STRs,
you
predict
how
your
muscle
movements
will
create
sound,
and
you
will
use
them
as
well
to
focus
your
attention
on
the
specific
activities
that
are
creating
those
sounds
essentially.
Strs
are
used
in
the
cortex
for
every
aspect
of
cognitive
function
for
every
sensory
modality,
so
we're
going
to
jump
in
to
some
of
the
technical
details
of
STRs
now,
but
first
I
think
I
should
go
over
a
few
terms.
So,
let's
look
at
this
graphic
and
let
me
define
a
few
things
for
you.
First of all, I've been talking in previous episodes about bit arrays and representing them as grids or arrays of ones and zeros. From here on out, we're going to look at SDRs in a more visual format like this, where empty boxes represent zeros and colored boxes represent ones. So let me define some of the terms we're going to be using for SDRs throughout this episode. First of all, n is the array length; in this case there are 256 boxes, so n is 256. We'll use w for the number of on bits.
The sparsity is the percentage of bits that are on. Very simple. Now, I know this is sometimes called density, but we always call it sparsity in the HTM world, because all of the SDRs we deal with are sparse. So we're going to first talk about SDR capacity. In the last episode, I talked about the capacity of a dense representation, or dense bit array. That formula was 2 to the power of the length of the array.
In this case, that would be 2 to the 256th power. But when we have a sparse array and our number of on bits is restricted, our capacity formula changes a bit. SDRs are much less capable of holding information than dense arrays, but it turns out that in the long run it doesn't really matter very much, and we'll explain why soon. Let's take a look at the capacity formula that we use for SDRs.
So if we have an array of 16 bits like this one and a w of zero, nothing is on. It makes sense that the capacity is one: we can only store one value in this empty set. If we dial w up to one, that single on bit could go in any one of sixteen places, so the capacity is 16. And if we dial it up even further, the capacity begins changing very quickly: for a w of two, the capacity is 120; for a w of three, it jumps to 560.
That's because the formula for SDR capacity involves factorials. The formula is the number of bits in the SDR, factorial, divided by the number of on bits, factorial, times the number of off bits, factorial: n! / (w! × (n − w)!). It's not a very complicated formula, but it does give us a way to show how many values we can fit into an SDR. So let's take a look at a 256-bit SDR.
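As a sanity check on these numbers, the capacity formula is just the binomial coefficient, which Python's standard library computes directly (a minimal sketch; `sdr_capacity` is my own naming, not something from the video):

```python
from math import comb

def sdr_capacity(n: int, w: int) -> int:
    """Number of distinct SDRs with n total bits and w on bits:
    n! / (w! * (n - w)!), i.e. the binomial coefficient C(n, w)."""
    return comb(n, w)

print(sdr_capacity(16, 0))   # 1, the empty set
print(sdr_capacity(16, 1))   # 16
print(sdr_capacity(16, 2))   # 120
print(sdr_capacity(16, 3))   # 560
print(sdr_capacity(256, 5))  # 8809549056, the "8.8 billion" at ~2% sparsity
```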
Once again, at a sparsity of about 2%, we get 8.8 billion. As we dial up the sparsity, because of the combinatorial explosion of the factorials in this equation, the capacity goes up very quickly: we're at 11% sparsity at this point, and at 20% sparsity the capacity is very, very big, as you can see. So let's put this down, slam this all the way up, and look at an array of 2048 bits.
So at a sparsity of, let's get it to 2%. All right, there: 40 bits on, a sparsity of 2%. There is an enormous capacity for this array already, and this is essentially the entry point for the size of SDRs that we use in HTM. This is about the smallest we're going to use in an HTM system: a 2048-bit array with a population of 40 on bits. Generally we'll go all the way up to 64 or 65 thousand bits and keep the sparsity at 2%. It just shows you that you can fit a massive amount of data inside an SDR.
You can represent a massive number of different values with SDRs, especially as we increase the size of n and w. So let's talk about comparison now. I'm going to take one SDR that I've randomly generated here on the left, with 1,024 bits. I've given it a sparsity of 10%, dialed up a little just for the sake of example, so we have exactly 103 bits active in this left SDR.
The right SDR is another randomly generated SDR. It has the exact same n, w, and sparsity as the first one, but it was generated independently, so we have two different random SDRs. What we see on the right here is the overlap. If you remember from our last episode, the overlap is just a simple binary AND operation, nothing more: a bit is on in the overlap if it was on in that exact same position in both of the SDRs being compared. So we can see we have a much, much sparser array here.
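The overlap operation described above, a binary AND followed by counting the shared on bits, can be sketched with plain Python sets (the random generation is my own illustration, not the video's visualization tool):

```python
import random

def random_sdr(n: int, w: int, rng: random.Random) -> set[int]:
    """An SDR stored as the set of indices of its on bits."""
    return set(rng.sample(range(n), w))

def overlap(a: set[int], b: set[int]) -> set[int]:
    """Bits on in the exact same position in both SDRs (binary AND)."""
    return a & b

rng = random.Random(42)
left = random_sdr(1024, 103, rng)   # ~10% sparsity, like the example
right = random_sdr(1024, 103, rng)  # same n, w, sparsity; independent
score = len(overlap(left, right))   # overlap score: count of shared on bits
print(score)
```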
So if each of the SDRs we're combining has a sparsity of 10%, we get a combined sparsity of about 20%, though not quite 20%, because the bits they share are not counted twice. We can also see in this comparison view exactly which bits belong to the left SDR versus the right SDR, and displayed right on top is the overlap. The overlap score between these two SDRs is 15.
That's how many bits they have in common. We'll talk much more about the overlap score, overlap sets, and especially unions, which are a very important property. There are some surprising properties of unions that we really take advantage of in HTM systems, and we're going to cover them in upcoming episodes, so stay tuned. But right now, we're going to go back and show some matching operations.
Okay, so I'm going to try to show you that SDRs are noise tolerant; actually, they're quite noise tolerant. In this example, I have an SDR of 2048 bits with a population of 41, which is about two percent sparsity. In the center SDR, I'm adding 33% noise: I'm just flipping some of those bits, keeping the same sparsity, but essentially giving it about a third noise. On the right-hand side, we see the comparison view from the previous visualization; all of the red bits are the overlap.
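The noise being added here, turning some on bits off and an equal number of off bits on so that w stays constant, can be sketched like this (a minimal sketch; `add_noise` and its parameters are my own naming, and I'm assuming noise is measured as a fraction of the w on bits):

```python
import random

def add_noise(sdr: set[int], n: int, pct_noise: float,
              rng: random.Random) -> set[int]:
    """Flip pct_noise of the on bits off and the same number of off bits on,
    keeping the population w (and therefore the sparsity) unchanged."""
    w = len(sdr)
    k = round(w * pct_noise)
    turned_off = set(rng.sample(sorted(sdr), k))
    off_bits = [i for i in range(n) if i not in sdr]
    turned_on = set(rng.sample(off_bits, k))
    return (sdr - turned_off) | turned_on

rng = random.Random(0)
original = set(rng.sample(range(2048), 41))   # n=2048, w=41, ~2% sparsity
noisy = add_noise(original, 2048, 0.33, rng)  # about a third noise

print(len(noisy))             # population is still 41
print(len(original & noisy))  # overlap score drops by the flipped bits
```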
The overlap score, as you can see here, is 28 between these two, so the 33% noise that we added resulted in an overlap score of 28, and you can also see the left bits and the right bits over here. There's a big indication at the bottom that says "nope"; that is telling you whether this is a match or not. So we don't have to have an exact match; we can do a sort of fuzzy matching.
We can just compare SDRs based on their overlap score to decide whether they are a match or not, and the way we do that is by using this value theta. If you read the SDR literature, which I'll link in the description of this video, you'll know that theta is a threshold: if the overlap score falls under theta, we say it's not a match; if it's equal to or greater than theta, we call it a match. So in this case: nope, it is not a match.
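The theta test itself is a one-liner on top of the overlap score (a sketch using the set-of-on-bit-indices representation; the two toy SDRs are contrived so that their overlap is exactly 28, as in the example):

```python
def matches(a: set[int], b: set[int], theta: int) -> bool:
    """Two SDRs match when their overlap score meets the threshold theta."""
    return len(a & b) >= theta

a = set(range(41))        # toy SDR: bits 0..40 on (w = 41)
b = set(range(13, 54))    # shares bits 13..40 with a: overlap score 28
print(matches(a, b, theta=30))  # False: 28 < 30, so "nope"
print(matches(a, b, theta=28))  # True: dialing theta down to 28
```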
However, let's say we change theta. The overlap score is 28, so we can dial theta down to 28, and then it's a match, right? The interesting thing is that you can set your theta, the threshold at which you're going to call an SDR a match or not, given some noise. Then let's change our noise. We had it at 30%.
Now, with a w of 41 and a theta of 30, how much noise can we tolerate? Let's dial this up: we're not matching above 70 or 80 percent noise. How far down can we go? How much noise can we tolerate? Keep going, keep going, and there we go: around 29% noise. So we can add almost 30% noise to this SDR and still recognize that it matches the original SDR without noise. And here's the interesting thing.
A
Here's
we
have
a
formula
to
tell
what
is
the
chance
of
this
being
a
false,
positive
and
I
will
get
into
this
in
the
next
episode.
We'll
actually
look
at
this
formula
and
do
some
calculations
on
overlap
sets,
but
it's
there
is
a
very,
very
small
percentage,
almost
minuscule
astronomically
small
percentage,
that
this
is
going
to
be
a
false
match.
So
we
can
tolerate,
with
the
theta
of
30,
up
to
30%
29%
noise
in
the
signal
and
still
almost
never
almost
never
get
a
false
positive
and
that's
really
powerful
with
sdrs.
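The false-positive formula previewed here is from the SDR paper linked in the description: the probability that a randomly chosen SDR overlaps a fixed one by at least theta bits. A sketch under that reading (the function and parameter names are mine):

```python
from math import comb

def false_positive_rate(n: int, w: int, theta: int) -> float:
    """Probability that a random SDR (n bits, w on) overlaps a fixed SDR
    by theta or more bits, i.e. a false match under threshold theta.
    Counts, for each overlap size b >= theta, the SDRs sharing exactly
    b on bits, then divides by the total number of SDRs C(n, w)."""
    total = comb(n, w)
    hits = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return hits / total

fp = false_positive_rate(2048, 40, 20)
print(fp)  # astronomically small: far below 1e-20
```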
So let's say we go up to 50%: say we have a system that has 50% noise. How low do we have to go with theta to match? There we go: 21. So if our theta is 21, we can be fairly confident of a match, and the chance of a false positive is still about 9.8 times 10 to the negative 30th power. That's still really small. So SDRs have a massive resistance to noise; it's one of their key properties.
It's really important, and it's one of the things that makes your brain so intelligent as well. So thank you for watching this episode of HTM School. If you liked it, please give it a thumbs up and subscribe to our YouTube channel so you won't miss the next episode, which is going to go even deeper into SDRs. I'm going to talk about overlap sets, we're going to talk about subsampling, and at some point we're going to get to unions, and that's where we really get into some of those HTM properties of SDRs that are so important.