From YouTube: HTM Spatial Pooler
Description
Numenta engineer Yuwei Cui walks through how the HTM Spatial Pooler works, explaining why desired properties exist and how they work. Includes lots of graphs of SP online learning performance, discussion of topology and boosting.
See corresponding paper at https://discourse.numenta.org/t/the-htm-spatial-pooler-a-neocortical-algorithm-for-online-sparse-distributed-coding/1548
This was recorded in the Numenta office during an engineering lunch meeting on Feb 8, 2017.
[This talk is about the] Spatial Pooler, because, as you know, this is a recent paper, and in it we discussed the properties of the Spatial Pooler and came up with a bunch of metrics that can quantify the performance of spatial poolers. In addition to these two papers, there are also a bunch of other resources. A good one is HTM School, where Matt talks about the properties of the Spatial Pooler extensively, so that's a more approachable version of what I'm going to present.
This is more of an academic-style talk that covers the paper; the Spatial Pooler is also open-sourced in NuPIC. So, at a very high level, the brain is a powerful streaming analytics engine. It continuously receives a vast amount of information from peripheral sensors, in the form of millions of spike trains. So a fundamental question in neuroscience is: how do neurons in the cortex learn to respond to specific input patterns? How do they decide which inputs they should respond to, and what happens at the population level?
The output of the Spatial Pooler also satisfies a bunch of properties that are important for downstream processing. Because the Spatial Pooler sits very far upstream in the whole HTM system, it is critical to get the Spatial Pooler right in order to have any learning downstream, for example in sequence memory. The questions we often get from the community are: what properties does the SP, the Spatial Pooler, achieve? Why do we need a learning SP; why can't we just use random connections? And what is the function of boosting in the SP? So I'm going to talk about these various properties of the SP in this presentation.
A
So
that's
the
background,
so
I've
start
by
describing
the
algorithm
of
spatial
for
us.
This
is
not
a
very
detailed
description
to
like
more
like
a
high-level
summary
of
the
activation
and
learning
algorithms
in
the
spatial
polar
and
then
I
will
focus
on
the
various
properties
and
discussed
the
founder
of
metrics.
That
can
only
buy
these
properties.
They
have
done
a
lot
of.
So HTM models a layer of cells in the cortex, and it contains a set of cells organized into what are called minicolumns, which are groups of vertically aligned cells. In HTM theory, we model each neuron as having three different types of input: the feedforward input, which targets the proximal dendritic segment; the contextual input, which includes lateral input from nearby cells targeting the basal dendrites; and the top-down feedback.
The Spatial Pooler receives input from a set of input neurons, and each minicolumn typically connects to a subset of the input space. If the input space has topology, then by default the Spatial Pooler also models that topology, and by that I mean nearby SP minicolumns will receive input from nearby subspaces of the input space.
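The topological wiring described above can be sketched roughly as follows. This is a minimal illustration, not the NuPIC implementation; the sizes and the names `pool_radius` and `pool_fraction` are made-up parameters for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

input_width = 1000      # size of a 1-D input space (assumed for illustration)
num_columns = 2048      # number of SP minicolumns
pool_radius = 50        # a column only samples inputs near its own position
pool_fraction = 0.5     # fraction of the neighborhood placed in the potential pool

def build_potential_pool(col):
    """Map the column to a center in the input space, then sample a
    random subset of the input bits within pool_radius of that center,
    so nearby columns see nearby subspaces of the input."""
    center = int(col * input_width / num_columns)
    lo = max(0, center - pool_radius)
    hi = min(input_width, center + pool_radius)
    neighborhood = np.arange(lo, hi)
    size = int(len(neighborhood) * pool_fraction)
    return rng.choice(neighborhood, size=size, replace=False)

pools = [build_potential_pool(c) for c in range(num_columns)]
```

With this construction, two adjacent minicolumns draw their potential synapses from overlapping regions of the input, which is all that "modeling topology" means here.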
Through local inhibition, the top few percent of the minicolumns that have the most overlap with the input become active; typically around two percent of the columns become active in the Spatial Pooler. So it is a very sparse activation, but at the same time it is also distributed: you have more than one column active at any time. The synaptic connections here are modeled as binary connections, so this is unlike most neural network models, which use scalar synaptic weights.
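The activation step just described (binary synapses, overlap counting, top-two-percent winner selection) can be sketched like this. It is a simplified illustration under assumed sizes, not the NuPIC code, and it omits boosting and local inhibition neighborhoods:

```python
import numpy as np

rng = np.random.default_rng(1)
num_columns, input_width = 2048, 1000
sparsity = 0.02  # ~2% of minicolumns win at each time step

# Binary connectivity: 1 where a synapse is "connected", 0 otherwise.
connected = (rng.random((num_columns, input_width)) < 0.05).astype(np.int32)

def sp_activate(input_bits):
    """Overlap = number of connected synapses landing on active input
    bits; the top 2% of columns by overlap become the active set."""
    overlaps = connected @ input_bits
    k = int(num_columns * sparsity)
    winners = np.argsort(overlaps)[-k:]
    active = np.zeros(num_columns, dtype=np.int32)
    active[winners] = 1
    return active

x = (rng.random(input_width) < 0.1).astype(np.int32)  # a random binary input
active = sp_activate(x)
```

Note that because the connections are binary, the overlap is just a count of matching bits, not a weighted sum.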
You may notice there is another term here: boosting. Boosting in the Spatial Pooler is useful to ensure homeostatic excitability control. Basically, the input overlap is weighted by a boost factor, which is like a factor that determines how excitable the current minicolumn is, and that depends on the recent activation history of that minicolumn. So the system is tracking the boost factor as a function of the recent activation frequency of each minicolumn.
If the activation frequency is exactly the same as the target level, which is about two percent, there will be no boost: the boost factor will be one. If a minicolumn is less active than desired, it will be more excitable than its neighbors, and if it is active too often, it will be less excitable than its neighbors. Using this mechanism, we can encourage all the minicolumns to participate in representing the input. This is called homeostasis in neuroscience, and I will show you why it is important later with the metrics.
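The boosting rule above, where the boost is one exactly at the target frequency, rises for under-active columns, and falls for over-active ones, can be sketched with an exponential form. This is one common choice, stated here as an assumption rather than the exact update in the paper:

```python
import numpy as np

target_density = 0.02   # desired activation frequency (~2% of time steps)
boost_strength = 10.0   # how strongly deviations from target are corrected

def update_boost(duty_cycles):
    """Boost factor as a function of each minicolumn's recent activation
    frequency (duty cycle): exactly on target -> 1.0 (no boost);
    under-active -> > 1 (more excitable); over-active -> < 1."""
    return np.exp(-boost_strength * (duty_cycles - target_density))

boosts = update_boost(np.array([0.0, 0.02, 0.10]))
```

During activation, each column's overlap would be multiplied by its boost factor before the winners are chosen, which is what pulls silent columns back into use.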
To capture this underlying structure, the idea is that the frequently occurring input patterns should achieve a better representation than random noise after learning. We quantify what we mean by "better" with a bunch of properties and metrics. These include, first, that there should be fixed sparseness; this is an inherent property of the Spatial Pooler. In addition, there are other properties that require learning, such as distributed coding, using all the columns to represent something, preserving the similarity of the inputs, and noise robustness.
So the first property is that the SP needs to form representations that have relatively fixed sparseness. In this experiment, what I'm showing on the top graph is a bunch of input vectors fed into the Spatial Pooler with variable sparsity: sometimes there are dense patterns, sometimes sparse patterns. On the bottom I'm showing the output of the Spatial Pooler, and it has nearly minimal variability in its sparseness.
Here we define sparseness as the population sparseness: at each time point, you calculate how many SP minicolumns are active and divide that by the total number of minicolumns. Typically we have around 2% of the minicolumns active in the Spatial Pooler, and this is out of 1024 or 2048 columns in the entire system. Here I'm only showing you a small fraction of the columns for display purposes.
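The population sparseness metric just defined is a one-liner; a minimal sketch, with the 2048-column size taken from the talk:

```python
def population_sparseness(active_columns, num_columns):
    """Fraction of minicolumns active at a single time step:
    count of active columns divided by the total column count."""
    return len(active_columns) / num_columns

# 40 active columns out of 2048 gives roughly the 2% target sparseness.
s = population_sparseness(range(40), 2048)
```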
Basically, we want every single minicolumn to participate in representing the input, and this is important from an information-theoretic perspective. You can imagine that if an SP column is not active at all, it doesn't convey any information, and columns that are active a lot of the time also don't convey much information. So we want to ensure every column has some activity, and we quantify this with entropy, where we first calculate the activation frequency of each minicolumn.
If a column is never active for any of these inputs, it carries no information. After learning, we have every column active for about 2% of the inputs. We treat each minicolumn as a binary variable and compute the binary entropy for it, and then the entropy of the entire Spatial Pooler is computed as the summation of the entropies of the individual minicolumns. So here I'm showing the entropy before, during, and after learning. The difference is not huge; this suggests that even a random SP achieves a reasonable entropy, but learning still improves it.
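The entropy metric above, treating each column as an independent binary variable and summing the per-column binary entropies, can be sketched like this (a simplified illustration of the metric as described in the talk):

```python
import numpy as np

def sp_entropy(activation_freq):
    """Sum of per-minicolumn binary entropies (in bits), where each
    column is a binary variable with probability p of being active.
    Probabilities are clipped away from 0 and 1 so log2 is defined."""
    p = np.clip(np.asarray(activation_freq, dtype=float), 1e-12, 1 - 1e-12)
    h = -(p * np.log2(p) + (1 - p) * np.log2(1 - p))
    return h.sum()

# All columns active 2% of the time (every column participates)
# versus a skewed SP where two columns are always on and the rest silent.
uniform = sp_entropy(np.full(100, 0.02))
skewed = sp_entropy([1.0] * 2 + [0.0] * 98)
```

Under a fixed overall sparseness, the entropy is maximized when every column is active at the same 2% rate, which is exactly what boosting encourages.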
The second property of the SP is noise robustness. After learning, you want the output of the Spatial Pooler to be relatively insensitive to noise in the input, especially for those inputs that occur frequently. So here I'm measuring the amount of change of the Spatial Pooler output as a function of the noise level. Noise here means inactivating a small fraction of the active bits and, at the same time, activating the same number of inactive bits, to keep the sparseness of the input the same in this experiment.
This curve is basically showing the results for these noise-robustness metrics before training and after training, and there is a pretty significant improvement in noise robustness. We can also monitor all these properties, all these metrics, continuously as a function of learning, so here the x-axis is the number of epochs of training.
A
How
many
iterations
go
through
the
entire
dataset
and
I'm,
showing
you,
the
engineering
you
created
by
a
logical,
Baptist,
also
another
important
matrix
is
the
theology,
so
we
don't
want
acti
to
be
changing
all
the
time.
So
if
the
same
encode
is
present
is
twice,
we
want
to
have
similar
facial
Pula
output
and
that
ability
matches
is
basically
qualifying.
A
What's
the
percentage
of
difference
on
the
spatial
color
output,
given
the
same
input
across
the
across
room
at
all
and
on
the
bottom,
the
true
graph
is
showing
the
number
of
musing
axes
and
the
number
of
synapses
is
eliminated
from
the
spatial
footer
at
the
bottom
train.
So
it
pretty
much
stabilizes
after
50
at
all.
We can understand why that's the case by looking at the activation frequency distribution on the new dataset. Right after switching to the new dataset, 70% of the Spatial Pooler minicolumns are not responding to anything. This is pretty significant, and we see this big dip in the entropy metric, but it recovers after learning.
So if the input changes, the Spatial Pooler will adjust its connections to better represent the input. Another interesting property of the Spatial Pooler is fault tolerance. In this experiment, we damage a fraction of the Spatial Pooler minicolumns in the center, and we monitor a bunch of metrics both before the damage and after the damage, to see how it recovers. For the remaining Spatial Pooler minicolumns, we computed a receptive field center, which basically means we look at all the connections for a given minicolumn and compute their center of mass.
Let me show you more clearly what's happening in this graph. The surviving columns near the boundary of the damaged region end up less active than their neighbors, so they have a different boosting factor. Because of this difference in boost factors, you will see a shift of the receptive field centers toward the damaged region. We made this into a movie: here is the damage, and you can see the receptive fields start moving slowly toward the damaged region, to cover the input that the damaged part of the Spatial Pooler used to represent.
In a second experiment, we damaged part of the input space instead. We simply blocked part of the input neurons and didn't allow any activations in that part of the input space. Biologically, this is similar to a lesion in the retina, for example a focal retinal lesion, while the previous damage experiment is like a lesion in the visual cortex. We monitor the same metrics, and in this case, because the lesion is in the input space, right after the lesion nothing happens.
Of course, there are still cells connecting to the center part of the input space, but after a while the Spatial Pooler realizes there is actually nothing to be represented in that part, and the connections to that region get eliminated due to the Hebbian learning rule. After the reorganization, almost no cells have a receptive field in the center, because there is nothing to be represented there; they have to move away in order to maintain their activation level.
So if you look at the boost factor right after the lesion, the minicolumns that represented the lesioned part no longer have any input, so their activation frequency goes down. They have lower activation frequencies than their neighbors, and their boost factor increases; you can see this as the bright block in the center. They then start to compete with the neighboring minicolumns, and thus end up representing inputs that are close to the lesioned region.
This recovery requires the increase of excitability for those minicolumns, so that they learn to represent other things. If you don't have this mechanism, that region is going to stay there and will not be used for anything. And unlike the previous experiment, this happens fairly fast, almost instantaneously: the previous one takes a couple, maybe five, epochs, but here it is almost instantaneous.
This is consistent with some experimental studies, where they found that if you make a focal lesion in the retina, the cells in the visual cortex expand their receptive fields within several minutes. But if you have a stroke that damages part of the visual cortex, that reorganization occurs at a much slower time scale, and it takes months to recover from a stroke.
To summarize: the Spatial Pooler basically maps similar inputs to similar outputs. It shows noise robustness, in that changing the input by a small amount doesn't affect the output much after learning, and it also supports continuous learning and fault tolerance. The resources are online, linked from the paper, and HTM School is also a good place to learn about the Spatial Pooler.
[Audience member:] There are two things here: one is what's going on in the biology, and the other is how we model it. In the biology, the tips of the dendrites are constantly growing out and trying to find connections, so a dendrite can't just decide "I need to go to X, some distance away"; it has to feel its way there, and it will continue growing unless it forms new synapses that are useful.
So it's a much more organic process, and we don't model that; we don't have to. We can just define this region of potential synapses and simply say we're going to connect to any one of them, because we don't have to worry about physically keeping the dendrite healthy or keeping the ATP going. So we model it in a simpler form, without modeling the growth, and we achieve the same result by just having a larger potential pool of synapses.
I think there are some movies of this that you can see online, but it's a much more dynamic process than people realize. They can slow it down and speed it up, and you see these little tips moving in and out, trying to find new connections, and they will split and just keep going.
[On why all cells in a minicolumn] converge to the same receptive field: there has to be some mechanism to ensure that these cells learn the same thing, so you cannot just rely on the topology of the input space; that's not very accurate. I think when you look at the branching of the afferent fibers, their spread is much broader than the scale of the receptive field. That's a part of the paper I haven't talked about here; they speculate about two potential mechanisms for this to happen.
One is that maybe you rely on Hebbian learning to ensure that all the cells in a minicolumn have the same receptive field. The other is that, maybe especially during development, there is a subplate neuron that provides the first version of the receptive field and then causes all the cells in a minicolumn to learn the same thing. It could be both.
[Audience question and reply, largely unintelligible in the recording.]
[Audience exchange, partly unintelligible:] All the neurons in a minicolumn receive the feedforward input on their proximal synapses and share a common receptive field, so technically you could just think of the minicolumn as a single unit for the Spatial Pooler. Yes, but not for the distal, contextual input.