From YouTube: How Sensorimotor Inference Works (Part 3)
Description
Now some details about how sensorimotor inference works in HTM theory.
You should have some background in HTM before understanding these videos. See http://numenta.org/htm-school/ for more videos explaining basic HTM theory.
Music: "Holy Roller" by YACHT
(used with permission from Free Music Archive)
A: So again, let's go back to this very first slide in the deck. This is just showing one version of the circuit, with four layers. Usually we just talk about the two-layer input/output model, and right now I'm going to go down to a more detailed version of that. Okay, so on this slide we're just going to look at one of these pairs: the two-layer input/output model.
A: So we had two layers of cells; now we're going to introduce another thing: we're going to talk about multiple columns. And by columns I mean the cortical columns, not the mini-columns. A cortical column is about half a millimeter across; it's just bigger.
A: A cortical column is a collection of mini-columns; it's just bigger in extent. Okay, so in this picture here we show three columns: column one, column two, column three. The input layer in each of these columns is equivalent to the spatial pooler, right: we have the mini-columns, the same mechanism, but the difference now is we have this allocentric location on the object. So we have these three columns, and you can think of them like the tips of three of your fingers.
A
So
as
a
comrade
setting,
you
know
these
three
fingers
and
each
finger
touches
something
and
each
being
get
some
info.
Now
it's
it's
really
good
thing
about
in
terms
of
touch
because
you
can
net.
You
understand
that
three
fingers
moving
somewhat
independently
they're,
not
touching
the
same
far
server
and.
A: Each fingertip is one column, and the important thing is that the columns communicate in the output layer. The output layers have these long-distance connections running across multiple columns, maybe spanning something like 16 columns in the cortex. This is a very well-documented feature of cortex, these long-distance connections.
A
What
way
do
you
think
about
it?
Each
finger
is
getting
some
information
about
us,
so
my
index
fingers
touch
this
coffee,
capsule
I'm
doing
this
feature
at
this
location
that,
on
its
own
may
not
be
sufficient,
identify
the
coffee
cup
probably
isn't
I
can
I
could
evident
coffee
cup
I
move
my
finger
in
multiple
locations,
vaginally
I'm,
not
seeing
right
I'm
just
reaching
in
this
black
box
and
touching
things
yeah.
A
Can
do
this
like
this?
It's
kind
of
like
looking
at
you
over
to
the
straw.
I
sure
is
the
world
it
because
I
got
to
move
a
lot
by
little
eyes,
but
I.
Do
it
I
know
that's
man,
this
your
hyung
Connect
laptop
John,
so
that's
the
same
as
touching
with
one
finger,
but
often
you
get
to
touch
with
multiple
sensors.
At
the
same
time
so
my
hand,
my
three
fingers
touching
it
I
would
get
three
different
feature:
location
representations
on
the
object
and
essentially
to
get
it
down.
A
What
layer
the
input
layer
is
doing
is
representing
not
just
features
but
features
and
location
pairs.
It's
forming
a
sparse
representation
of
a
feature
at
a
particular
location.
Just
like
we
do
sparse
representations
and
simple
memory,
us
or
a
feature
a
particular
location,
see
so
at
the
feature,
a
particular
location.
It
doesn't
identify
the
object
necessarily.
We
have
to.
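As a toy illustration of that pairing (my own sketch in Python, not Numenta's code; the encoding scheme, cell counts, and names are all invented), the point is simply that the same feature sensed at two different locations yields two different sparse codes:

```python
import random

def sparse_code(feature, location, n_cells=900, n_active=20):
    """Toy sparse code for a (feature, location) pair: a small set of
    active cell indices chosen pseudo-randomly but stably per pair."""
    rng = random.Random(f"{feature}@{location}")  # stable seed per pair
    return frozenset(rng.sample(range(n_cells), n_active))

# The same feature at two locations produces distinct representations:
assert sparse_code("rim-edge", "top") == sparse_code("rim-edge", "top")
assert sparse_code("rim-edge", "top") != sparse_code("rim-edge", "side")
```

Because the location enters the code, downstream layers can distinguish "rim edge at the top" from "rim edge at the side" even though the feature itself is identical.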
A
Is
a
pooling
their
the
pooling
layer
represents
the
object
itself,
and
so,
if
actually
learns
to
associate
a
specific
representation
for
popping
up
with
a
local
feature.
Location
pairs.
I
shall
probably
happen.
Consists
of
these
features
at
the
location
right
I
said:
there's
nothing
magic
about
it,
but
at
any
point
on
each
finger
only
gets
partial
input,
and
these
finger
may
not
have
enough
information
to
identify
this
right,
but
they
all
trying
to
do
it.
So.
A: What we do with the long-range connections in the output layers is let them settle on an answer that's consistent with all the inputs, and they do it very quickly. So imagine the input on one of my fingers says it's object A, B, or C; the input on my other finger says it could be object A, R, or S; and then there's the third one.
A: What's going to happen is they're each going to form, in the second layer of each of those columns, a union of those options: "I don't know what it is; it could be A, B, or C." But the long-range connections essentially hold a vote, instantaneously, very quickly, on what's consistent across all of them, and we settle on that. It's all just voting.
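The settling step can be caricatured as a set intersection. This is a minimal sketch under invented names; the real model votes through learned long-range synapses between cell populations, not through explicit sets:

```python
def vote(column_guesses):
    """Keep only the objects consistent with every column's union."""
    result = set(column_guesses[0])
    for guesses in column_guesses[1:]:
        result &= set(guesses)  # long-range "vote": intersect candidates
    return result

# One finger allows A, B, or C; another allows A, R, or S; a third allows A or W.
consistent = vote([{"A", "B", "C"}, {"A", "R", "S"}, {"A", "W"}])
assert consistent == {"A"}  # the columns settle on the one shared answer
```

A single ambiguous column would be stuck with its whole union; three columns voting collapse it in one step.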
A
This
is
model
Val
using
our
a
shim
there
onto
this
is
not
like
totally
like
provocated
sure.
So
this
says
why
you
can
identify
objects
very
quickly.
If
you
have
multiple
fingers
touching
something
or
Y,
you
can
identify
something
very
very
quickly
if
I'm
looking
at
it
with
my
full
web
I
can
just
flash
on
the
in
front
of
me
very
often,
I
can
recognize
it
because
all
the
individual
patches,
the
retina
image,
passes
the
v1
or
arch,
basically
modeling
the
same
object
and
they're
all
things.
A
So
this
and
then
what
happens
now
is
that
the
Z's
they
oscillator
the
object
layer
which
is
stable
right.
Imagine
on
the
alkylation
cooling,
where
the
cooling
layer
it's
it's
saying
this
is
a
coffee
cup.
Every
time
I
move
my
finger.
The
output
layer
is
still
as
stable
as
this
coffee
cup,
but
the
insulator
is
changing
every
time
right.
A
But
once
I
know
that
object
unless
I
know
what
is
accomplished
up
news
is
there's
no
doubt
about
it,
unless
one
of
the
inputs
could
be
inconsistent
sure.
So
what
happened
is
if
you
look
at
this
diagram
again,
the
the
coffee
cup
is
projecting
back
down
to
the
insulator
and
it's
saying,
but
as
a
coffee
cup,
these
are
all
the
future
location
pairs
that
you
might
find
on
a
coffee
cup.
Well,.
A
Where
this
edges
facing
up
a
model
of
expansion,
a
cooling
layer,
so
the
output
layer
position
is
associated
with
me-
are
20
different
input
features
future
location
pairs.
That's
a
definition
of
the
object
right
and
any
one
of
those
might
be
occurring
with
so
which
it
would
projects
back
and
it
depolarizes
the
input
layer
and
says
these
are
the
union
of
all
the
future
location
pairs
you
might
find
on
this
object
right
now.
A
If
I
know
the
location,
because
I'm
about
to
move
my
finger
and
so
I
actually
miss
I,
I
told
you
I
know
the
new
location,
the
new
expected
location
because
I'm
moving
my
finger
right
now
and
I,
the
new
album
interpretation,
let's
say
magically
no
map
yeah,
so
now
I,
say:
okay,
I
note,
the
object
is,
is
just
consist
of
these
future
location.
Pairs.
I
also
know
whether
the
new
location
I'm
going
to
be
on
and
when
you'll
end
up
with
a
jewel
upon
the
prediction
of
the
interlayer
of
exactly
in
put
your
spectrum
right.
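That narrowing can be sketched like this (a hypothetical data layout where an object is just a set of (feature, location) pairs; the real model expresses the prediction as depolarized cells, not set lookups):

```python
def predict_input(object_pairs, next_location):
    """Features the input layer should expect at the next location,
    given the object the output layer has settled on."""
    return {feat for feat, loc in object_pairs if loc == next_location}

# An invented "coffee cup" made of feature-location pairs:
coffee_cup = {("curved-surface", "side"), ("handle", "side"),
              ("rim-edge", "top"), ("flat-surface", "bottom")}

assert predict_input(coffee_cup, "top") == {"rim-edge"}
assert predict_input(coffee_cup, "side") == {"curved-surface", "handle"}
```

Feedback alone gives the union over the whole object; intersecting it with the expected location is what turns a broad prediction into a specific one.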
A: Every column is learning as much as it can, within its limits. Each column is trying to learn the entire three-dimensional structure of objects, and they're all running in parallel. So in a region, if I have, say, a hundred cortical columns, there are a hundred models of the same objects.
A
I
could
isolate
recognize
if
I
have
no
country
input.
I
could
recognize
out.
This
is
one
finger.
Get
the
it
be
feature
was
unique.
If
it's
not
unique,
then
I
have
to
do
it
with
two
multiple
touches
right.
It's
generally
looking
through
a
straw.
I
could
recognize
anything
but
destroy
if
it
was
unique.
I
hope
that's
not
known.
I
got,
but
it
is
not
unique.
I
have
to
look
at
all
thing
right
as
you
move
around.
So
when
you
have
this
choice
of
balance
between,
you
know
how
many
columns
are
asked
to
be
touching.
A
B
A
A: Now we're getting to the last thing I wanted to show. This is the basic idea, the basic circuit, and we've tested it extensively. Let me just show you a few other slides. Look at the next figure in the slide deck here, just to remind you that we're building all of this using HTM neurons. So, for example, we didn't have to reinvent the neuron.
A: New neuron models were not required. What we've done, just to show you, is this slide called "simulations of convergence, time versus number of columns." It's showing that we can model these things, that we can model this convergence. I won't go into detail, but look at the section at the top of the diagram: we've created a bunch of virtual objects.
A
These
are
so
simulated
object
that
have
feature
that
location,
so
it
would
design
those
two
that
there
they're
similar
enough,
that
you
can't
distinguish
them
very
easily,
have
to
touch
multiple
places
on
the
distinction
right.
They
all
this
similar
features,
and
you
know
any
particular
feature
you
touch
on
locations,
lack
of
unique
and
so-
and
this
is
a
be
section
here-
we
show
how
long
it
takes
for
a
single
column
like
a
single
finger.
You
need
to
recognize
one
of
these
objects,
and
this
is
sort
of
the
activity
pattern
in
the
output
layer
over
time.
A
Yeah
well
I
want
this
Randall
better
point
to
them.
I
actually,
don't
know
how
or
you
a,
but
we
looking
at
nnd
were
classes,
the
number
of
cells
that
are
active
at
one
time
so
showing
the
Union
over
up
with
the
beauty
of
objects.
So
you
see
the
activities
a
lot
in
the
beginning.
I've
got
just
touch
once
I
have
a
union
I,
don't
know
what
is
a
touch
again
again
and
again
and
then
at
some
point
it
gets
down
to
the
after
I
think
it's
about
seven
or
eight.
A
On
average
it
says
that
I
know
what
this
is
right
and
then
it's
that
it's
locked
it
yeah.
If
you
do
three
columns
at
lunch,
you
still
initially
in
the
right
first
touch,
you
might
be
ambiguous,
but
there
are
quickly
a
lot
to
compaction,
much
quicker,
just
getting
more
data,
well
you're,
getting
multiple
he's
like
you're,
touching
local
public
to
the
object
at
once
and
you're
voting
and
collecting
there.
So
you
can
innovate
over
time
or
you
can
integrate
over
multiple
centuries
right.
Okay,
I.
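The two regimes, integrating over successive touches versus over simultaneous sensors, can be mimicked with a toy simulation. The objects, sizes, and seeds below are invented for illustration; they are not the slide's data:

```python
import random

def touches_to_converge(objects, target, sensors, order):
    """Touches needed until only one candidate matches everything sensed
    so far; each touch samples `sensors` not-yet-visited locations."""
    remaining = list(order)
    candidates = set(objects)
    touches = 0
    while len(candidates) > 1 and remaining:
        sensed, remaining = remaining[:sensors], remaining[sensors:]
        touches += 1
        candidates = {name for name in candidates
                      if all(objects[name][loc] == objects[target][loc]
                             for loc in sensed)}
    return touches

rng = random.Random(0)
# 50 objects drawn from a tiny feature alphabet, so single touches are ambiguous:
objects = {f"obj{i}": {loc: rng.choice("xy") for loc in range(12)}
           for i in range(50)}
order = list(range(12))
random.Random(1).shuffle(order)  # one fixed exploration order for fairness

single = touches_to_converge(objects, "obj0", sensors=1, order=order)
triple = touches_to_converge(objects, "obj0", sensors=3, order=order)
assert triple <= single  # more sensors per touch, fewer touches needed
```

With one sensor the candidate union shrinks one location at a time; with three sensors per touch the same evidence arrives in a third as many steps, which is the shape of the convergence curves on the slide.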
A: Exactly, yeah. When you're given a new object you haven't seen before and you want to learn what it looks like, typically what you do is hold it in front of you, turn it around, look at it, touch every part of it. Every part of your V1, every part of your V2, is learning simultaneously, independently and together, what that object is. You're not building one model of the object; you're building hundreds of them. One thing you might ask yourself is: is this really going to work?
A
What
the
capacities
are
system
like
this,
so
this
slide
here
called
simulation
results
passages
there's
a
lot
of
assumptions
here,
but
we
want
to
make
sure
that
you
know
this
is
real.
This
could
actually
work
that.
How
much
could
the
neurons
in
single
column
actually
learn?
It
turns
out
if
you
have
a
single
column
of
reasonable
dimensions,
it
would.
A: ...a number of mini-columns that's kind of consistent with a half-millimeter cortical column, so you have 900 cells in the input layer, which is pretty small, and 4,000 cells in the output layer. Under those kinds of realistic assumptions, which are smaller than we typically model, a single column can learn somewhere between 200 and 300 three-dimensional objects.
A: And if you combine multiple columns together, you get more.
A: There is a limit tied up with it; it's just a question of capacity, of how many synapses you can actually fruitfully employ on a neuron, so you kind of run into an asymptote. But the point is that even a single column, a very small patch of a sensory region, can learn on the order of several hundred objects. It's not ten thousand, but we don't need that; we're still going to have a hierarchy if you're going to do things in our view. But those are a few of the numbers.