Description
Jeffrey Hawkins is the American founder of Palm Computing and Handspring. He has since turned to work on neuroscience full-time, founded the Redwood Center for Theoretical Neuroscience in 2002, founded Numenta in 2005 and published On Intelligence describing his memory-prediction framework theory of the brain.
Recorded At MIT, Dec 15th, 2017
A: ...that is, cognitive science and neuroscience. I introduced him for the first time, as a speaker of our then Intelligence Initiative seminar series, in 2010, so that's seven years ago. He is the founder, as everybody knows, of Palm Computing and Handspring and, as such, has been a legend in Silicon Valley for quite some time. In 2003 he was elected a member of the National Academy of Engineering for the creation of the handheld computing paradigm and the creation of the first commercially successful example of a handheld computing device.

A: He has a deep connection to MIT. In its infinite wisdom, the MIT computer science admissions office (representing, I note, the other side of the street, not this one) rejected Jeff's application to the AI lab, and so made it possible for him to invent handheld computers and for us to have iPads and the like. Jeff wrote a book which is, in the meantime, a classic: On Intelligence, from 2004, describing his memory-prediction framework theory of the brain.
A: He has since maintained the belief that it is time for computer science to learn from the brain, for making computers more similar to the brain. Jeff and I agreed then on the belief that the time had come for a new attack on the problem of AI, and that neuroscience would provide important cues. You know the initiative: this was the Intelligence Initiative, the precursor of CBMM.

A: "The initiative is exciting. Over the last 40 years I have seen many intelligence initiatives come and go, but the positioning and thought behind I-squared (that was the name of the Intelligence Initiative) is the best I've seen. MIT is the ideal location for an initiative like this." And since then, companies such as Mobileye and especially DeepMind, which were then just tiny startups when they participated in the MIT symposium Brains, Minds and Machines that we organized in 2011, have gone on to great success.
A: So because of this, when I am asked what will be the next breakthrough in AI, I answer, of course, that I don't know, but that it is a reasonable bet that it will also come from neuroscience, and it may well come from looking in more detail at the anatomy and function of the layers in each cortical area. And this is what Jeff will speak about. The title is: "Have we missed half of what the neocortex does? Allocentric location as the basis for perception." Please join me in welcoming Jeff Hawkins.
B: Thank you, Tommy, that's very generous, and it's nice to be back here. I do view MIT as really setting the agenda in the field that I like to participate in, and I had almost completely forgotten about the fact that my application for a graduate program here was rejected many years ago. That's good, so I don't hold anything against you guys. Anyway.
B: So, yes, this is how I talk, and I won't explain it further; I'll just jump right into it. Just a few words about my company first, because it's a bit unusual. Numenta is a small business in Northern California. We're really like a private research lab: there are 12 people, scientists and engineers, almost completely dedicated to neocortical theory. We have a rather ambitious goal, which is to reverse-engineer the neocortex. I'm not embarrassed to say that it's an ambitious goal: it's achievable.
B: We should all be working on it one way or the other, and our approach is a very detailed biological approach. We want to understand the neurons and the circuitry as we see them in the mammalian neocortex: what it does and what its function is. Ideas inspired by the brain can come after you understand how the brain works, so we really stick to the biology. We test this empirically through collaborations with experimental labs and via simulation, and that's what I'll talk about today.
B: We have a second goal, which relates to what Tommy just mentioned, and it's definitely second in our case: to enable technology based on cortical theory. I'm still a believer that the way we're ultimately going to get to truly intelligent machines, the fastest path there, is to understand how the brain works, and to that end we have a very active open-source community. All of our stuff is very open, all of our source code; you can reproduce all of our experiments. We believe that ultimately this endeavor, whether it's us or other people, will be the basis for machine intelligence as we will see it in the future. Okay. I want to be mindful that I know everyone here is a neuroscientist and you all know this, but I find it's a good idea to review a few basics before I delve in. Mammals have a neocortex; non-mammals don't. In the human it's about 70% of the volume of your brain. This is my model; I carry it with me all the time.
B: ...language, somehow, is all based on the same sort of underlying fundamental architecture, which is just a remarkable thing to think about, but it appears to be true. And Vernon Mountcastle basically proposed this. He said: the way to think about the neocortex is to think about one little section of it, one that goes through the full two and a half millimeters. He called it a column, and he said that basically, in that column, you're going to have the essential function.
B: Now, if you open up a basic textbook, an introduction-to-neuroscience type of thing, you'll see a picture like this. They'll say: oh, there's a bunch of layers in the cortex. Input arrives at layer 4; layer 4 projects to layer 2/3; layer 2/3 is the output and goes to the next region; and then layer 2/3 projects to layer 5, and that projects to layer 6. That's how information flows through the cortical columns. It's actually not bad, but it's leaving out quite a bit, by my count.
B: Right now we deal with about 12 relevant cellular layers. Layer 3 is easily divided into two; layer 5 has three different cell types. These may not be visible layers (it doesn't mean the cells are actually stratified), but they are cells with a different anatomy, morphology, or physiology, which can be uniquely identified. Layer 6 is a very complicated layer. It has two sublayers, 6a and 6b, two very interesting layers, and it's got a bunch of other cells down below there.
B: If you just follow, for example, the same feed-forward circuit as we did on the left there, it gets complicated. There are actually two inputs to every cortical column, especially the non-primary ones: sometimes you have connections directly from other cortical regions, and sometimes they go through the thalamus. So there are two sorts of feed-forward inputs. They do arrive at layer 4, among other places, but they only form about 10% of the synapses on layer 4 cells. About 50% of the synapses on layer 4 cells come, as shown by this blue arrow, through this very unusual bidirectional connection with layer 6a. So if you want to understand what layer 4 is doing, you can't ignore what layer 6a is doing, because it's providing about half the input there. Then there's the layer 4 projection to layer 3; that's the output layer, which goes directly to other cortical regions. But layer 3 also projects down to layer 5.

B: And here you see a very similar type of circuit between layer 6b and one of the layer 5s: a similar sort of parallel structure, with this very, very characteristic bidirectional connection. That then projects to upper layer 5 (at least in some species it's upper layer 5, but it's the layer 5 thick-tufted cells), and that becomes a second output of the cortical column, and that is the one that goes through the thalamus.
B: So there are these two sorts of inputs and two outputs, and there's this complicated circuit going on in between. Now, a lot is known about the cortical anatomy. I'm not going to go through it, but we can summarize a few things here. We can say: cortical columns are complex, very complex. There are at least 12 or more excitatory cellular layers, two feed-forward pathways, and at least two feedback pathways.
B: You can show them here, and there are numerous connections up and down the column and in between columns. And then, of course, there's an entire inhibitory circuit, with at least as many cell types and equally complex. So this is a very complex system. Now, the function of this thing is also going to be complex. It's not going to be simple, so anybody who says "oh, it's a filter, it's changing this or changing that": that doesn't seem to be the case.
B: We should expect this thing to do a lot, and in some sense that's what we're looking at; this is the thing that makes us think it is the source of everything. In fact, whatever a column does has to apply to everything the cortex does, because this is the circuitry of the cortex. So we might think: oh, how does this handle touch, or how am I going to see with this? But it's also going to explain how we do language, and it also has to say something about how we do neuroscience and how we build buildings and so on. So it's something really remarkable. Now, I have two thoughts about this before I get into the details of my talk. One is, I just want to remind you: this is one of the most important scientific problems of all time.
B: It's worth stating, it's worth remembering, that it sits up there with the discovery of, you know, genetics. It's up there. It's really kind of the core of who we are as humanity. It's the only structure that knows things, it's the only structure that discovers things, and of course it defines us as a species.
B: It goes beyond the abstract, as I mentioned. Today, at the end of my talk, I'm going to give you explicit proposals about what many of these layers are doing. I'm going to be filling in a diagram here, explaining what's going on, at least our hypothesis for it. It won't be everything, but it's going to be an interesting foundation, and I'm going to make the case for it. Now, to do that in the time I'm allowed...
B: ...I have to move quickly through a whole series of concepts. Typically, when you give a scientific talk, you give one concept and you explain how you did it, what didn't work, your experiments, blah blah blah. I don't have time for that. I want you to understand that everything I present here is not just made up. It was a lot of work, a lot of testing; a lot of it took a long time, and I have a lot of confidence in it, but I can't present the data to explain why I have that confidence.

B: So I just want you to at least give me the benefit of the doubt. Later, when you ask me questions, I can go into detail about this stuff, in great detail, but I'm trying to tell a story here today, and I want to get to that end picture. Now, the way I'm going to tell the story is the way we discovered it. It may not be the best way, but it's the way I know.

B: So I'm going to start at the beginning. In the beginning, all of our work was based on a single observation: the cortex is constantly making predictions of its inputs. Every time I feel something, I have an expectation of what I'm going to feel, and that expectation is a very detailed prediction. As I move my hand along this lectern, if there were even the slightest little dip here, I would notice it.
B: So we asked ourselves the question; our research paradigm has been: how do networks of neurons, as seen in the neocortex, learn predictive models of the world? It's not that the cortex is only doing prediction, but prediction seems to be a fundamental component of what the cortex does. If we tease apart prediction, we might understand some of the functional components underlying it. So that's what we went about. Now, this research question can be broken into two parts, if you think about the patterns that are coming into the brain.
B: You've got these sensory streams, millions of sensory bits coming into the brain all the time. Why are they changing? For two fundamental reasons. Either the world itself is changing; I'll call those extrinsic sequences, like when you're listening to a melody: you're learning the sequence, and it's the pattern in time that matters. That's one form. The second form is when you move yourself, and you're doing this constantly: every time you move your eyes, several times a second, every time you touch something.
B: Every time you, you know, walk around the room, there's a flood of changes coming in. And it's been known for a very long time, back to Helmholtz, that you can't really understand the world from those sensory inputs if you're not accounting for the behaviors that go with them. It's the sensorimotor sequences that are leading to those changes, and so that's the other part of the problem. We started with the first one and then we tackled the second one. On the first one, we had a paper that came out in March of 2016 called...
B: ..."Why Neurons Have Thousands of Synapses: A Theory of Sequence Memory in Neocortex." The big idea is that we suggested every pyramidal cell is actually a prediction machine, and the vast majority of the synapses on the pyramidal cell are actually used for prediction; I'm going to walk through that. Then we showed that if you took a cellular layer, say one of the layers in one cortical column, a network of those neurons...
B: ...will learn a type of sequence memory: a very powerful sequence memory, a predictive memory. And in order to understand that, you also need to know something about sparse activations. So that's in that paper. Then we just had a paper come out in October of this year called "A Theory of How Columns in the Neocortex Enable Learning the Structure of the World." In that paper, the big idea is that we deduced that every column, when you think about it...
B: ...(we're talking mostly about primary and secondary sensory columns, but ultimately I think it'll be every column) must have a sense of an allocentric location. And I use the word "allocentric" in a very broad sense; it just means "other"-centered. I'm not using it in the specific technical sense that people who study grid cells and the like do. Really, and I'll say this because it was tripping some people up today, you can think of it as object-centric.
B: So when I touch this little clicker here, when my finger feels something, I'm arguing that the column receiving the input from my finger is also figuring out where the finger is on this object, and we'll get into that. So that was the big idea there: as the sensors move over objects and through the world, the columns learn models of complete objects, and I'll walk you through that. And then the third part is our current research, which has not been published; it's very new.
B: We asked the question: well, how could columns compute this allocentric location? We had the idea: let's look at grid cells and place cells, because they solve a similar problem. And after we studied this for a while, we came to believe that cortical columns contain analogs of grid cells and head-direction cells, and that they're solving the same basic problem that the entorhinal cortex solves when it maps environments.
B: That's what has been observed there, and the cortex is now using the same mechanism to map physical structures, objects. It's a very parallel process, and once we understood that, we started to understand the function of numerous layers and connections. So I'm going to go through this in order: I'm going to go very quickly through these points and end up down here with the specific functions of layers, so I'm going to go pretty quickly. Let's start with one slide on the pyramidal neuron as a prediction system. Here's your typical pyramidal neuron.
B: It has thousands of synapses, anywhere from five to thirty thousand excitatory synapses. Only 10%, or less than 10%, typically are proximal; those can actually drive the cell to fire. 90% of them are on either the distal basal dendrites or the apical dendrites, and typically they're completely unable to make the cell fire, although a lot of great research has been done showing that dendrites are active processing elements.
B: So if you have somewhere around 15 synapses that become active relatively close together in time and space (they have to be within about 40 microns on a dendritic segment), they can generate a dendritic spike. The dendritic spike can travel to the soma. Generally it does not cause the cell to fire; it depolarizes the cell, so it raises its voltage, but not enough to generate a somatic spike. That can be a sustained depolarization, lasting hundreds of milliseconds, up to a couple of seconds.
B: We are going to argue that that is a predictive signal. This is our theory: the proximal synapses cause somatic spikes and define the classification field of the neuron, but the distal synapses cause dendritic spikes, and they put the cell into a depolarized state, a predictive state. What's the benefit of a cell being depolarized? Our neuron models and our network models rely on this fact: what happens is that the depolarized neuron will fire a little bit sooner than another neuron.
B: If they both have the same feed-forward receptive field, the one that's depolarized will generate its first spike a little bit quicker, and it's going to inhibit its neighbors via a very fast inhibitory circuit. And it turns out a typical pyramidal neuron can recognize hundreds of unique patterns, hundreds of unique contexts in which it can predict its input. This is how we model it; we use this in all of our simulations. This is a picture of our software model.
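The proximal/distal split just described can be written down as a toy point neuron. This is a minimal sketch, not Numenta's actual implementation: the class name, thresholds, and integer input encoding are all illustrative assumptions.

```python
class ToyPyramidalNeuron:
    """Sketch of the neuron model described above: proximal synapses define
    the feed-forward classification field; distal segments are coincidence
    detectors whose dendritic spikes put the cell into a predictive state."""

    def __init__(self, proximal, distal_segments, theta_prox=8, theta_dist=10):
        self.proximal = set(proximal)                      # can drive the soma
        self.segments = [set(s) for s in distal_segments]  # contextual inputs
        self.theta_prox = theta_prox   # somatic spike threshold (illustrative)
        self.theta_dist = theta_dist   # dendritic spike threshold (~15 in the talk)

    def fires(self, active_inputs):
        # enough proximal synapses active -> somatic spike
        return len(self.proximal & active_inputs) >= self.theta_prox

    def predictive(self, active_context):
        # any one distal segment over threshold -> dendritic spike -> depolarized
        return any(len(seg & active_context) >= self.theta_dist
                   for seg in self.segments)

n = ToyPyramidalNeuron(proximal=range(10), distal_segments=[range(100, 115)])
print(n.fires(set(range(9))))              # True: 9 proximal synapses active
print(n.predictive(set(range(100, 112))))  # True: 12 synapses of one segment active
```

Each distal segment acts independently, so a single cell can carry hundreds of stored contexts, which is the point made above about recognizing hundreds of unique patterns.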
B: For this thing, you see the proximal synapses in green there; and then we have the basal synapses, which we label "context": an array of coincidence detectors. The apical dendrites are similar; these act like threshold detectors. So this is our model of the neuron. It has multiple states; I won't get into all of it. I should also point out the learning model.
B: Here we rely on synaptogenesis, so we're not changing the weights of synapses; we're actually growing new synapses in our model, in a very clever way that matches the biology, but I'm not going to get into that now. Now, the properties of sparse activations: we have to cover this, because you won't understand anything else unless I cover it. Maybe you know this already, but let's take an example.

B: Take a bunch of cells, say one layer of a cortical column (which layer doesn't really matter), and let's say it's five thousand neurons. Typically what we see is a very sparse activation, so let's say two percent of the neurons are going to be active at any point in time. So we have a hundred active neurons at any point in time; a moment later it's another 100, and a moment after that another 100. The first question to ask is: what is the representational capacity of a layer of cells? How many different ways can I pick 100 out of 5,000? Well, you're probably not surprised: it's very, very big. What you may not know (you can type "5,000 choose 100" into any browser and it'll tell you) is that in this case it's about 3 times 10 to the 211. That's infinite as far as we're concerned; we don't have to worry about running out, we can pick them all day long. The second thing to ask is: if you randomly choose two activation patterns, what's the likely overlap?
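The "5,000 choose 100" capacity claim is easy to check directly. This is just a quick sanity check of the arithmetic, not part of the talk's software:

```python
import math

n, w = 5000, 100              # cells in the layer, active cells (2% sparsity)
capacity = math.comb(n, w)    # number of distinct sparse activation patterns
print(len(str(capacity)))     # 212 digits, i.e. about 3e211
```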
B: What's the distribution of the overlap; that is, how many cells would they have in common? In this case it's about two. Then you can ask: well, what's the chance of them having 10 cells, 20 cells, or 30 cells in common? It turns out that it's very, very unlikely. It very quickly drops off to essentially never, even though technically it could happen.
B: So you can pick random SDRs, sparse distributed representations, all day long, and they almost all overlap by just a few cells; they're very, very orthogonal in that sense. Now, we can take advantage of this, because what it means is that a neuron only has to form a few synapses to recognize a pattern; it doesn't have to form connections to all of the cells that are active. So in this case I said: I want this neuron to recognize a pattern where I have a hundred cells active. These are the gray cells.
B: It only needs connections, on one of its dendrites, to ten or twenty of those cells, and it can reliably recognize that pattern. Technically it could have a lot of false positives, but it just won't; it's just never going to happen.
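The "technically possible but never going to happen" claim about subsampling can be quantified. A sketch, with the subsample size and dendritic threshold chosen to match the numbers mentioned in the talk (20 synapses, spike threshold around 15):

```python
import math

n, w = 5000, 100   # layer size, active cells in a random pattern
s, theta = 20, 15  # synapses sampled from a target pattern, dendritic threshold

def p_false_match(active):
    """P(a random pattern of `active` cells drives >= theta of the s synapses)."""
    total = math.comb(n, s)
    return sum(math.comb(active, k) * math.comb(n - active, s - k)
               for k in range(theta, s + 1)) / total

print(p_false_match(w))   # well below 1e-20: effectively never
```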
The second thing we can do (this is perhaps something you haven't seen before, but maybe you have) is to ask ourselves: what happens if I form a union of patterns?
B: So instead of just invoking one pattern in this layer of cells, I'm going to invoke ten patterns; that's a thousand active cells, or twenty percent of the cells being active. Well, you could say: wow, this cell is going to be in trouble now, because it's still only looking at its ten or twenty synapses, and it could have a false positive. But if you do the math, it's still extremely unlikely.
B: So this cell, by connecting to 20 synapses in this whole population, can reliably pick out that pattern, even though there's a whole bunch of other patterns going on, and you can do unions much greater than that. We're going to rely on this property, because of what we think is going on in every cellular layer.
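The same calculation shows why unions stay safe: even with ten patterns superimposed (1,000 of 5,000 cells active), a 20-synapse subsample with a threshold of 15 almost never matches by chance. A sketch reusing the talk's numbers:

```python
import math

n, s, theta = 5000, 20, 15   # layer size, sampled synapses, dendritic threshold

def p_false_match(active):
    """P(a random set of `active` cells hits >= theta of the s synapses)."""
    total = math.comb(n, s)
    return sum(math.comb(active, k) * math.comb(n - active, s - k)
               for k in range(theta, s + 1)) / total

print(p_false_match(100))    # one pattern active: effectively zero
print(p_false_match(1000))   # union of ten patterns: still below 1e-6
```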
B: The column is representing things, and often there's uncertainty, and when there's uncertainty it's going to use a union. It's going to say: well, I don't know; it could be X, Y, or Z, and so on. And what it means is that the networks don't get confused as they try to resolve that uncertainty: they bounce back and forth and essentially narrow down to the only consistent answer. I'll explain some of this.
B: But the point is, we think unions are happening everywhere, and so the density of cell activity basically represents uncertainty; when you've really got something, when you know what's going on, it's going to be very sparse. Okay. Then we said: okay, take a bunch of those pyramidal neurons with sparse activations and put them in a layer like this, and add a few more things. We're basically going to put cells into minicolumns, say 10 cells per minicolumn, and the minicolumn doesn't have to be a physical structure.
B: All we're asking is that the cells in a minicolumn have a common feed-forward receptive-field property. This is like the classic Hubel and Wiesel finding from many years ago: all the cells arranged vertically have some common receptive-field property. You don't have to see the minicolumns; you just have to have that property.
B: So the cells in a minicolumn are going to respond to the same feed-forward pattern, but they're going to form horizontal connections that are unique. Here's what would happen in two time periods, time 0 and time 1. If there is no predictive state and an input comes in, it's going to activate all the cells in the minicolumn, because they're all equally receiving it and they all look similar. The other condition is where there is a predicted input, a predictive state, which I've represented by little red circles...
B: This means these cells are predicting they're going to be active; they're depolarized. The same input comes in, but now it's going to select one of those cells: the one that was predicted gets to fire first, very fast inhibition kicks in, and that basically enforces a sparse pattern. The next moment, the active pattern will in turn predict other cells, and so you move through these sparse activations in time, prediction then activation, prediction then activation, and that's the basis of sequence memory.
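The burst-versus-selection rule just described is tiny when written out. A sketch; the cell names are illustrative:

```python
def minicolumn_activate(cells, predicted):
    """One minicolumn receiving its feed-forward input: if any of its cells
    were depolarized (predicted), only those fire and, via fast inhibition,
    silence the rest; with no prediction, every cell fires (a 'burst')."""
    matching = [c for c in cells if c in predicted]
    return matching if matching else list(cells)

cells = ["c0", "c1", "c2", "c3"]
print(minicolumn_activate(cells, {"c2"}))   # ['c2']: predicted input, sparse code
print(minicolumn_activate(cells, set()))    # all four cells: unanticipated input
```

Because the surviving cell depends on which cell was predicted, the same feed-forward input is represented differently in different temporal contexts, which is what makes the sequence memory high-order.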
B: We have built this for years, we've tested it, and we've applied it commercially, so we understand it very well. I'll just mention a few things. It's very high capacity, and this is important to remember: a slightly bigger network than this, we've shown, can learn up to a million transitions, meaning something like 10,000 songs of 100 notes each. It's really high capacity. It's surprising.
B: It can learn high-order sequences. So imagine you're training it on two sequences, ABCD and XBCY. If you show it ABC, it predicts D; if you show it XBC, it predicts Y. It doesn't get confused by the shared B and C. Similarly, if I just show it the B and the C, it's going to predict both D and Y, because that's all it can do at that point in time. It does all these things automatically. It's extremely robust to noise and failure.
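The ABCD/XBCY behavior can be illustrated with a toy comparison. This is not the HTM algorithm itself (HTM achieves the context sensitivity with per-context cells inside shared minicolumns, as described above); it only shows the difference between first-order and high-order prediction:

```python
from collections import defaultdict

sequences = ["ABCD", "XBCY"]

# first-order memory: the prediction depends only on the current element
first_order = defaultdict(set)
for seq in sequences:
    for cur, nxt in zip(seq, seq[1:]):
        first_order[cur].add(nxt)

# high-order memory: context (here, the whole prefix) disambiguates
high_order = defaultdict(set)
for seq in sequences:
    for i in range(1, len(seq)):
        high_order[seq[:i]].add(seq[i])

print(sorted(first_order["C"]))   # ['D', 'Y']: ambiguous without context
print(sorted(high_order["ABC"]))  # ['D']
print(sorted(high_order["XBC"]))  # ['Y']
```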
B: You can knock out 40% of anything and it still performs well, and it has very desirable learning properties: it's all local learning, very simple rules (I won't get into all of that), and it satisfies many biological constraints. There are many people implementing this by now, and it's being used in some commercial applications, but it is a biological model first and foremost. Okay, that was the first section. In the second section we asked: how are we going to learn predictive models of sensorimotor sequences?
B: Our first idea was: okay, let's start with the same cellular layer; can we turn it into a sensorimotor model? And we said, well, here's a basic idea: what if we just added a motor-related context? So instead of the context just being the previous state, we have a motor-related context. We were inspired because we said: look, we know that 50% of the inputs to the layer 4 cells come from layer 6a. So that's an idea.
B: Let's go with that. Then we asked ourselves: what would that motor-related context be? This is the hypothesis: by adding a motor-related context, a cellular layer can predict its input as the sensor moves. And then we asked: what is the correct motor-related context? We started working on this several years ago. We tried different things, and they kind of worked, but they didn't work really well; they didn't scale well and so on. But a little bit under two years ago...
B: ...we had an "aha" about it, and this gets to the allocentric part. Let me use my coffee cup as my prop; I'm going to use this a lot during this talk. You can just ask yourself a very simple question. Imagine I'm not looking at this coffee cup, I'm just touching it, and I'm familiar with it.

B: This is my coffee cup from my office, and I'm holding it in my hand. I'm about to move my finger; can I predict what I'm going to feel? Yes, I can. I know I'm going to feel this edge here. I also know that if I touch down here, I'm going to feel this little rough patch, because this cup has a rough bottom. It also has this little doodad here. So before I touch it with my finger, I make the predictions; I know what I'm going to feel.
B: Now, how could I know that? What do I have to know? First of all, the cortex has to know that this is a cup; it has to know the object. And it has to know where it's going to touch the cup. If I'm going to predict what I'll feel, it must know where, and the "where" that matters is where on the cup it's going to touch. It's not relative to my body; it's relative to the cup. I need to know the allocentric location to possibly make that prediction.
B: That's the deduction, and the predictions are going to be at a fairly fine granular level. Every part of my skin touching this cup is predicting what it's going to feel, and that's a lot of predictions; it's not some global prediction, it's a very local prediction. So we realized that this is a requirement, and that's where the allocentric location comes from. Okay, so the question now is: if we have an allocentric location, a location on the cup, how could we derive it? We didn't know what it looks like.
B: We didn't know; in the beginning we just assumed we had it in some form. We did experiments where we sort of randomly made up stuff. And we also realized we really wanted a second layer in the network. The second layer is what is typically called a pooling layer; that's a term a lot of people use. If you don't know what it means, in this case what I mean by it is this: in the second layer we're going to essentially pick a sparse activation of cells up there, and it's going to stay constant...
B: ...while the lower layer is changing. The cells in that upper layer are going to learn to respond to a series of independent, individual sparse activations in the lower layer. So if you think about the lower layer, it's sort of representing a feature, the sensory feature, at a location, and so you're basically modeling an object as a set of features at locations. It's kind of like a CAD file. Well, it kind of makes sense; what else could you do...
B: ...modeling an object? And what's interesting here is that the output layer, this object layer, is going to be stable over movements of the sensor, while the input layer will be changing with each movement of the sensor. You have a stable representation of the object as you move, and it doesn't matter in which order you move or how you touch the object, as long as you know the allocentric location, that magic signal.
B: We don't know how to compute that yet, but that's the magic. So we modeled this, and we did a lot of work with it. With an allocentric location input, a column, this two-layer network, can learn models of complete objects by essentially sensing different locations on the object over time. So, integrating over time, you can both learn models of objects and infer objects. I'll show you that now.
B: The next thing we realized is: suppose you had a series of columns near each other. Imagine they represent the tips of three of your fingers, and you're going to touch that coffee cup with three fingers at a time. Well, each finger is going to have its own location on the object; each finger is going to have its own sensory input. Those are unique, but they're all basically trying to model the same object, and if they're confused, an individual column may not know what the object is. But the output layers...
B: ...of the three columns are going to be basically representing the same thing. And so, if you form associative links between them, they can vote together and help resolve ambiguity. That's the basic idea: each column has partial knowledge of an object as its sensor moves, and these long-range connections in the object layer allow the columns to vote. Inference will be much faster when you're using multiple columns than with one column.
B: It's just like if you asked me to reach into a dark box: if I use one finger, I'd have to move it around to figure out what I'm touching, but if I grab the object with my hand, I'll get it. Or if I were looking at the world through a straw, I'd have to move the straw around a bit, but if I open my eyes, I see the whole thing and I can do it very quickly. So here's a little cartoon animation just to illustrate some of this. It's not terribly accurate; it's for illustration purposes.
B: So imagine this finger is going to touch this cup in three locations, and I have one column, with its input layer and an output layer. As I move toward the spot I'm going to touch, I have a predicted location signal, and it basically invokes a union of the possible sensations I might find at that location. When I actually touch it, a sensory feature comes in; the input layer selects one of those sensations and projects up to the output layer, and the output layer says: I know three objects that meet this.
B
The coffee cup, the can and the tennis ball all meet that, so I'm forming a union representation up there. Then I go to a new location; I get a new location signal, and it basically makes a prediction about what it might sense. I actually get a proper sensation, this feature at this location; I pass it up to the output layer, and I eliminate the tennis ball, because that's inconsistent with feeling a lip or an edge. And then I go to the final sensation here.
B
New location, new sensory feature; pass it up, and I can eliminate the soda can, because it's inconsistent. If I do this with three fingers at the same time, like the hand grasping it, I get three different locations and three different features (in this case, we're showing them the same). They pass them up, and in the output layer we can say: well, column one says it could be a coffee cup or a ball, and the other ones are saying it could be the coffee cup or the can. They just quickly associate with each other, you eliminate, and you're down to the only thing that's possible for all three of them: the coffee cup. So very quickly you resolve it.
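The elimination process described here can be sketched as simple set operations. This is purely an illustrative toy with hypothetical object names and (location, feature) pairs; the actual network described in the talk uses sparse neural representations, not Python sets.

```python
# Toy sketch of the narrowing described above. Objects and features are
# hypothetical; the real model uses sparse neural activations, not sets.
OBJECTS = {
    "coffee cup":  {("top", "lip"), ("side", "curve"), ("bottom", "edge")},
    "soda can":    {("top", "lip"), ("side", "curve"), ("bottom", "flat")},
    "tennis ball": {("top", "curve"), ("side", "curve"), ("bottom", "curve")},
}

def consistent(sensed):
    """Objects consistent with every (location, feature) pair sensed so far."""
    return {name for name, model in OBJECTS.items() if sensed <= model}

# One column integrating over time: each touch narrows the candidate set.
touches = set()
touches.add(("side", "curve"))
print(consistent(touches))           # all three objects remain
touches.add(("top", "lip"))
print(consistent(touches))           # tennis ball eliminated
touches.add(("bottom", "edge"))
print(consistent(touches))           # only the coffee cup remains

# Three columns voting at once: intersect their candidate sets in one step.
votes = [consistent({("top", "lip")}),
         consistent({("side", "curve")}),
         consistent({("bottom", "edge")})]
print(set.intersection(*votes))      # {'coffee cup'} immediately
```

The same intersection that takes three sequential touches with one column takes a single step when three columns vote, which is the speed-up the talk describes.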
We tried this out then on a more sophisticated problem. We started with the YCB (Yale-CMU-Berkeley) object benchmark, which is about 80 objects. They'll actually send them to you, or you can just use the 3D CAD files. So we figured, since some of them are perishable food items,
B
we would go for the 3D CAD files. And then we built a simulated robotic virtual hand using the Unity game engine. We built sensory arrays on each of the fingers, and we built a multi-column array representing each finger. We used 4,096 neurons per layer per column, so with three fingers we've got about 24,000 neurons, each with thousands of synapses. Not surprisingly, because it's a simulation, it worked very well, but I'll just point out a few things.
B
One thing to talk about here: we did it with one finger, and the one finger is touching the object at different places. With one touch, you can't really tell what the object is. So this is the confusion matrix: the actual object is on this axis, and the vertical axis is what the network thought it might have been, and you can see this.
B
Obviously the right answer is the diagonal, but in this case there's a lot of confusion. After the second touch, things started narrowing down quite a bit; after six touches it was doing really, really well; and after ten touches you're nearly guaranteed to get it. There's a lot of variability here, because if you touch unique features on the object, you can narrow it down quicker than if you touch non-unique features. But this gives you the general idea.
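For readers unfamiliar with the evaluation, a confusion matrix just tallies, for each actual object, what the system inferred; a perfect run puts every count on the diagonal. A minimal sketch, with hypothetical labels:

```python
from collections import Counter

def confusion_matrix(actual, inferred):
    """Tally (actual, inferred) pairs; perfect inference is all-diagonal."""
    return Counter(zip(actual, inferred))

actual   = ["cup", "cup", "can", "ball"]
inferred = ["cup", "can", "can", "ball"]   # one error: a cup inferred as a can
cm = confusion_matrix(actual, inferred)
print(cm[("cup", "cup")], cm[("cup", "can")])   # 1 1
```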
B
We also did a lot of experiments looking at basically the number of columns, or, if you want, the number of fingers, though we can do this abstractly. And of course, what we'd expect is that the fewer columns you're using, the more touches, the more sensations, you have to have to recognize the thing; and if you have more columns, it quickly settles down to the point where it can basically do it in one sensation. And it gets harder or easier depending on some other parameters.
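This tradeoff can be illustrated with a toy simulation (my construction, not the experiment in the talk): objects are random sets of (location, feature) pairs, and each step senses one pair per "column" until a single candidate remains.

```python
import random

random.seed(0)
# 30 hypothetical objects, each defined by a feature (0-4) at 12 locations.
OBJECTS = {f"obj{i}": {(loc, random.randrange(5)) for loc in range(12)}
           for i in range(30)}

def sensations_needed(target, n_columns):
    """Sense n_columns distinct locations per step; count steps to uniqueness."""
    candidates = set(OBJECTS)
    pairs = sorted(OBJECTS[target])
    steps = 0
    while len(candidates) > 1:
        steps += 1
        sensed = set(random.sample(pairs, n_columns))
        candidates = {o for o in candidates if sensed <= OBJECTS[o]}
    return steps

# Fewer columns generally means more sensations are needed:
print(sensations_needed("obj0", 1), sensations_needed("obj0", 3))
# Sensing every location at once identifies the object in a single step:
print(sensations_needed("obj0", 12))
```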
B
There are a lot of parameters; you can make this harder or easier, but the point is we showed that sort of characteristic. All right, so that was the big idea there. But then we really said: okay, we've got to get to the heart of this allocentric location thing. What's going on there? What does that mean? And, as I said, when we thought about that, we said: let's go look at the entorhinal cortex to see
B
what's going on there. Now, I know there's a bunch of hippocampal people here, and we were talking about this this morning. There are various reasons why we chose to model the entorhinal cortex; so think of it as the entorhinal cortex, but don't get mad at me if I don't talk about your favorite topic. So we ended up here. This wasn't our initial hypothesis. Our hypothesis is that cortical columns contain analogues to grid cells, and very recently we realized they also had to have analogues to head direction cells.
B
That was the last missing piece, one that I didn't know about until just a few weeks ago. So let's just talk about what goes on in the entorhinal cortex. I don't claim to be an expert in this, but we have run this by some experts and they said: it's okay, you can say this, Jeff. So one of the things the entorhinal cortex does is
B
it allows an animal (typically we study rats) to basically build maps of its environment, to know where it is, and to be able to make predictions and know where things are; it's sort of the foundation of navigation problems. And grid cells: I won't go into all the details (we all know about them, and some of the details are really important, but I won't get into them); they allow us to encode location. So here's the way to think about this. If you look at these rooms, they're actually the same shape.
B
I'll just say: every point in this room can be associated with a sparse activation of the grid cells. So you have a bunch of grid cells in these grid cell modules, and if you just looked at which cells are active and which cells are not active, it's sort of a sparse representation. I've shown you here three locations in these rooms; every location in these rooms has an associated pattern.
B
What's interesting is that the location codes are unique to the room, so the actual coding of these locations in room one will be very different than the coding in room two. This is actually essential to the whole theory. So this pattern means that location in that room (it's a sparse activation), this one means that location in that room, and this one means that location in the other room.
B
Those are very different things. And, of course, one of the most important things here is that this location is updated by movement. So even in the complete dark, if the rat is in that room and it moves, if it walks forward, it updates its location information. And one of the clever things is its path integration properties: if I want to go from here to there, I can go this way and then turn this way, and I'll get the same representation
B
as if I just went straight, or went around in a circle. And what's clever about this: it works even in novel environments that the rat has never been in before. So it may never have been in room three, but it'll have that path integration property there, even in the dark. So that's kind of clever.
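One way to picture these two properties in code (my illustration only; real grid-cell coding works quite differently, using phases across multiple grid modules): a location is a sparse set of active cells determined by the room and by coordinates that are updated only by movement, so any path to the same point yields the same code, while the same coordinates in another room yield an unrelated code.

```python
import hashlib

def location_code(room, x, y, n_cells=128, n_active=6):
    """Deterministic sparse code: the set of active cells for (room, x, y)."""
    digest = hashlib.sha256(f"{room}:{x}:{y}".encode()).digest()
    return {digest[i] % n_cells for i in range(n_active)}

def move(pos, dx, dy):
    """Path integration: movement updates the coordinates, nothing else."""
    return (pos[0] + dx, pos[1] + dy)

# Two different paths from (0, 0) to (2, 1) produce the same code:
p1 = move(move((0, 0), 2, 0), 0, 1)   # go right, then turn and go up
p2 = move(move((0, 0), 0, 1), 2, 0)   # go up, then turn and go right
assert location_code("room1", *p1) == location_code("room1", *p2)

# The same coordinates in a different room give an unrelated code:
print(location_code("room1", 2, 1), location_code("room2", 2, 1))
```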
Now, the rat needs to know its orientation. Try to do this in the dark yourself; I do it at night. It's actually fun to try, to see how good you are at it.
B
You need to know the orientation of, in this case, the animal's head relative to the room, and so there are these things called head direction cells. These are not driven by magnetic fields or anything like that; they are basically a set of cells which indicate the direction of the head. The anchoring of those head direction cells is unique per room; it's not always aligned along an edge, but it's always consistent within the room.
B
So if I walk forward two steps, well, it depends which way I was facing where I'm going to end up. Also, if I want to predict what I'm going to see or sense, I have to know where I am and which direction I'm facing, because I could be in the same location facing different ways. As the animal moves, both of these are updated simultaneously: you have to update the orientation (I'm going to use the word orientation, because I'm trying to generalize it) and the location; both get updated as I move.
B
In this case, the movement is, in the case of my finger, the movement of my finger relative to the cup, and so we have to have that. The second thing, and I only realized this recently: to solve the problems of modeling objects and modeling structures, you also need to have a good equivalent of an orientation. What I'm trying to show here is the somatosensory patch on the tip of your finger, sensing point A but from different orientations. So you can look at it
B
this way: I'm touching the lip of this cup, and as I rotate my finger like this, the sensation on my finger is changing, but the location I'm sensing on the cup is not changing. It is a feature of the cup, but I'm not actually sensing the feature; I'm sensing the feature at an orientation. The feature is actually the lip of this cup in the frame of the cup, but the sensation I get changes as I move the orientation of my finger relative to the object.
B
So we need to have something like that to relate the orientation of the sensor patch to the object. Now, I should state that I'm going to give this whole theory in terms of touch, but the whole thing applies to vision, and I believe it applies to audition as well. It's a little harder to think about that, but we're not doing anything modality-specific here; we're really trying to talk about generic properties of sensor patches relative to things. Anyway,
B
we're going to argue that this orientation is anchored to the object, and that it has to be updated with my movement. So our basic idea is the following: location and orientation are both necessary. That is, the location and orientation of my sensor patch, whether it's part of my retina or my skin: where is it?
B
So now, with this knowledge, we went back and did the following. We started putting these pieces together in ways that are interesting, and this is where I'm going to lay out the basic elements of the theory. This was my most complex slide, so if I lose you here, sorry; I'll bring you back in a moment, hopefully. And I think everyone who's really smart about figuring this out
B
is probably ahead of me already. I'm just going to say upfront, without any further justification, that layer 6a is representing the orientation of the sensory patch and layer 6b is representing a location. There are reasons for this; I'll get to them in a second. These are both going to be motor-updated; they're going to do path-integration-type updates; they're grid-cell-like and head-direction-cell-like, and they're going to have properties similar to those cells in the entorhinal cortex. Now, let's follow the circuit.
B
Here's the sensory information in your basic feed-forward pathway: you've got a sensation arriving at layer 4, and that's paired with this bidirectional connection, a very characteristic connection between layer 6a and layer 4. And what I'm going to argue there is that layer 4 is representing a sensation at an orientation.
B
Now again, if I didn't know the orientation, I'd just have a bunch of cells that look like edge detectors or something like that; but in the context of an orientation, I'll get a sparse pattern, and it's a sparse pattern that represents a sensation at an orientation. This is our sequence memory layer that I started with: it can learn sequences, but it can also learn sensorimotor sequences, and so it forms this unique representation of sensation at orientation. Now, the next layer is going to be a pooling layer.
B
What you end up with is a stable representation of the underlying feature, independent of the orientation of the sensor. So I would end up with a representation of whatever the thing is that I'm actually sensing at that point, independent of whether the sensor is this way, this way, or this way; if I went through that motion, that's what would happen in this layer, and it represents the feature that is being sensed at that point. At the moment, there's no concept of object: I'm not locating this on an object, I'm just representing what I'm sensing with my finger.
B
Layer 3 then projects to layer 5, a classic projection, and we're going to repeat this same circuit: we're going to have the location information projecting to layer 5b, and that's now going to represent a feature at a location. This is another sequence memory, and this is good: now we really have the feature at a location. Our earlier experiments didn't do this right and they had some problems, but now, because I've added the second stage up above, I really am locating the feature at a location.
B
This feature-at-location representation is independent of the orientation of my sensor, and if I pool over that in the upper layer here, which I'm labeling layer 5a (it would really be the layer 5 thick-tufted cells; in some species that's above or below, but just pretend it's the one above here), that pooling layer would then be stable over objects. It would be the actual object.
B
So we have this sort of two-stage sensorimotor inference engine.
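As a toy sketch of this two-stage idea (my simplification with hypothetical names, not the cortical circuit itself): stage one strips the sensor's orientation from the raw sensation to get an orientation-invariant feature; stage two pairs that feature with the allocentric location and matches it against learned object models.

```python
# Stage 1: sensation + sensor orientation -> orientation-invariant feature.
# Rotating the finger changes the sensation, but not the inferred feature.
SENSATION_TABLE = {
    ("edge-up", 0): "lip", ("edge-left", 90): "lip",
    ("flat", 0): "bottom", ("flat", 90): "bottom",
}

# Learned object models: which feature sits at which allocentric location.
MODELS = {
    "cup": {("lip", "rim"), ("bottom", "base")},
    "box": {("edge", "rim"), ("bottom", "base")},
}

def stage1(sensation, orientation):
    return SENSATION_TABLE[(sensation, orientation)]

def stage2(feature, location):
    return {name for name, model in MODELS.items() if (feature, location) in model}

# The same physical feature sensed at two orientations gives one answer:
assert stage1("edge-up", 0) == stage1("edge-left", 90) == "lip"
print(stage2("lip", "rim"))   # {'cup'}
```

Lookup tables stand in here for the sequence-memory and pooling layers described above; only the division of labor between the two stages is the point.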
Now, think about what I said earlier about sharing information between columns. The only two things being shared here are the object layer and the feature layer; those are the two things that neighboring columns might also be representing in common. Everything else in here should not be projecting to other columns, because it's unique to this column. And sure enough,
B
the two primary output layers of a cortical column are always identified as layer 3 and the layer 5 thick-tufted cells, and those basically represent the feature that you're sensing, independent of the object, and the object that you're sensing. Those actually can be shared across multiple columns, and those become the feed-forward input to the next regions. It's worth noting (this is the second point) that a column, therefore, is a two-stage sensorimotor model for learning and inferring structure; this just falls out of these properties.
B
One thing about touch, and it's important to remember: a column usually cannot infer either the feature or the object with a single sensation. It's just not going to be possible. You have two choices: you can take the single column and integrate over time, by sensing, moving, sensing, moving (as if your eye were looking at the world through a straw, a sequential mode), or you can vote with neighboring columns. Both of those strategies are employed in the brain.
B
So the theory is that evolution discovered a way of navigating and mapping out environments; it had to do this a long time ago, because all animals move and they have to figure out where they are and how to get home. And then there's another theory that's been published, that the entorhinal cortex arises from a three-layer structure in two parts. I forget the scientists who proposed this initially, but they proposed that the neocortex was actually formed by folding those two halves on top of one another.
B
That gave a six-layer structure. So we think what's basically happened is that evolution preserved much of what's going on in the entorhinal cortex (not exactly; there are differences, but it preserved that), and now it's used to learn how to model objects in the world. And in the human brain, what happens? It's continued that, and it's using that same mechanism to model concepts themselves. And that would suggest
B
that when we think about things, whether it's mathematics or physics, brains or neuroscience, politics or whatever, we're going to be using a similar type of mechanism. And what's interesting about this space, this idea of location and orientation: they're dimensionless, they're defined by behavior, and they're not metric; it's not like x, y and z.
B
It's sort of a very unusual way of representing these things. And if the behaviors weren't physical behaviors but more mental behaviors, like mathematical transforms or something like that, you could apply behaviors to abstract spaces, and this might be the core of high-level thought. Okay, I want to add one more thing here. It suggests that we might want to rethink some thoughts about hierarchy that we've all had for a long, long time. This is a cartoon drawing, but it captures some of the basic essence of it.
B
We think about senses arriving at a primary sensory region, labeled region one here; we extract some simple features, then we converge onto the next region and extract some complex features, and then somewhere up the hierarchy we actually start representing objects in their entirety. The proposal I have today is quite different. It says that every region has columns, and every column is actually learning complete models of the world. I mean, I'm not joking: a single column can learn thousands of things, and I've only talked about what six of the layers do.
B
There's a lot more to be done, but the idea is that these things are actually very powerful modeling engines, and you have a huge array of them, basically models all modeling the same stuff in the world. Now, a couple of things here. I want to make really clear: I'm not saying that the classic view was wrong. I'm adding some new thoughts to it that we hadn't really thought about before. One is: what's the difference between all these columns?
B
Well, one of the things about the cortex: when we talk about how regions project to each other, they never do it in a strict chain; they always project to at least three regions above. It's like the LGN: it projects to V1, but it also projects to V2 and V4, and people think, yeah, but those connections aren't really strong. Well, they might be diverging.
B
The point is, there's nothing that requires a strict hierarchy here, and so a secondary region could be looking at the same sensor array but over a wider area. And why would it be doing that? Imagine we're going to recognize the letter E. I'm going to argue that I can do that in V1; every column in V1 can recognize the letter E. And if that E was really, really small, right at the edge of my abilities,
B
it's only going to be recognizable in V1, because in the other regions it just doesn't exist; it's too fuzzy. But if it gets a little bit bigger, then it might be recognized by the columns in both V1 and V2. But if it gets really big, then V1 can't do that anymore; it's just too big an area for a column to move over. And so you could be representing things at different scales here, but complete objects in each case; they're sort of overlapping.
B
Now, what if I had two sensory arrays going at the same time? So I have a vision array and a touch array, and we're going to basically grasp the cup and see the cup at the same time. Well, you would be invoking models of the cup in many cortical columns, because there would be columns on the retina that are sensing the cup, and those columns in the somatosensory regions are sensing the cup, and so multiple columns are trying to infer that this is a cup. They all have models of the cup.
B
Some are derived visually, some are derived tactilely, but they all model it. Now, interestingly, if they all have models of the cup and they're all sensing similar features, it's possible that they can vote in various ways. One of the things we see in the cortex is a lot of projections which don't make sense in a hierarchical fashion. You see projections from S2 going to V2. Well, that doesn't make sense in a hierarchical fashion, but here they can be voting on cups; they can be voting on objects, voting on features.
B
They can go up and down the hierarchy, across the callosum, and it's interesting: as long as you go to the right layers, you can form very sparse connections to different parts of the brain and it works. You don't have to have a lot of connections to each column; you could just send one connection over here. It's kind of odd the way it works, but anyway, you can have all these connections that help vote. So the auditory system
B
will be helping the tactile system, the tactile system will be helping the vision system, the vision system will be helping the somatosensory system. So these non-hierarchical connections allow columns to vote on shared elements such as objects and features, and that's the kind of thing we see up here. Okay, so I'm almost done. The summary of the talk: we started with our goal, which is to understand the function and operation of the laminar circuits in the neocortex. Our methodology is to study how cortical columns make predictions of their inputs.
B
We then proposed the pyramidal neuron model, which is basically the prediction engine. We say every pyramidal neuron is basically using 90% of its synapses for prediction; each neuron predicts its activity in hundreds of contexts, and that prediction is manifest as a depolarization. We then said a single layer of neurons forms a predictive memory of high-order sequences. This has been well documented; it works as long as you have sparse activations, mini-columns, fast inhibition and lateral connections.
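The prediction idea summarized here can be caricatured in a few lines (a toy with made-up segment contents; the published model uses thousands of synapses with learned permanences): distal dendritic segments each recognize a sparse context pattern, and a match depolarizes the cell, putting it into a predictive state without making it fire.

```python
class ToyPyramidalNeuron:
    """Toy predictive neuron: distal segments recognize sparse contexts."""

    def __init__(self, segments, threshold=2):
        self.segments = [set(s) for s in segments]  # presynaptic cell ids
        self.threshold = threshold                  # active synapses needed

    def is_predicted(self, active_cells):
        """Depolarized (predictive) if any segment sees enough active inputs."""
        return any(len(seg & active_cells) >= self.threshold
                   for seg in self.segments)

n = ToyPyramidalNeuron(segments=[{1, 2, 3}, {7, 8, 9}])
print(n.is_predicted({2, 3, 5}))   # True: segment {1, 2, 3} matches
print(n.is_predicted({1, 7}))      # False: no segment reaches threshold
```

Each segment here plays the role of one learned "context"; a real neuron would carry hundreds of them, which is the "hundreds of contexts" in the summary.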
B
It's certainly not proven, but it's a potential framework that would tie a bunch of things together in a way that kind of makes sense. Columns learn models of objects as features at locations, using a two-stage sensorimotor inference model; I went through the details, and they matter a lot, but that's the basic idea. And then, in total, the neocortex contains thousands of parallel models that are all modeling the world:
B
surprisingly high-capacity models, which resolve uncertainty by associative linking and by movements of the sensors. There are a couple of things I should point out that we didn't do, and they're very big ones. Objects have behaviors. Now, I point out that everything I've talked about so far is really about the 'what' pathway; we haven't been talking about the whole cortex, we've been talking about how the 'what' pathway models structure and so on. And when I talk about behaviors in the 'what' pathway,
B
I'm talking about behaviors of objects themselves. So my laptop has a behavior: the lid can open and shut, and I know that. Also, if I touch keys, they move. I know that this thing has behaviors too: if I push this button, something happens. Objects have their own set of behaviors. We have to add that into the model, because it's not just the shape of an object; it can change. And here is the way I think we're going to model behaviors.
B
If you think about the model of objects as features at locations: those features can move in the object's space (that's what happens if I'm opening a laptop lid), or the features can change at a particular location. So if I bring up my cell phone while it's on and I touch something on the screen, new features appear at the same locations where other features were before. So the proposal is that modeling the behaviors of objects is modeling how features move and change at locations. We have to do that; we haven't done that yet.
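In the features-at-locations picture, a behavior could be sketched as a transformation of the object's feature map. This is purely my illustration with hypothetical names; it is exactly the part the talk says has not been built yet.

```python
# An object model as a mapping from locations to features (hypothetical names).
laptop = {"hinge": "lid-closed", "key-K": "key-up"}

def open_lid(model):
    """A behavior: the feature at one location changes; the rest stay put."""
    changed = dict(model)
    changed["hinge"] = "lid-open"
    return changed

opened = open_lid(laptop)
print(opened["hinge"], opened["key-K"])   # lid-open key-up
```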
B
We also need a detailed model of the hierarchy, including the thalamus. I didn't talk about the thalamus here; we spent a lot of time today talking about it, and we have hypotheses about what it's doing and why we need it, but we have to finish that out. And, as I also already mentioned, we need to build the complementary 'where' pathway. This is not a motor model; we haven't described anything about how we generate behaviors, why I might move, or how I reach for something. I'm not talking about that at all. I've just talked about
B
how a 'what'-pathway column learns the structure of objects. Now, I want to put in a plug here for collaborations. There are many testable predictions in this model. In some sense it's a greenfield, because we're proposing that cortical columns, even primary ones, are doing a hell of a lot more than most people
B
think. And so we've spent a lot of time this week talking to various labs about how we could do that, and we welcome that; we'll have discussions, we can talk on the phone or here today, and so on. And we're always interested in hosting visiting scholars and interns; we have a couple right now. So if you want to come spend some time in sunny California, even for a short period of time: we have people come just for a couple of days who want to get immersed in what we do.
B
We like having visitors like that. This is the team we have; on the left there are 12 people. I want to call out two specifically: Subutai Ahmad, who is with me right here; we've been partners for 12 years, and he's critical to the whole thing. And Marcus Lewis is one of our scientists; he really helped understand the interaction between layer 4 and layer 6a, and layer 5 and layer 6b. I didn't really talk about his work here, but it's subtly underlying everything we're doing, and there are some real insights in it.