Numenta Numenta Talks, 29 Oct 2014

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: Principles of Hierarchical Temporal Memory (HTM): Foundations of Machine Intelligence

Description

"Principles of Hierarchical Temporal Memory (HTM): Foundations of Machine Intelligence"

The Q & A Session that followed this presentation can be found here: https://youtu.be/EU2Vm-VlfEk

Jeff Hawkins, Co-Founder, Numenta

Numenta Workshop Oct 2014 Redwood City CA

A

I'm going to talk about the the theories we're working on and where they're going and give you sort of a pretty depth, in-depth, inter review of, what's happened and what's going to happen, and I gave a subtitle to this talk of foundations for machine intelligence, because I think that's what we're working on we're working on foundation, technologies that will be applicable for many many years in the future, and we take a long-term view of that.

A

So, let's just get uh right into it. At nomento, we have two basic missions where mission things we're trying to accomplish. The first is discover the operating principles of the neocortex.

A

Our goal is not to understand a particular neurocortex, we're not trying to recreate a human near cortex or a cat near cortex.

A

We're trying to understand the principles by which they all work and our goal is not to produce humanoid like robots. It's really to get at fundamental information processing principles that our brains use and that make us intelligence.

A

The second thing is, we want to take that knowledge and turn it into technology that can be applied, and this is the practical side of this. So it's sort of a science side and a practical side. Now it's um it's surprising that, but not everyone thinks this is a good idea, and I mean that not everyone thinks this is the right way to go to build intelligent machines or build intelligent technology.

A

um So I want to just give you a little bit of motivation why we think understanding how the cortex works uh is really important and by the way I should just point out. The cortex, if you're not familiar with, is about 75 of the volume of your brain. It's where all high level thought occurs and when I talk about the cortex, I'm actually talking about a few other structures with it, but I'm not going to get into that level detail today, but we're talking about part of the brain about three quarters of it.

A

Okay, so let's talk about. Why should we study brands to do this? Why should we study the neocortex to do this? Why would machine intelligence be based on cortical principles? A couple of things you may not be aware of you might be. If you follow this field much, the cortex uses a very common algorithm for almost everything it does.

A

So you think about vision and hearing and touches and and behaviors very different types of things, but there's an unbelievable amount of evidence that says these are actually all manifestations of the same problem, and this was first pointed out 35 years ago uh and and it's kind of a hard thing to believe it's one of those beautiful things, and so uh we it changes the way you think about the problem.

A

When you start thinking like okay, you know how what how these things are common, but it's a it's a common learning algorithm and that suggests that we can understand that common learning algorithm. We can apply it to lots of different things. Now it turns out that our brains, human brains, are particularly good at a lot of things, we're amazingly adaptable.

A

We have languages, we have science, we have arts and engineering all the things we do. These are all a product of the neocortex and we do not have separate areas of separate things to do these. It's still the same algorithm. So these core algorithms, which were originally evolved to understanding low-level sensory data and building a model of the world, can be applied to very, very deep problems that we deal with what we think about as intelligence in our in our species.

A

And finally, this is not to say that the cortex always has the best solution to any particular problem, but it's extremely adaptable and it's the most flexible solution. So we believe in the end, what's really going to drive uh the world and the technologies towards a common set of foundation.

A

Principles for machine intelligence are network effects things we've seen in the past in other areas, so people are going to want to work on the most flexible solutions and we're going to the most resources are going to be put into that, so we're going to be naturally moving towards a more universal solution and there's nothing more universal than our brain, because we know that nothing else, that's even close to it. So these are the sort of motivations we use for studying the cortex and as an example for us.

A

Okay, here's my uh the agenda for my talk. I'm going to start with some cortical facts uh things we know about the brain, I'm going to then go into cortical theory or hierarchical temporal memory. The hdm theory I'm going to give you a research roadmap. Tell you what we've done where we're going what's next? What's after that, I'm going to give you an applications roadmap, what kind of things we can build today in the future and then I'll end with a few thoughts on machine intelligence.

A

Now, as as craig mentioned in the introduction, uh some of this is going to get pretty deep. So I starts out easy and it gets pretty deep and then it gets easy again, as craig mentioned, you don't really need to worry about it. If um you don't need to understand all the stuff that I'm going to talk about uh it's and and and you can sort of just try to if it gets a little hard, you can sort of pick up the pieces you can.

A

um But the point is: is that it's important to understand, there's some depth to this and when you hear the stuff, even if you're hearing it for the first time, you'll get a sense for what makes this technology unique and different than other things you might have thought about before.

A

So it's kind of important to do that, it's a little bit like. If you want to understand how a computer works. You know if you want to use a computer, you don't really need to know all the details of what you know, how cache memory works and stack pointers and all this kind of stuff. But someone had to know that initially we're going to get kind of that kind of level. You don't really need to understand all this to use this technology, but it's helpful to have this as a background.

A

Okay, let's just jump right into it. uh So, let's just start just a very high level of what the cortex does it's an organ of memory, it it learns and it interfaces to the world through a bunch of sensory organs. So we all know about the retina and the cochlea and the somatic senses there's quite a few sensory systems. You have and the interesting thing about.

A

It, though, is once you come outside of that, once you leave the retina or you leave the cochlea, it's just patterns of action, potentials or firings on nerve fibers and those nerve fibers are identical matter which what they represent. So there's no difference between a pattern. That's coming in from the optic nerve, there is from the somatic sensory nerves um and it's the brain does not really deal with light and sound and touch the brain, especially the cortex.

A

Basically deals with patterns and the reason the world seems different, like vision, seems different than hearing is because of how the model that the brain makes from this. So the cortex is a it takes in this fast changing sensory data. It's changing all the time. Think about my speech. It's changing the order of milliseconds, it's flowing into your brain right now, and it builds a model of the world. uh It builds a predictive model and that predictive model is basically says.

A

I can constantly make predictions what's going to happen next, it can detect when things are changed unusual and it can generate actions now actions. Of course, your behavior and it's interesting when you think about the uh behavior.

A

Most of the changes that are occurring on your sensory organs are coming from your own behavior, not from the world itself. For example, most of the changes are occurring on your eyes right now or because you're moving your eyes and you're moving them several times a second you're not aware of it, but it's constantly changing. You have a very fast changing data stream coming because you move your eyes as you walk through a building.

A

As you turn as you touch things, it's all about how you interact with the world, so most of the changes are coming are from your own sensory of your own activity, your own behavior. So we say the cortex builds or learns a sensory motor model of the world. It learns how the world behaves largely when we interact with it, but also how it behaves on its own. So this is what the goal of our system here is to build. A sensory motor model of the world and from that we can generate behaviors.

A

Okay, let's just jump into some real cortical facts. Here's a little picture of a you know, cortex! uh Next to it I show a rack neocortex just to let you know that I'm talking about the same thing. It doesn't really matter what species we're talking about. If it's a mammal it has the neocortex and the properties I'm going to talk about now are universal across species.

A

um The human neocortex and all neocortex is a thin sheet of cells. It's about two and a half millimeters thick. uh It's uh I used to always carry around a little different napkin.

A

To give you a sense for what the uh this is a good model for human, near cortex, it's about the right size and about the right thickness and maybe somewhere around 60 billion neurons in the sheet- and this is what's in your head and it's what in my head right now and this sheet of cells is listening in your head and mind- is generating speech now what's interesting about it? Is it's remarkably uniform?

A

um You can find differences here and there, but it's incredibly uniform both anatomically, so you can look at different species and different areas in your cortex. It looks virtually the same and it's functionally uh very, very uniform, meaning you can literally, and people have done this experiment, take an optic nerve and an auditory nerve and switch them on an animal and the part of the cortex that was auditory becomes up. You know, visual and the visual becomes auditory.

A

um If you delve down next level of structure, you'll see that it's organized as a hierarchy. Now, why is it organized in a hierarchy? Because, even though it's a sheet of cells, different areas in the sheep connect to other areas? And if you follow that map you get a hierarchy and humans have a very deep and big hierarchy. Other mammals have a smaller one if you dive down next level, if you look at a slice through that two and a half, millimeters and you'll find the next structure. You'll see is layers, there's layers of cells.

A

uh How many layers depends on who's, counting but they're. Basically, four layers of cells- two three is one believe it or not, and then there's four five and six four layers of cells. If you dive down further still you'll see that there's neurons in there and the neurons have an organizational property in these something called mini columns, they're organizing these very many miniature columns. Many columns exist, there's a debate within the neuroscience world, whether they're functionally relevant.

A

We believe they are and I'm going to talk about them.

A

If you delve down further still and you look at the neuron here's a picture of the two types of neurons that exist in your cortex today, the two primary excitatory neurons, there's a pyramidal neuron on the left and a spiny stellar on the right, they're, actually very similar, except one, doesn't have the big branch going up the top.

A

Now these neurons have thousands of synapses on them anywhere between three and ten thousand synthetics on each synapses connections on each cell and what's interesting about it, is that only a small percentage of about ten percent are close to the cell body. This is what most people think about when they think about a neuron, most artificial, neural networks.

A

They think about these synapses that get summed in the cell body, but 90 of these synapses are far away and and for many years people couldn't understand what they're there for, because, if you activate one of those distal synapses, it seems to have no effect at all. So people say what is these thousands of synapses doing out here? We now know: what's going on, you can jump down further and we now know this is something maybe in the last 15 years or so.

A

It's become very clear that if you look at one of these little branches on these dendrites off near not near the cell body, far away from the cell body, they're active processing elements and if you have a set of synapses that become active relatively close period in time and relatively close in space to meaning they're near each other, it can generate what they call a dendritic action potential which travels to the cell body and depolarizes. The cell has a large effect on the cell. It doesn't make the cell fire, but it depolarizes it.

A

So now we have all these thousands of synapses out there that are doing some sort of, like coincidence, detector and then finally, people think that learning in the brain we used to think and many people still do it's all about changing the synaptic weight. That is the strength of these connections. Well, that happens to some extent. We now know that synapses are formed. New synapses are formed all the time and are being lost all the time, and this is a much more powerful type of learning. It's called synaptogenesis, and so it's not like.

A

Oh I'm, just incrementing a little weight here. I can form a completely new connection and this is really real, actually, where most learning occurs in the brain. Okay. So that's some cortical facts now, what's the theory behind this? Well, we have a overall theory for this. We call htm hierarchical temporal memory. It's it's pretty straightforward. Pretty simple essentially says we have a hierarchy of identical regions, uh meaning they're all doing something very similar. So that's pretty much fact. um They are learning something. So it's memory, and now but here's the thing.

A

We believe that all these regions are primarily memory of of time-based patterns, it's memory of sequences or temporal transitions and what it's like. It's like, you're learning melodies in each one of these, and what happens is if you can make proper predictions if it's a predictive model, predictive memory, if you can predict, what's going to happen and you form a stable representation for it and you end up with representations being more stable as you send the hierarchy, it's like you're learning, names of sequences and the names of the sequence of the sequences and so on.

A

And similarly, when you have a high level, stable representation, it can unfold into very long complex speeches sequences like my speech right, so I have some very high level concepts, I'm thinking about and then I'm just playing back memories that I've recorded earlier. I I've said these words before, um and I've said these ideas before and I'm just playing back recorded sequences, and it turns out to be a very fast level pattern. Okay, that's the basic idea of htm. The question now we want to ask is well how does this exactly work in detail?

A

What do the regions do? What are the cellular layers are doing? How do the neurons work etc? And this is what we spend most of our time studying, so, let's just jump into it. Further. Here's a slice of that two and a half millimeters of cortex, and we can see the four layers there, there's roughly two feed forward layers and two feedback layers.

A

So the two layers, two three and four are feet forward, and the uh five and six are feedback, and what we believe is each layer of cells is implementing a type of sequence memory. Each one is implementing actually a variation on the same type of sequence, memory. The two feed forward layers are doing inference or pattern recognition, and I'm going to go in details about how those we believe those work and the other ones are more of a feedback layer. Five is the layer, the layer that has cells that generate motor behavior.

A

So my speech is being generated by cells in layer five in parts of my cortex and then layer six. Is it has to do with attention and hierarchy? So those are the basic idea there and we think again. Every one of these layers is doing something similar, but it's doing it's a variation and it's being applied to different problems by what it's connected to. So you can take this sort of generic sequence memory and then turn it to different uses, because you know motor behavior is sequence, memory and inferences sequence, memory and so on.

A

So let's just jump into uh layers uh four and two three, the two feed forward inference layers, and these are the ones we understand the best, um a very classic neuroscience. If you read any papers in neuroscience, you'll see this often that the input to a particular region first arrives at layer four. This is the the basic idea and then it projects to layer, three and then layer, three can projects down to the next layer up the hierarchy. So this is this. Is the basic feed forward pathway?

A

It's not it's more complicated than this, but we can just leave it at this. This is what you'll see in most textbooks, and so you go with layer four to layer. Three then up to the next region, layer, four to layer. Three.

A

Now everyone thinks about the input coming into the brain as being sensory data like at the primary visual cortex or primary arthritic cortex, it's information coming from the eyes or the ears, but there's another thing: that's coming into er, which people don't remember, or they don't know which is you get a copy of motor commands? So the cortex is not just sensing the world it actually gets a copy of. Whoever else is making the behavior in the body.

A

So my whatever you know the parts of my spinal cord or my brain stem or lower pieces in the cortex, the behavior is actually passed in, so the cortex gets a copy of not only what is being sensed, but what the recent behaviors were that came from your body.

A

What we think is going on in layer four is um is what we call uh sensory motor inference. We learned sensory motor sequences. This best example and I'll use. A fair amount in this talk is when you're. Looking at an image or you're looking at me and as your eyes are moving, you can constantly completely change the input to your to to your brain completely. It doesn't feel that way the world feels stable to you, but every time your eye is moving, the entire innervation is different and this is not a high order sequence.

A

This is not a sequence that repeats itself. You can't predict just by the order of what patterns come in. What's going to be next, however, if you do have the copy of the motor command, you can do that if you say well, here's what I'm seeing I'm about to move over here. I can predict, what's going to happen next and we believe that's what's going on in layer 4..

A

This is a predictive memory. If the system can predict correctly what's going to occur next, we want to form a stable representation in the next layer, in this case layer, 3, and if, if it can't it says, look I'm not able to model this change. It passes those changes through it says. Look I can't handle this. The next guy is going to get these changes. What layer three does is a it's a it's a more of a pure auto-associative sequence memory.

A

We call it a high order, sequence memory um and, uh and a good example of what high order means is. I just give you two a very simple example here. Imagine I have two sequences, a b c and d and x, b, c and y and notice that if I want to show you after I train on those sequences, if I show you abc, you should predict d and if I show you x bc, you should predict y now I can't just use the previous state. I can't just say well c what should I predict?

A

I can't tell you that I have to go back in time. This makes it a high order, sequence and most of the world is like that. Language is like that when you walk around a building is like this most of the world manifests itself as high order sequences. So these are the two basic types of patterns you can. Even you can see in the world. um These are all these are like universal instant steps.

A

If you think about it deeply, there isn't much else the brain can work on, it can say: look I can try to make a predictive model based on my own behavior or I can try to make a predictive model based on some sort of high order sequences I can observe. If I can't do that, then I can't do it it's it's. Then it's like random. These are universally applied to every sensory modality, they're, nothing specific about vision, hearing or touch uh it's a very. These are very deep concepts.

A

It also, if you know anything about neuroscience. I won't give you the evidence for this, but these steps completely explain the type of receptor field properties we see in layer, four and layer, three and layer, two. um These concepts uh lead to that. So we're pretty confident this is what's going on now we wanna jump in further. I wanna jump down to like exactly what's going on one of these layers and then, when we get to the bottom of all that then we'll come back up again.

A

So on the left. Is your biological neuron? It's a classic picture from kahal back a hundred years ago and on the right is our model, the htm model neuron, and we have chosen to model certain parts of real neurons that we think are theoretically important.

A

So, let's just talk about the the biological neuron, as I mentioned earlier, there is about 10 percent of the synapses are close to the cell body. These these receive feed forward input. This is where the input that's like the sensory intervals or the feed forward patterns come to. It adds linearly in the cell body approximately, and this is what generates the spikes in the cells, the other two regions, what we call the basal dendrites in the bottom and the apical dense rights on the top.

A

They connect the ones at the bottom, these distal synapses these connect to local connections, nearby, other cells nearby in the same layer in the same region, and then, when to the top, we see feedback in the hierarchy.

A

As I mentioned earlier, these are non-linear. They are they. They generate dendritic action potentials and they depolarize the cell. They don't make the cell fire, they just put the cell in a state, that's ready to fire, and we call we call that a predictive state. We model this basic arrangement. We have a set of synapses on our model, neurons that are feed forward, the linear summation, we activate the cell, and then we model the distal synapses as a set of coincidence detectors.

A

Essentially they say if I see 10, 15 or 20 synapses active at the same time, within on a dendritic segment, we will generate, may put the cell in a predictive state uh going one level further. We have to talk about learning um biological synapses. These are the connections. You can see a little section in the dens right here. You can actually see the synapses on there.

A

We now, as I mentioned earlier, think that learning is mostly about forming new synapses and synapses themselves are very unreliable things they're very low, fidel fidelity. They don't always work. You know you know they kind of have the time to work half the time they don't, and so anyone who has a model of a neuron or model of the cortex that relies on high precision, even one or two digits of precision, is not a biologically accurate model, because their synapses aren't very good. The way we model this is different than most people do.

A

Is uh we model the growth of a synapse? It's like it's an idea called um uh potential synapses. You have an axon of dendrite that are near each other, but they don't make a connection and over training you actually grow the spine. The connection between these two this is well documented and um and it's a growth we model.

A

So we give that a scalar value, a zero to one value and when we train we sort of increase that scale and it's like growing the synapse at one point at some point it makes a connection to the between the two and there's a threshold for that, so that when they, when the permanence hit some threshold in this case point four, we say the synapses exist and before that it didn't exist, but we give it. We give it a weight of one or zero. There's no scalar, it's just a binary weight.

A

Then, what's the point of having this, if you keep training it, what's the point of having the permanence go up higher, it makes it harder to forget in a brain what happens if you keep repeating and training a real synapse? It gets thick and it develops this bhutan and it's just much harder to forget it. So we're trying to we model that that, as opposed to the weight of it, okay now one more detail and then we can start showing how this whole thing works. Together, I'm going to talk about sparse, distributed representations.

A

Subatai is going to talk in much more detail about him later he's going to give you some of the mathematical foundations, I'm just going to give you some of the concepts right now. I call this the language of intelligence. uh This everything in the brain is all about sparse, distributed representations. You can't understand how any of this stuff works, so I have to define what that is, and so we're going to do that.

A

um Pretty simply, the simplest way to understand is compared to the kind of representations we use in computers, which is called the dense representation in a dense representation. You might have a byte or a word of some number bits and you use all combinations of ones and zeros and um an example. Being ascii coding is a perfect example, and these bits have no inherent meaning. In fact, if I change one of the bits in ascii code, I get something completely different.

A

It's the whole thing has to be looked at at once in a kind of arbitrary assignment, say you know it doesn't really matter as long as we all use the same convention in the brain and in htm theory we use sparse, distributed representations this you have to have at least several thousand bits to do something uh useful. Now, when I say a bit, you can think about it as a neuron.

A

It's a neurons either active or it's not and uh they're sparse, because at any point in the brain, a time in the brain, you'll find very few of the synapses. Being very few of the neurons being active. Most of the time is only one or two percent that are active and most and most of the time um the rest are relatively inactive. So when I talk about zeros and ones, you can think neurons, okay, active inactive, so you have many thousands of bits and they're sparse, because we have a very small percentage of number one.

A

The example I'll use here is two thousand bits uh two percent active, so I'd have forty ones and nineteen hundred sixty zeros. Now it's important to understand that the bits mean something this meaning is learned, but we can think about it. They have semantic meaning.

A

So if I was going to represent a letter, I might have a bit that represents um um you know, is it a a consonant or a a vowel, or how is it sound or how is it drawn, does have a sender's descenders things like this attributes of it and I'd pick the top 40 attributes that match this letter. We don't do that.

A

It has to be learned, but that's the basic idea. Okay. So what are the properties of this? First? Very simple property is similarity if I have two sparsely stupid representations and uh they both have two percent of their bits active if there's, if they share a bit meaning if they share a cell. If it was in the brain being active, then they're sharing some semantic meaning, and this does not happen by chance. It's very very because these are sparse.

A

You will not do this by chance of even random sdrs hold very little overlap, so if they overlap even just a few bits bits, it means it's semantically, similar.

A

um The next thing you want to do we're building a memory system. The brain is a memory system, so the first thing you have to think about is like: how do I store a pattern and how do I remember it and how do I recognize it when it occurs again? So imagine I got these thousands of cells or thousands of bits. The way we want to store a pattern is not remembering the whole pattern. We only have to remember the locations of the one bits.

A

If I can just remember where the one bits are, and I can see a new pattern come in and say well, does it match those? If it does I'm good, I don't have to look at all the other ones. So in this case I might have index of 41 bits. But what, if I couldn't do that, what if I could only store the location of 10., there's some subset and I said you're not allowed the story of the location of all the one bits all the cells are active, just a small subsample of them.

A

Well, what's going to happen, a new pattern can come in and I might match those 10, but the others. I don't know about you, so I can make an error statistically, it's extremely unlikely for this to occur and if you do make an error you're making an error for something. That's semantically quite similar to the thing you stored- and this is the key to how the brain generalizes it's it. It does this now. If you haven't made the connection yet I'll make it for you.

A

When a cell wants to recognize a large pattern, it only has to form a small number of synapses to other cells nearby that are active, so it might be hundreds or thousands of cells nearby that are active. But as long as it's sparse activation, an individual cell only has to make connection to maybe 10 or 15 of those to know that the entire pattern is there, and this is what's going on on dendrites on neurons.

A

There's another property- and this is the last one I get into, but is also very very important to the theory- is that you can form a union of sparse distributed representations. So I let's say I took 10 of them and I just ordered them together and I now have a new one, which is the same number of bits, but it's got more one bits. It's got about, 20 a little bit less. Maybe I can't undo this. I can't say: oh, what was the original 10.? I can't do that, but I can ask.

A

Is this new pattern, one of the original 10 and I'm going to claim that if I say well, if the new pattern ones are in the same location as the unions, ones, I'm gonna have a match and you could think well. I could make a mistake there. I could be mixing matching from different of these patterns that I've stored earlier the union again extremely unlikely. For that to happen. The math shows it's almost astronomically unlikely to happen.

A

But again, if you do you're making a mistake for one of the things that are semantically similar to ones you had before this property is used throughout the cortex and in super tight talk. He's going to talk about that. Where is this I'll? Give you one example where it's used? Imagine now I'm looking at a neuron and I'm looking at all the synapses that are near the neuron there's several hundred of them and um let's say I store up 10 synapses for every pattern.

A

I want to recognize and I just throw those synapses all together, so I now have 700 synapses representing dozens and dozens of patterns. Well, that cell will be able to uniquely identify any one of those patterns without getting confused, and so we actually believe that we have a sort of different model of a neuron than most people think about it. We think that the neuron in our models, the the feed forward connections, the proximal synapses, can can activate the cell from dozens of feed-forward patterns.

A

The cell can actually respond to many different patterns in a feed-forward case. It's not just one thing: it's recognizing and it can respond to hundreds on the on the more distal dendrites that is, it can recognize it can predict its own activity in hundreds of contexts that are unique and very precise all right. So let's put this all together um in in how we get this to work. How am I going to build a layer of cells that learns, builds a predictive model and we're going to use that and to build?

A

Our cortex here is a bunch of neurons in our one of our simulations.

A

This is just a single layer cells and each of these cells is receiving some input on its proximal synapses, and some of them get more better input than others, and so that's the color designates, which ones are more strongly activated than which ones aren't and then the ones that are most strongly active, active fire first and they inhibit their neighbors, and so you end up with a sparse representation.

A

This is a picture of uh a very few number of cells, but it's about two percent sparsity, so we're just showing you a little subset and- um and so this is what you might have and a moment later, you would have. This is maybe one time a moment later. You have a different pattern, and so this kind of thing goes back and forth all the time.

A

This is what's going on in your brain right now, it's as I'm speaking you're having patterns like this sparse patterns changing, and that is the sequence we want to learn this these transitions. We want to learn how we're going to do that.

A

Well, when a cell becomes active, all it needs to do is to look for cells nearby that were active a moment ago and and it forms connections to them on one of its dendrites and now it'll say: look, there's a huge pattern going on out there, but I've just connected connected to 10 of those cells, and if I see those 10 cells going again, I'm going to predict it I'm going to become active next, because that's what I just learned, I became active when those cells are active.

A

So all the cells are doing this all the time. What will happen is if you train it for a little bit and then you show it a new pattern. In this case, the red cells of the input pattern, a whole bunch of cells will be saying.

A

Oh I'm going to be next, I'm going to be an x2, I'm predicting and the reason you have here we have more cells predicting than we uh than the more yellow cells than red cells is that we we train this on three transitions: a to b, a to c and a to d. So if I show it a it predicts, b, c and d, the union of these three and that's what the yellow cells represent the union of three patterns. Now this is the beginning of sequence memory. It says: oh, I can give an input.

A

I can predict what's going to happen next, I can make a union of prediction, so I don't have to have a precise prediction, but I'll know if any one of those three occurred and I'll know precisely if they occurred now. This is what we call a first order: sequence memory. It is not able to solve the problem. I proposed earlier of a b c d versus x, b c y, and we need to solve that problem. We need to be able to say you know what what I put in x is depend on.

A

Something happened a long time ago, not just something happened a second ago or a half a second ago. So the way we're going to solve this is we're going to use these mini columns and I'm going to walk you through that. This is about as deep as the comp that the top talk goes and we'll come back and get easier again.

A

So now we're looking at a slice of cortex, I'm showing just a series of little mini columns there, and this is a cartoon drawing and there's six cells per each one of those mini columns, and in this case I'm showing three. When you have a feed forward input, what we believe is going on it actually activates the mini columns um each one of these, and so it's a sparse representation of many columns. So I've shown three being activated now.

A

If nothing else had occurred and there was no prediction, what will happen is all the cells in those columns will become active. This is a I'm unexpected input and an unexpected input. No prediction: um I'm going to activate all these cells, it's sort of, like I don't know what's going on. The alternate scenario is if some of those cells were in the predicted state shown here in yellow and the same columnar activation occurs. Those cells will fire first and inhibit everyone else, and I get a very sparse representation.

A

The same columns, but now it's a sparse set of cells. This is a very unique representation for this particular transition. If I I'll walk you through an example here for the abcd and xbcy, so here I'm going to show you in this sort of cartoon drawing here's our sequence, a b c d notice. I've shown three columns active in each of those representations.

A

This is before training, there's no expectation. So all the cells fire, here's, the here's, the sequence for x, b, c y notice x- is different than a different set of columns, but the b columns and the c columns are identical and of course, then I have a difference at the different for y and d. At the other end.

A

Now after training, what occurs oops there's a little missing dots there. I'm not sure why the a is still the same as before, and um you can ignore those missing dots there. I don't they weren't in my presentation. They were just somehow I don't know what happens here is that we now get something called b, prime, since it was predicted, it had learned this, and so it's the same columns, but now individual cells in those columns.

A

Similarly, I have something called the c prime representation same columns of c, but individual cells in c, so c prime, is basically says this is what occurs after b, prime and b prime occurs after a, and so when I get to d, I can predict d in the column space, but I have a unique representation for d.

A

This is d in the context of a b c, and I can do the same thing for x, b, c y and you'll end up with a different representation for b b, double prime and c double prime and y double prime, and this is basically how you learn: high order, sequences, how you learn, speeches and music and so on. The capacity of the system is amazing.

A

If I have just take an example, if I had 40 active columns, a very small number of columns and I have 10 cells per column, then there are 10 to the 40th ways to represent the same input in different contexts. There's 10 to the 40th ways to say this in this context is unique from this. In another context, and it's when you start thinking about it, you realize your life is full of this. This is what you're doing all the time. There's a beautiful mechanism works extremely well.

A

We put all this together and we can make this whole thing work, I'm not going through all the details. We end up what we call the htm temporal memory. It's a it's a sequence memory it's equivalent to like a cellular layer. um It converts it into a sparse activation in columns and it builds a temporal model and it has some really nice attributes one. It learns continuously. That is, there's no batch training here. Every new input, it's constantly adjusting its synaptic waves and it's constantly learning it can extend sequences and forget them, and so on.

A

It's very high capacity. Even a very small section of this can learn millions of transitions. It uses local learning rules which is important when we get to harder implementations, but there's no global supervisor going on here. It's naturally fault, tolerant, every single component of this is can fail and and every individual component can fail and nothing bad will happen. It has no sensitive parameters and it semantically generalizes. These are all really desirable features in our memory system.

A

Okay, and we believe this is a building block for both the cortex and for machine intelligence. So essentially, I can do a variation of this sequence memory in each of the layers, and if I can build a region of cortex, then I can take build put that in a hierarchy and I'm on my way to building a neocortical system.

A

Let me just switch now to the research roadmap to give a sense of what we've done and where we're going.

A

So here's our little map of the four layers again, you should recognize them by now. Hopefully right we have spent most of our time working on what we consider layer. Two three. It is the high order inference. It's a high order, sequence memory. um This was the right place for us to start in some sense, the simplest one, the one that we could characterize easiest, I'm gonna give you a little. I say, theory 98. What does that mean? This is a very subjective number. It's how I feel about it.

A

Okay, that's it's just intuition about how much we understand about what's going on here, uh but I think be useful for you to share that with you. So I think we have a pretty good handle on this there's a few tight ends. We may have to tweak and clean up and so on. This has been extensively tested over years. We put it into commercial products. We know this works really well in commercial settings.

A

This is not just some. You know ideas on paper. We have taken that well, so we'll go on to the next thing. The next we've been working on layer. uh Four uh here I'm pretty confident we have the basic idea. What's going on, you take that same sequence, memory and you feed in motor commands and it builds a predictive, sensory motor, predictive model um and there's a lot.

A

We don't know yet, but I we're over the hump on this one as far as I'm concerned, and it's currently in development we're working on this right now uh we haven't built anything commercial with it yet, but we're testing and working it through it slowly the motor sequences.

A

This is where things really get interesting, because you start adding behavior and robotics to the system. I think we understand about half of what's going on there. I think we have some really good foundation principles. We haven't started implementing any of it yet so, but we think about a lot because it all these things are interplaying all the time and then finally, on the layer, six, we haven't really spent much time at all. I have some key components that I know have to be in there.

A

I give it about a 10, maybe it's 20 understanding of it. So that's where we are in terms of how we think about our research. Now, since we did layer, 2 3, and we did this high order uh inference engine, we then turned it into a technology and we've used it, so you can think about what can I do with this? Well, the data it works with is streaming data. It has to be data that changes over time because there's no behavioral component in this.

A

It's relying on data coming in like music to come in our speech or data streams off of machines. Things like that streaming data. What you can do with it is. We can model that data. We can make predictions, we can detect anomalies and we can do classifications. We've shown that we can do very good jobs at all this on streaming data. The applications uh are varied, I'll, just mention a few in a moment, but basically you can do predictive maintenance, security, natural language processing, anything that has streaming data.

A

These are the way we've done. This is we've taken. We've built a simple system. We take some data stream. You have to run it through an encoder. The encoder turns that data into a sparse representation, so we have encoders for numbers and categories and dates and times working with a company called quarter clio. We have an encoder for words. We have encoded for gps, coordinates, we've done a bunch of these once you've got them into sdrs.

A

You can throw them into a high order: sequence, memory and outcomes, predictions and anomalies we've applied to a number of problems. uh We have a commercial product called grock which basically detects anomalies in servers and server-based applications on aws we've applied to the same basic idea to human behavior. So can we detect when a human starts acting unusual? Maybe a rogue trader who starts behaving differently? We can do that. We can detect when people start using their computers differently in a very significant way. We actually think we can we.

A

We know we can detect anomalies in stock uh volumes. uh We've done some cool work in geospatial anomalies. We can attack when things get off track or change the direction or the change at different speeds and so on, and then working with this company cortical io we've shown that you can do natural language processing and it's very cool, and you should see the demos for quark io later today and I think chetan's going to talk about it too. I'm not going to go further into these at the moment.

A

I just want to say that we've that's what we've done with layer 3. and by the way, all of these use the exact same code base. This is not. We don't even have to tweak it. It's the exact same code base and we're getting close to that universal algorithm. um So you can as long as you get the data into a sparse representation, you're good okay. So that's what we've done mostly now. What could I do with the sensory motor inference the stuff we're working on right now? Well, what would it be good for?

A

Well, first of all, it's it's good for working with static data, and um but it needs some sort of simple behavior. So, for example, we could look at a picture which is static data, but to train the system. I have to move the eyes over it and I can do it in a fairly stupid way and still get good results.

A

So um it's sort of static data, but you need with some sort of simple behaviors. uh You can do classification and you can do prediction in this case. uh The example we're going to work on the one we're working on is vision. It's it's a classic example. Everyone wants to work on that.

A

How do you, as you understand what an image is and our approach is? Is you have to train the system it has to saccade and has to learn in a hierarchy how to make a predictive model, a motor of motor behaviors? And, what's going to happen, we think we understand how that's all going to work.

A

There are other things you can do with a very cool. You could do, for example, classify networks, any structure, that's out there. I could have a set of some sort of very complex computer network, and I want to classify it. It could be n-dimensionally complex, but I want to classify and say well what's it like, and what is it similar to? I think this technology would work for that. So some, I think some very clever applications are going to come out of this.

A

Finally, of course, if you go to adding and motor behavior, then things really get interesting. It doesn't matter you can have static or streaming data, but the capability you're going to get now is not just simple behavior but goal oriented behavior where the cortex itself starts saying this is the behavior.

A

I want to achieve a particular result, a particular predicted result, and how do I get there, and this is when we're going to be able to enter into robotics now, when I think of robotics, it could be physical robots, but mostly, I think, about things that are not physical robots. I think about things that are like smart bots things that are scouring the web, trying to figure out how, where bad guys are things like that or proactive defense anywhere.

A

You have some sort of system where the the intelligent machine is navigating intelligently through some sort of structure, and it's trying to achieve certain results. That's going to really where the whole thing opens up and I'll mention briefly the layer 6 thing it's not in the same category. Essentially, this is necessary for hierarchy and it's going to be really necessary for building very large multi-sensory, multiple multi-behavioral modalities, but the other threes are really the key things here. So that's our research roadmap. It gives you a sense of where we're going. You might be saying.

A

Well, how long is this going to take? I don't know um we did. We did it took us a long time to really figure out. What's going on the the top part there, the layer three, but now it's a lot quicker. We were able to do the layer, four stuff and figure that out much faster, and I think it's accelerating. So um I don't know how long this is all going to take to play out, but I certainly hope to be part of it and I'm working at it. Okay.

A

um Another part of our research roadmap is essentially our approach to doing research uh and we're very open and transparent, and um so, as you probably know, all of our algorithms are documented. uh People created multiple independent implementations of them. Our software is open source under gpl version. 3 license. We have a new pic open source project. You can find it at nomento.org tomorrow and sunday we're having a hackathon down in san jose uh there's active discussion groups for theory and implementation.

A

uh We are now started posting our research code. I mean really messy stuff like here's, what we played with today or up there, because some people wanted to see it and complete some people in this audience, so we're being very, very transparent, we're also open to lots of collaborations.

A

We have a long going collaboration with ibm research in san jose. This is the group we probably haven't heard about, but they're interested in doing hardware, implementations of these algorithms there's a similar relationship with darpa in washington dc. There's a program called the cortical processor, which is also based around htm principles, and we work with other small companies. I mentioned cortical io they're, doing the natural language, processing and you're welcome to contact me and momenta, and I'm we're all very open and available to talk about this stuff.

A

Okay, I want to give you my little flavor, and this is my last slide um of uh sort of the big road map here.

A

I know where the big picture what's going on, so we got there's a lot of confusion in this space about all these different approaches, and I may not be able to clear up that confusion, but at least I'll share how how I think about this, the I think, there's sort of three basic approaches to building intelligent machines on the left is the people like ourselves who say you need the model of cortex. These are cortical modelers. He said all right. We got a brain, it's smart, let's figure out what it does. Let's model it.

A

I use htm as an example, because I think it's one of the best and most advanced theories in this space on the then there's the sort of the artificial neural network world. uh The current favorite right there today is deep learning and uh that's getting a lot of press a lot of success, and then you have the more classic ai and lots of different things in on the all these categories, but I'll use watson, because that's it's been in the news recently and ibm's, pushing it very heavily.

A

Okay, so they're basically built on different premises. One premise on the cortical modeling is that biology matters and we focus on biology. We use the biology as a set of constraints. We don't think it's a nice guideline. It's it bet. This is the real mccoy. I need to understand how the biology works, and once I understand it, I can deviate from it. But I don't just we don't make up stuff willy-nilly and say: well, I think it might work like this.

A

Just try that now we constantly go back to the neuroscience and constantly go back to biology, say this: no, it can't work that way. It has to be something like this. That's what drives us. The artificial neural networks are really mathematically, driven they're, not biologically driven at all, despite the fact that they're called neural networks and people say they're brain like they're, not the neurons they use are totally unlike real neurons. The networks they use are unlike real networks.

A

The training programs they use are unlike biological training paradigms, but what they do is they have a mathematical foundation. They can prove that these algorithms, these networks, will converge or they'll produce the right result and that's a very powerful thing. Sometimes people I've been told over and over again from some of the people in this camp like well. You can't understand if htms work, because you don't have a mathematical foundation for it and you just won't know if it works.

A

I said well, I can see why it's going to work and I build it and it does work. They say. That's not good enough. Well, I don't know, but you know I say we build computers and there's no formula to represent how a computer works, and we seem to be happy with those so and then, of course, in the ai world. It's basically an engineered solution, we say: okay, let's we have a problem, let's engineer a solution for that, the data they work on. It's a little bit different.

A

We work with spatial temple data right from the get-go. We knew we knew that brains are all about temporal data, spatial temple data and we're starting to add simple behaviors into it. So I've given us credit for that one. The artificial neural networks are primarily spatial problems. um Deep learning networks are primarily spatial classifiers and they they realize they have to add the temple component and a lot of the researchers they are talking about and working on.

A

They haven't quite gotten it right yet so I'm going to give them partial credit for that, because they're moving in that direction, which is good and in the case of watson in the area world, the data they work with are language and documents. It's something very high level, totally very different.

A

The capability is a little bit different too. The cortical models are basically predictive models and we can do classification and we're starting to think about how to do goal. Oriented behavior today, artificial neural networks, like deep learning, are really just classification networks um and they're really good at it, but they're classification networks and then, of course, if they're watching this more like natural language querying now, all of these things are valuable. I'm not trying to put a value judgment on them. They all solve problems, they're all very useful.

A

um This is not saying one's better than the other they're just different approaches, but what I do think is is is is true, and my next point here is: are they on the path to true machine intelligence? Are they on the path to what we all think about of intelligent machines? And in that case I argue that the cortical modelers are the only ones who are on that path. We are definitely there. I've laid out a path here. I've talked about the components.

A

I've talked about how all this stuff fits together and a theory about how the cortex works and behavior we have a road map, even if we don't understand it all. We know where we're going uh the ai world no they're, not, and I don't make that up the guys who created watson said so themselves. They said this is not intelligent machine.

A

It's not going to lead to intel's machine, but it's really cool for what it does and good for them, and I'm going to argue in an artificial neural network world they're, probably not on a path to a machine intelligence if they want to, and if it's going to get there, they have to add time they have to have behavior. They have to add sort of these broader concepts. Sdrs things like that that I talked about- and my hope is that these worlds converge.

A

um But I do think that the way to get there and from my belief is to start with the cortical models. So that's my view of where we are and where we're going uh I'll just end with a few comments, um and then we can do some questions. um I I think we're at a pivotal time in humanity. This is a really really interesting time to live. We are at a time when we are actually figuring out how the brain works and there's nothing.

A

In my mind, that can be more interesting than that I mean we are a species is defined by our brain. That's really the only thing that makes us unique we're not really good at anything else and and and so and everything we do. That's interesting, our knowledge, our language, our arts, it's all products of brains and knowledge- is these only can be understood by brains and the scientific process is the process the brain uses I mean so to understand humanity.

A

We have to understand brains and the idea that we can now build machines that work on those same principles, to help us discover things faster and bigger and and and apply to problems that we're not very good at is tremendously exciting. To me, this is not about building robots or you know machines are going to take over the world. Nothing like that at all.

A

It's about building machines that are useful to us that learn and learn to build models of the world and interact with the world in ways that we can't do, and I have no idea what these future applications are going to be.

A

It's really impossible, but I know they're going to be amazingly cool and I think this is going to be driving technology development for the next hundred years and we we have a chance at least the opportunity to participate that uh now and times it seems very hard and difficult to do, but I think we're making really great progress so again, thank you for coming here. I appreciate your time.