From YouTube: What The Brain Can Tell Us
Description
Second Annual IBM Research Cognitive Computing Colloquium keynote by Jeff Hawkins, co-founder of Numenta.
We're witnessing the birth of machine intelligence, and it's a very messy and confusing time; there are lots of different approaches. We have people arguing for specific machine learning techniques to solve specific problems, and other people arguing for more universal systems. There are different types of learning algorithms: mathematical ones, memory-based ones, and different training paradigms going on. It's a really messy world, trust me, if you're not living in it.
But I believe that, in fact even before the end of this decade, we're going to settle out on one dominant paradigm. My talk today argues for one of these paradigms, and I believe it's the paradigm used by the neocortex. The neocortex is a universal algorithm. As you'll hear me talk about, it's memory-based, it is an online learning system that learns continuously, and its learning is behavior-based. This, I believe, is going to be the dominant paradigm for the next 50 or 60 years of machine intelligence.
The reason we're going to end up with one is the same reason as always: network effects. People are going to want to build new hardware and software and systems on top of the winning solution. And why is this particular one going to win? Because it's the most flexible. It's not always the best solution, but it's the most flexible solution, and it can scale. How do we know this is going to happen? We have a proof case. The proof case is our own brain, our own neocortex, and we know it's scalable.
We know it's flexible and scalable because nature has built neocortices both very small and very large, and there's no reason we can't build them larger. So my talk today is really about making the argument for this, and I think we're in this period of time right now; it's happening as we're sitting here. So my company has two basic goals.
The first goal is to discover the operating principles of the neocortex. Just to remind you, that's about 75 percent of the volume of your brain. It's the big wrinkly thing on top, and it's the location of all intelligence: language, planning, high-level vision, hearing, and so on, everything you think of as intelligent. Our second goal is to create technologies based on neocortical principles. We are not trying to build a brain or anything like a human; we're trying to build learning technologies that work on the same principles as the neocortex.
For my talk, I'm going to start with a little discussion about the cortex and some facts about it; we're going to do a little neuroscience theory here. Then I'm going to talk about our research roadmap and where we are in understanding the system, then I'll talk about applications, and I'm going to leave you with some thoughts on machine intelligence. So let's just jump right into it.
Here is a picture of a human neocortex. It is a memory system; it has to learn. When you're born it knows nothing, and the way it learns is that it interfaces to the world through a set of sensors. Those sensors change some physical quantity into patterns on neurons, and once those patterns are inside the brain, it's no longer light, it's no longer touch and sound. The neurons are identical: the ones carrying information from the retina and the ones from the auditory system. It's just patterns, and the way the cortex handles these patterns, once you get to the cortex, is identical.

This is a pattern system. It's not a vision system, not an auditory system; it's a pattern system. It builds a model of the world from the changing data stream, and the data stream coming into the cortex is very rapid: my voice is bringing you patterns that change on the order of milliseconds to tens of milliseconds.
Now, if you think about it, the patterns coming in from your senses change as you move; you're basically moving your senses through the world. So the vast majority of the changes in your sensory stream are coming from your own behavior. Your eyes are moving, my body's moving; when I touch things, these changes are coming in not because the world is changing, but because I'm moving in the world. A lot of things do move in the world, but most of the changes coming in are from your own behaviors.
So the model that the cortex builds is a sensorimotor model of the world. It's very difficult to separate those two things out: it's a sensorimotor model of the world, and we want to understand how that happens. Okay.
So let's start with some cortical facts. Here is a picture of a human neocortex, and next to it is a picture of a rat neocortex, because it's all the same: everything I'm going to tell you about today is neocortex. It's not specific to any particular animal. It is a thin sheet of cells about two and a half millimeters thick. In a human it's about the size of a dinner napkin, about this big and about two and a half millimeters thick; this is you, this is me. In a rat it's about the size of a small postage stamp. Okay, it's a remarkably uniform system. Anywhere you look in it, you'll see a detailed architecture that's preserved across species and across modalities, incredible detail that's remarkably uniform from an anatomical point of view.
It's functionally uniform as well, even though different parts of the neocortex serve vision and hearing. It has been known for over 35 years that the cortex processes vision, hearing, touch, and everything else it does in the same way, and the evidence for this is overwhelming. Most people have trouble believing it, but it's true; it's remarkably functionally uniform. You can actually swap the auditory nerve and the visual nerve in a young animal, and the auditory parts of the cortex become visual and the visual parts of the cortex become auditory.
We know that the cortex is organized as a hierarchy: these regions connect together, and if you look at the connectivity, it's a hierarchy. If you dig down and take a slice through the cortex, through those two and a half millimeters, the first thing you'll see is an organization. You see layers of cells; there are roughly four layers of cells: layers two/three, four, five, and six.
Now, ten percent or so of the synapses on a neuron are close to the cell body, or proximal, and these are what most people think about when they think about a neuron. They say: oh, these inputs come in, they depolarize the cell, and the cell can fire. But 90 percent of those synapses are further away from the cell body, and for many years people had no idea what to think of them, because if you activate one of those distal synapses, it seems to have no effect on the cell body.
But we now know that the dendrites away from the cell body are active processing elements, and if more than, say, 10 to 20 synapses become active at the same time within a short distance of each other, in close spatial and temporal proximity, they generate what's called a dendritic spike. It travels to the soma and depolarizes the cell. It doesn't make the cell fire, but it brings the cell very close to firing, and we believe this is a prediction: the cell is in a predictive state.
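The idea just described can be sketched in a few lines (a minimal illustration; the threshold value and function names are assumptions for this sketch, not Numenta's actual implementation): a distal dendritic segment whose coincidently active synapses exceed a threshold fires a dendritic spike, putting the cell into a depolarized, predictive state without making it fire.

```python
# Sketch of a neuron with active distal dendrites (illustrative only).
# A distal segment that sees enough coincident active synapses fires a
# "dendritic spike", depolarizing the cell without making it fire.

SEGMENT_THRESHOLD = 15  # assumed: coincident synapses needed for a spike

def is_predictive(distal_segments, active_cells):
    """A cell enters the predictive state if any distal segment has at
    least SEGMENT_THRESHOLD synapses onto currently active cells."""
    return any(
        len(segment & active_cells) >= SEGMENT_THRESHOLD
        for segment in distal_segments
    )

# One segment synapses onto cells 0..19; if 16 of them are active, the
# segment generates a dendritic spike and the cell becomes predictive.
segments = [set(range(20))]
assert is_predictive(segments, set(range(16)))      # 16 >= 15: predictive
assert not is_predictive(segments, set(range(10)))  # only 10 coincident: no spike
```

The key design point is that prediction is local to a segment: the proximal synapses still determine firing, while distal segments only bias the cell toward firing.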
Further, most people think about learning in a neuron as strengthening synaptic weights. This is not really true. We now know that most learning occurs through the formation of new synapses, and synapses can form very rapidly, on the order of minutes or seconds; new ones can appear and they can disappear. This is a much more powerful type of learning than trying to increase the efficacy of a particular synapse. So this is the system that we want to understand. This is you, this is me; this is intelligence.
This is what it looks like, and can we understand in detail how it works? We have an overall theory for this, which we call hierarchical temporal memory, or HTM. It's fairly simple. It starts with the premise that you have a hierarchy of identical memory regions; that's a fact. The next thing we say is that the primary memory going on in each of these regions is a time-based memory. It's a memory of sequences; it's like learning melodies.
What every region does is build a model of time-based patterns, and if it can recognize those patterns, it passes a more stable representation to the next level, so you have increasing temporal and spatial stability as you go up the hierarchy, which is what's observed in the brain. Similarly, you can take a high-level, stable concept and unfold it into sequences going down, creating a very fast-changing pattern, such as my speech right now, which is what's going on in my head. Now, the questions we want to ask are: how exactly does it do this? What does a region do?
What do the cellular layers do? What are the neurons, and how do the neurons implement this? We're making great progress in understanding this. Let's jump in and keep going; bear with me if this is more than you want to hear, but we'll get into some detail here and then we'll come back up to the high level again. So let's keep going down further.
The basic principle we think is going on is that each of the layers is implementing a type of sequence memory. In fact, they use the same neural substrate and the same basic process, but they apply it in different ways; it's sort of a variation on a theme. There are two layers that are basically feed-forward layers, toward the top of the cortex, layers two/three and four, and there are two layers that are basically feedback layers, layers five and six.
Let's just walk through this a bit. Input comes into the cortex, typically into layer four; this is the feed-forward input. Everyone thinks of this as the input from the eyes, the sensory data coming in, which is true. But what most people don't remember, or don't know, is that the cortex also gets a copy of your own behaviors, your own motor commands. So when you move your eyes, which is done by something called the superior colliculus, a copy of that command gets sent to the cortex, so the cortex knows what behavior was just generated.
This is a universal property, and what we think is going on in layer four is that it's building a sensorimotor model; it's doing sensorimotor inference, if you will. To give you an example of that: when you look at a face, your eyes saccade over the face, and you're doing this three to five times a second. You're not aware of it, because the world seems stable, but it's happening.
This next goes on to layer three, and layer three is a high-order inference model. This is where you're just looking at the sequence itself, like a melody or like my speech, and making a prediction of what's going to occur next; all you need to know is the sequence of things coming along. If I know the history over the last some number of notes in the melody, I can make an accurate prediction of what's going to occur next. Then layer three projects to the next level of the hierarchy.
That's your basic feed-forward pathway. Layer five is where you have cells generating motor behaviors. My speech right now is being generated by cells in layer five in parts of my cortex, and these project subcortically to other motor areas. The cortex basically controls other motor areas; it doesn't actually innervate muscles itself, it drives other things that move you. And then finally, layer six is primarily attention; it's a feedback layer. Now, the point I want to make on this slide is that each layer is doing a variation of a common sequence memory algorithm.
If we understand the basic model of one layer, we will understand the basic model of all the layers, because these are universal functions. I want you to understand the premise here, what biology tells us: if you do this in a hierarchy, you've got everything the cortex does. And as you can see here, there are no pure sensory areas of the cortex; that's a misnomer.
There are no pure motor areas of the cortex either; that's a misstatement. It's all sensorimotor, and this same process is being used in every modality and in a hierarchy. If we can understand this, we are a long way toward building brains. So the question is: how does this sequence memory work? Can we really understand this? Can we understand what these layers of cells are doing? And the answer is yes, we actually think we do; we think we've made huge progress on this.
We call this, with a basically pretty boring name, HTM temporal memory. For those of you who might have been following Numenta for a long time: we previously used the term CLA, but now it's HTM temporal memory. This is a picture of one of our simulations; those little cubes are neurons. You can see they're in a layer, arranged vertically in many mini-columns. The colored cubes are the active ones: the red ones are active and the yellow ones are in a predictive state.
Now, I don't have time today to tell you exactly how this works, but you can go learn about it; I'll tell you how to do that in a moment. I just want to tell you the attributes of it. What this system, this layer of cells, does is essentially learn sequences: it recognizes and recalls sequences, and it predicts next inputs. It does all three of these simultaneously; these are not separate steps. It's constantly learning (online learning), constantly inferring over everything it's learned, and constantly making multiple predictions at the same time, not a single prediction but a union of predictions.
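Those attributes can be illustrated with a toy model (this is a sketch of the idea only, not the HTM algorithm itself, which uses high-order cellular state rather than first-order transitions): a memory that learns transitions online while it infers, and whose prediction is a union of everything previously seen to follow the current input.

```python
from collections import defaultdict

# Toy online sequence memory: learning, recognition, and prediction
# happen in the same step, and predictions are a union of candidates.
class ToySequenceMemory:
    def __init__(self):
        self.transitions = defaultdict(set)
        self.prev = None

    def step(self, element):
        """Learn the transition from the previous element, then return
        the union of all inputs ever seen to follow this element."""
        if self.prev is not None:
            self.transitions[self.prev].add(element)  # continuous learning
        self.prev = element
        return set(self.transitions[element])         # union of predictions

mem = ToySequenceMemory()
predictions = [mem.step(n) for n in ["C", "D", "E", "C", "D"]]
# By the second visit to "D", the memory already predicts "E" next.
assert predictions[-1] == {"E"}

mem.step("F")  # learns that "F" can also follow "D"
# Now "D" is followed by a union of predictions, not a single one.
assert mem.transitions["D"] == {"E", "F"}
```

Note how there is no separate training phase: every `step` both learns and predicts, which is the online-learning property described above.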
It has some really nice attributes, and I can tell you because we've been building this for four years; we figured this out four years ago, so we have a lot of experience with it. It's very high capacity: even a small simulation can learn millions of transitions.
It is a distributed system with local learning rules. This makes it naturally fault-tolerant: you can lose neurons, you can lose columns, you can lose synapses, you can have a huge amount of noise in the system, and it still behaves very well, just like brains do. There are no sensitive parameters; it's not hard to get it to work once you've built it correctly. And it actually generalizes: it's able to apply the same learning to new situations that are semantically similar to previous situations.
I want to leave you with an impression about this: this is not just another neural network. There are some things that are really unique about it, and I'll just give you three of them. First of all, it incorporates a fair amount of detailed cortical anatomy. We didn't do this just because we could; we did it because we had a theoretical need to. We've only added features to this model that we know exist in the brain, but that we also need to get this thing to work the way we think it has to work. So we have a model for what mini-columns are doing to create high-order representations; we model certain inhibitory cells, certain connectivity patterns, and so on. No one else does this in an information-theoretic model like this. Second, the whole thing is built on sparse distributed representations. What I mean by that is that at any point in time in the brain, only about half a percent to two percent of the cells are active; most of them are inactive.
This is the key to intelligence. We have figured out the mathematical properties here, and they're very unusual; there's a talk online you can see, which we just posted yesterday, plus some papers describing these properties. This is the key to intelligence: if you want to understand how we're going to build these machines, you have to understand the representations, and sparse distributed representations have these amazing, surprising properties. The whole foundation is built on them; I can't go into detail about that today. And finally, the neurons we model have active dendrites.
We model learning by synaptic growth. Again, we had to do this: this is how we get the online learning to work, and this is how we make a highly predictive system. This is unlike any other artificial neural network you've ever heard of; I'm not aware of any other system that incorporates this kind of level of detail. It may exist, but I don't know about it.
Okay, if you're interested in this, it's completely documented, including the source code, and people have built it multiple times. You can get a lot of information at numenta.com/learn, and there's some new material up there, just posted yesterday, that you should check out.
Let me talk about our research roadmap. Here's the system we're trying to understand: these layers of cells in a region of cortex. Once we can figure that out, we can go build brains.
Where are we? We started with layer 3 because it's actually the simplest one anatomically; it's the cleanest one to look at, and that's the high-order sequence memory. From a theory point of view, and this is purely subjective, I feel like we really understand this very, very well, so I put it at 98 percent, which is about as close as you can get to anything; there are maybe a few things we might get wrong here and there. It's been extensively tested, and we've put it in commercial products.
We know this thing inside and out. On layer four, we figured out about a year ago what's going on there and how it builds the sensorimotor model of the world; it's just a variation of what's going on in layer three, where we're additionally using motor commands. I would say the theory is about 80 percent there. We're implementing it, it's working, and we have a lot more to do, but we're really, really far over the hump on this one; it's in development.
This is what we're working on now. On layer 5, which is where you start generating behavior, we have the big building blocks of the theory: we understand the basic components of how the neurons are doing this and how they're interacting with the rest of the body and the brain. But we haven't started implementing it all, and there are several other big building blocks we're missing, so I put it at about 50 percent, though I feel really good.
We're going to get this one. And then finally, layer six is more complicated; it's a little bit more nebulous what's going on down there. Okay, so that's what we've been doing. Since we started with layer 3, with this high-order inference, and we got it working and the theory hangs together really well, we said: let's try it on real data, let's see about applying it.
So what we did is we said: okay, what could we do with that? Well, this part of the cortex, this layer, basically does high-order sequence inference, and it basically requires that the data be changing on its own; there's no behavioral component to it. So we said: it can work on streaming data. Anything that's changing over time, it should work on, and we can do prediction, anomaly detection, and classification. We said there are a lot of applications here, let's try them out, and I'll show you what we've done. Now, how do you build a streaming data application using this technology?
Well, you take a data stream and you stick it through something called an encoder, which basically changes some number or quantity into a sparse distributed representation; that's the language that we need. Now that I have the sparse distributed representation, I feed it to the HTM, and I get out a stream of predictions or anomalies or classifications. That's what I can do with this now.
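The encoder stage of that pipeline can be sketched very simply (a minimal illustration with assumed parameters; the actual NuPIC encoders are more elaborate): a scalar encoder maps a number onto a fixed-width block of active bits, so that nearby values share active bits and therefore share semantics.

```python
# Minimal scalar encoder sketch: a value in [lo, hi] activates a
# contiguous run of W bits out of N_BITS, so nearby values overlap.
N_BITS = 100   # total bits in the representation (assumed)
W = 11         # active bits per encoding (assumed)

def encode_scalar(value, lo=0.0, hi=100.0):
    """Encode a scalar as a set of W contiguous active bit positions."""
    span = N_BITS - W
    start = round((value - lo) / (hi - lo) * span)
    return set(range(start, start + W))

# Nearby values share most of their active bits; distant values share none.
assert len(encode_scalar(50) & encode_scalar(52)) > 5
assert len(encode_scalar(10) & encode_scalar(90)) == 0
```

The output of such an encoder is what gets fed into the sequence memory; the overlap structure is what lets the system generalize between similar input values.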
There are many, many sources of streaming data. You know, John mentioned earlier that we're going to be awash in all this data; most of it is streaming data, anything you can regularly get from applications and servers, medical data, industrial equipment, social media. All these things can generate millions and millions, billions, of data sources that are changing over time. So we potentially have a way of modeling all of that. Now, what kinds of encoders do we have?
I won't tell you how the encoders work here; it's kind of cool, and I'll talk about it later. But we built ones for numbers and categories and dates and times, we have one for GPS, and even one for words; I'll talk about this in a moment. So we have everything we need here, and we went and built some applications. Here are six applications I'm going to briefly talk about; they're all streaming data applications. The top three are all about anomaly detection, and they're similar; the bottom three are very different.
We started with server metrics. We said: let's see if we can model servers and detect when they are in anomalous states by looking at the temporal characteristics of their metrics. That's the one we developed first, and we've turned it into an actual product called Grok. The way we do that is we take some server, we take a bunch of server metrics off of it, things like CPU utilization and file access, and we run them through encoders. We build a model for each metric.
We actually started by assuming we'd combine these metrics together into a single model; we found it works better to build lots of separate models and then combine them later. So now what I'm doing is basically modeling the temporal characteristics of various data streams. We detect when those temporal characteristics change significantly, and we say: there's something unusual going on here.
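The anomaly measure itself can be sketched simply (a sketch consistent with Numenta's published raw anomaly score; the column sets here are invented for illustration): it is the fraction of currently active columns that the model did not predict at the previous step.

```python
# Anomaly score sketch: fraction of active columns the model failed
# to predict. 0.0 means the input was fully expected, 1.0 means the
# input was completely novel.
def anomaly_score(active_columns, predicted_columns):
    if not active_columns:
        return 0.0
    unexpected = active_columns - predicted_columns
    return len(unexpected) / len(active_columns)

assert anomaly_score({1, 2, 3, 4}, {1, 2, 3, 4}) == 0.0  # fully predicted
assert anomaly_score({1, 2, 3, 4}, set()) == 1.0         # nothing predicted
assert anomaly_score({1, 2, 3, 4}, {1, 2}) == 0.5        # half expected
```

Because the model predicts a union of possibilities at every step, this score stays low on familiar patterns and spikes only when the temporal structure of the stream genuinely changes.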
As you can see here, the height of the graph, the little bars, and the color indicate a highly anomalous state, so you only have to look at the top few things; it's like a little dashboard, and it's continually updated. It's running on my phone right now, and these bars move across over time as the servers perform. We also have a web dashboard, but I'm just going to show the mobile one.
So what kinds of anomalies can it detect? We didn't know how well this was going to work. We didn't tell the system at all what any of these numbers mean, what any of these metrics are, that they come from servers, or anything like that; we just said: here's a stream of data. We had no idea what it was going to find, and it turned out to be really, really good, much better than I even imagined it could be.
I'll just give you some simple examples. In these pictures, what you're seeing in the top bar is a server anomaly; that's in the white area. The middle part is the actual metric data that was anomalous; that's the black with the blue lines. And underneath it you'll see the anomaly score for that particular metric, so you can just look at when the anomaly occurred and the graph around it.
You can see some very simple things: sudden changes, slow changes, sudden subtle changes in regular data. In that third picture, you might see there's a slightly different blip on the right-hand side where the anomaly occurred; because this is a regular data stream, the system says that is a very highly significant event. On the one on the right, you see it's a very noisy data stream and any particular spike doesn't mean anything, but you can still capture these sorts of changes. Now, here's where it got interesting.
This is a single server, two different metrics, two different models, and both caught an anomaly at the same time, even with something as simple as CPU utilization; I think this one is disk write bytes. If you look at that data, that blue graph, you can't see what's going on; you wouldn't pick that point in time as being highly anomalous. Well, I wouldn't, but this system is statistical, and you can show mathematically that this point is highly, highly unusual given the recent history of the system, say over the last few weeks. What occurred here?
This is a build server, where every time an engineer checks in code, it starts a build process. What happened on this particular day is that an engineer started the build process manually at that point in time. That's it: it was just started manually as opposed to automatically. You and I can't see the difference there, but it catches it. It says: I've never seen this, I've caught it in two separate models, I'm certain of it, something is unusual here. Now, in this case it could be benign.
It could be a risk, something that shouldn't be done that way, or it could be something malicious; we don't know. But you don't get many anomalies, and when it catches them, they're really important. So we saw this and said: this is really cool, what else could we apply it to? We said: can we apply it to human metrics? Like, you're sitting at a computer; can we look at your keyboard access and your file access?
Can we tell if someone's changed their behavior, or if someone else is using the computer? It turns out we can; it works very nicely. Then someone came to us and asked about financial data: can you predict volumes of stock trading? We said we don't know. They gave us the data, we looked at it, and it turns out we did a really great job at it.
In a matter of an hour, we had results that equaled the best in the industry. Then we said: we can turn this into anomalies. So what we're doing here is actually monitoring thousands of equity trades and finding when there are subtle anomalies in the volume of those trades. We're now trying to add social media data to it; we're in the process of doing this right now, trying things like Twitter and Tumblr to see if we can find anomalies in there too and combine those. So someone might say: hey,
is there something unusual going on with this company? This is not just for people who trade; it could be people who want to track their customers, to see what unusual things are going on in their customer or procurement base, things like that. Anybody would want to find that. So we think this is cool, and we're going to have a product this next year.
Now here's something totally different. This is a team of researchers at Berkeley. They want to use EEG, scalp recordings, to control things like prosthetic arms and robots and things like that. So they said: can we take this data, run it through the HTM, classify it, and try to say: am I thinking of going left? Am I thinking of going right, or up, or down, that kind of stuff? They just did this work two weeks ago, and they got really great results.
I won't claim success here because I think it's too early, but I do think this is the kind of problem we should be able to do a good job on. Here's a company in Europe called Peoneck. They are using this technology to basically track ships through the harbors of Europe. They said: look, can I learn the typical spatiotemporal patterns of ships moving through harbors and detect if they start moving differently than normal? It turns out it works very nicely.
We created a very cool encoder for GPS, so you can feed GPS coordinates into the HTM and it turns them into SDRs. So as a ship moves through the harbor, you can tell what kind of motion it is. We don't tell it what it should be looking for, but it can detect, we've found so far, changes in velocity, changes in direction, and being outside whatever path is typical. You don't have to tell it anything it's supposed to know; you just say: here's a whole bunch of ships, let's learn for a while, and now you tell me when something is unusual. And one thing I didn't mention is that it's continually learning, so if a new pattern becomes normal, after a while it says: okay, that's not anomalous anymore.
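One plausible way a GPS encoder like that could work is sketched below (an assumed design for illustration, not Numenta's actual coordinate encoder): quantize the coordinate to a grid cell at some resolution, then hash that cell and its neighbors to bit positions, so nearby positions produce overlapping SDRs.

```python
import hashlib

# Hypothetical GPS encoder sketch: quantize (lat, lon) to a grid and
# hash each nearby grid cell to a bit, so nearby coordinates share bits.
N_BITS = 1024       # size of the output representation (assumed)
RESOLUTION = 0.01   # grid size in degrees (assumed)

def encode_gps(lat, lon):
    """Encode a coordinate as the set of bits for its 3x3 grid neighborhood."""
    gx, gy = round(lat / RESOLUTION), round(lon / RESOLUTION)
    bits = set()
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            h = hashlib.md5(f"{gx + dx},{gy + dy}".encode()).digest()
            bits.add(int.from_bytes(h[:4], "big") % N_BITS)
    return bits

a = encode_gps(48.2082, 16.3738)  # a point in Vienna
b = encode_gps(48.2083, 16.3739)  # a few meters away: same grid cell
c = encode_gps(51.5074, -0.1278)  # London: entirely different cells

assert a == b            # same grid neighborhood, identical encoding
assert len(a & c) < len(a)  # distant points do not share the encoding
```

Feeding such encodings in sequence gives the HTM a stream where position, and therefore velocity and direction over time, has the overlap semantics it needs.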
Now, the last example here is about natural language, and this was done with a company called Cortical.io; they're based in Austria. They read our papers and said: holy smokes,
this is cool. They do natural language processing, and they felt that sparse distributed representations were the key to understanding natural language processing. I agree with them, because this is the language of the brain, and it has all these nice properties of semantic representation. So they created an interesting tool. They take a corpus of documents, like Wikipedia, and they train the system on it; I can't explain how it does this in this short time, but let me tell you what you get out of it.
You can give it a word or a document and say: give me a sparse distributed representation. There's a picture of them here. There are something like 16,000 bits; you can see most of them are off, and the little dots are the one bits that are on. Now, one of the properties of sparse distributed representations is that the bits mean something; they have semantic meaning. You may not be able to say what it is, but they have semantic meaning.
So if I have two representations and they share bits in the same locations, the same part of the array, then they're sharing semantic meaning, and this doesn't happen by chance; it's meaningful. So you now have these representations of words which capture the semantic meanings of the words. All right, what can you do with this? To start, you can do some very simple things with the words themselves. You take the representation for the word apple, and you take the representation for the word fruit.
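The kind of set operations this enables can be sketched as follows (the bit positions here are invented for illustration, not Cortical.io's actual representations): overlap between word SDRs measures shared meaning, and operations like set subtraction strip the shared semantics out of a word.

```python
# Toy word SDRs as sets of active bit positions (invented positions,
# purely illustrative). Shared bits represent shared semantic meaning.
apple  = {3, 17, 42, 80, 95, 120}
fruit  = {3, 17, 42, 77, 101, 130}
server = {200, 210, 220, 230, 240, 250}

def overlap(a, b):
    """Number of shared active bits: a simple semantic similarity."""
    return len(a & b)

# "apple" and "fruit" share semantic bits; "apple" and "server" share none.
assert overlap(apple, fruit) > overlap(apple, server)

# Set subtraction strips the shared "fruit-ness" out of "apple",
# leaving the bits specific to apple.
assert apple - fruit == {80, 95, 120}
```

Because the representations are sparse in a very large bit space, any substantial overlap is semantically meaningful rather than accidental.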
This is really cool; it works over all these different things you can do with them. Then we said: oh, these are the other words that are the nearest matches after that. So then we said: let's train the HTM on series of words, just sequences; it's a high-order temporal pattern of words, and what can we do with that? It's a very simple little system that's doing this.
So we did a first test where we said: okay, create a bunch of three-word sentences, where an animal either eats or likes something, followed by what it eats or likes. So: an elephant likes water, an elephant eats grass, things like that. We trained it on 50 or 60 sentences like this, and then we gave it a new sentence. What I mean by new is that the system has never seen the word fox, and we feed in "fox eats". The HTM always makes predictions, so it's going to predict something.
I believe this is the first time anyone, even in a small little example like this, has taken the brain's representations and the brain's neural mechanisms and shown the core of how language is processed in the brain. I'm not saying this is exactly it, but we're getting close to the real way this is happening in brains.
This system is completely unsupervised, it does semantic generalization, and it actually works across languages; you can mix and match languages. They did a very cool job at this. We think there are many, many applications here. We're excited about it, they're excited about it; we're not talking about what those applications are yet, but we think we can do some really cool things that no one else has been able to do before.
Okay, I want to make one point here. Of these six applications, only one is a real product; the others are just demonstrations, but the code is available for them. They all run on the exact same code. I mean not a recompilation, not a re-parameterization, not a tuning: the exact same code. We didn't tweak anything. We just said: change the data type, new encoder, run it through the system, what do you get? I think that's a very powerful statement.
It's getting at the core, fundamental flexibility of these algorithms. If you asked a bunch of data scientists to do something like this, they wouldn't come up with one algorithm to do all these things, and this does really well on all of them without any modification whatsoever.
A
Let's go back to our research roadmap. I talked about how we did layer three and the applications we can build there. We're in the process of doing this layer, layer 4, the sensorimotor inference. What kind of applications can we build there? Well, imagine this again: this is like your eyes saccading over an image, so we can do static pattern recognition. We can work with static data, but we have to have an active learning system. You have to move through the data instead of the data moving itself; the data is more static.
A
You move through the data, and that's how the brain learns. So we can do classification and we can do prediction in this case. We are currently working in a vision paradigm because that's a very well understood problem, so we're working on image classification, but we're doing it the way the brain does it. But there are many applications here.
A
I think what you want to think about is anything which has spatial structure that you want to classify. So you can imagine some sort of network, whether it's people networks or computer networks, and you want to classify it. You want to say: okay, I'm going to have the system look through the data, and it's going to come back to me, make classifications, and predict what it's going to see. You could use this in analyzing corporate structures or financial structures or social media network structures and so on.
A
I believe you'll not only get robotics, of course, but you'll be able to do things which are virtual, like smart bots or proactive defense. Now you're not just moving through the data in a simple way; you're moving in a way where you're trying to achieve a goal or an end game. I think we have the basics of this down, in terms of understanding how it works, but there's a part we haven't started working on yet; we need to finish the other stuff first. And finally, the last layer.
A
Layer six is really about enabling larger hierarchies with multiple sensory modalities. Okay, we're very transparent in our research. All these algorithms are documented. I wish they were better documented, but they're documented well enough that many people have independently created them in multiple languages around the world, so that proves they're documented well enough. We have an open source project called NuPIC, which is at numenta.org.
A
There we've placed all of our own software, which is under a GPL license. We also have a commercial license. We even post our daily research code, so you can look at all the messy stuff we're doing, and we have active discussion groups for theory and implementations. We have lots of collaborations. We have a small collaboration with a group at IBM Almaden Research; we've been looking at these algorithms. We have a collaboration with DARPA, who's trying to get a program going for the cortical processor, which is based on HTM, and with little companies like Cortical.io.
A
We're very open; we're just trying to make this thing happen. That's what we're trying to do, and anything that works, we're open to. This is just a chart of our open source community. We started it about 15 months ago and it's been growing very nicely, continuous growth, and more and more people are getting excited about this. More and more people are actually understanding it. There are some people out there who really deeply understand what we're doing, you know, as well as we do. It's just scary, but that's happening.
A
Okay, I'm going to end my talk with a story, and it's a true story.
A
21 years ago I gave a talk at Intel. They have an annual meeting where they bring in the top 200 managers in the company from around the world, plus the exec staff, to do business planning, and as part of that meeting they have an invited outside speaker. 21 years ago, I was the invited outside speaker.
A
They did not believe a word I said, and they said to me: well, what are the applications going to be for these mobile computers? Why are a billion people going to buy these things? And I honestly said: I don't know. I said: here's what I do know. I know it's going to be primarily about information access, because that's what you can do on a small device.
A
I said: I know some simple things, like a calendar and an address book; people are going to want those. And I said: I know people are going to want to access information on a small device, and I know we will be able to build machines that will be capable of that. But I don't know what the applications are, but I tell you, it's going to be great.
A
That didn't work. Three years later we introduced the Palm Pilot, which was essentially a calendar and an address book, but it was a computer, and a year after that we had thousands of applications on it.
A
Three years after the Palm Pilot we introduced the Treo, which was one of the first smartphones, and which I designed. And today, of course, 20 years later, I bet every one of you has one of these computers in your pocket, and it is the driving force. Now, here I am today, and I feel déjà vu: I'm talking not about the future of personal computing, but the future of computing.
A
I've said we've had 60 years of one paradigm. I think we're about to start the next 60 or 100 years of a different paradigm, and the future is about machines that learn. I'm very confident saying that those machines are going to be built on the principles of the neocortex. Sparse distributed representations are going to be essential, and temporal learning algorithms, distributed temporal learning algorithms, are going to be part of this. I'm very confident in this.
A
These principles I talked about are going to be the foundations for machine intelligence. Once again, I'm speaking to a company that I think should be a leader in this field, and maybe you're going to be. But you know, IBM has all the right roots and all the right history and all the right capabilities to do this. One of those, by the way, is that you have to build really neat, cool hardware, unlike anything that's ever been designed before, and how many companies can do that? You know, IBM's one of those.
A
I showed you what we're going to do next, and I could sort of lay out a roadmap. But those applications, as cool as they are, and maybe as big as they might be, are kind of like the calendar and the address book. You know, we can't really know, and I can't pretend to. But I've shown a roadmap to get there, and I'm very confident this is going to happen. This is not going to take a long time; I am sure in two to four years this is going to be going.
A
B
Well, I will tell you that this is very different from that luncheon. We've made a decision, the community's here, and it's really about building this out. So, Jeff, we've got time for some questions. Let's take those from the audience, or from those at the remote sites or in our remote labs.
A
B
A
The mic? All right. Do you want to call out people, or should I call? Why don't you look for friends.
A
C
So the one thing I wonder about is your idea that synapses grow de novo, as opposed to being tunable. Because my sense of neurobiology, cancer biology, and genomics generally is that I don't think of it as a new-versus-old set of growths in a cognitive learning system; rather, in vivo, in the brain at least, and in biology, I think of it as tunable, much more subtle than you're suggesting. So have you thought about that, and have you thought about...
A
You know, increasing the efficacy of a synapse and growing a new one are very, very similar; it's just starting from zero instead of, you know, going from a half to one. It's not really that different than you think. It just means you have a much bigger information capacity if you're able to form new connections. But I want to make a point about real synapses.
A
They often don't work at all. You can actually potentiate a synapse and it may not release any neurotransmitter; they're really just very stochastic devices. And so any model that requires an information-theoretic synapse with even one digit of precision, if it requires that, is not biologically possible.
A
So I'm not saying you can't strengthen synapses by tuning them; it's just that you can't really rely on it. But going from no synapse to a real synapse is a very strong event. And so I'm not trying to say it's not possible, and I'm not trying to say it's different. It's just a much more powerful way of learning, and we know it's happening. And by the way, there's another thing, a real advantage of it.
A
It allows the system to basically experience a pattern a few times and start forming a connection before it actually has any effect. So you might not want to affect behavior until you've experienced something a few times. And so for the growth of a synapse, what we do is increase something called the permanence. Where you have an axon and a dendrite near each other with nothing between them, that's a zero permanence, and as you increase the permanence, which is a Hebbian-type learning, you're essentially growing that filopodium from the dendrite toward the axon. When you make the first connection, we then give the strength of the synapse a one; it's a weight of one. But it's not very permanent, and can be very easily forgotten. With increased training it becomes a longer-lasting memory and it's harder to forget. So we've chosen that paradigm.
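The permanence scheme just described can be captured in a few lines. This is our simplified sketch of the idea, not Numenta's exact implementation, and the threshold and increment values are arbitrary choices for illustration: the synapse's weight is binary, while a scalar permanence grows with repeated exposure before the connection has any effect.

```python
# Simplified sketch of permanence-based learning; thresholds are
# illustrative choices, not values from the talk.

CONNECT_THRESHOLD = 0.3   # permanence needed to count as a synapse
INCREMENT, DECREMENT = 0.1, 0.05

class PotentialSynapse:
    def __init__(self):
        # axon and dendrite near each other, nothing between them yet
        self.permanence = 0.0

    @property
    def weight(self):
        # binary weight: connected or not, never a graded strength
        return 1 if self.permanence >= CONNECT_THRESHOLD else 0

    def reinforce(self):  # Hebbian-style: pre and post active together
        self.permanence = min(1.0, self.permanence + INCREMENT)

    def weaken(self):
        self.permanence = max(0.0, self.permanence - DECREMENT)

s = PotentialSynapse()
s.reinforce(); s.reinforce()
early_weight = s.weight        # still 0: forming, but no effect yet
s.reinforce()
connected_weight = s.weight    # now 1, but barely permanent
```

A newly connected synapse contributes fully yet sits just past the threshold, so a little forgetting disconnects it, while a long-trained synapse with permanence near 1 takes many weakenings to lose, matching the "easily forgotten at first, harder to forget later" behavior described above.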
A
It's just a much more powerful information paradigm. And you know, when you learn something basic, the whole learning model here is not just tweaking something. It's like: I need to learn something new. This is a new pattern, this is a new idea, this is a new animal, and you have to really lay these foundations down quickly. So maybe what I'm trying to say is that overall there's some biological evidence for what we're doing, a lot of it actually, and also, I don't think it's diametrically opposed to the principles that you adhere to anyway.
A
D
I'm just wondering: you very much emphasized the uniform principles of organization in the mouse and rat brain as compared to the human brain, and if you just look at the thickness of the cortex, this is of course true; it's maybe larger by a factor of two in the human brain as compared to the mouse brain. But when you look at the number of cells, it's about 2,000 times larger.
D
When you look at the connectivity, it's about 50,000 times larger in the human brain as compared to the mouse brain, and this is not only a question of quantity. It's also a qualitative question, because in the human brain there are more areas, areas which are not present there, and there are other gene expression patterns.
A
Yes, yes, I'm very familiar with this line of thinking. So look, with what I presented today, I tried to make it look simple, right? That's my goal today, and as a scientist who's trying to understand fundamental principles, you have to try to get at the core principles. Now, cortex is not nearly as simple as I pointed out here. There are other structures involved; we study the thalamus, and, you know, there's tons of stuff going on that I didn't talk about. Now, the cortical world:
A
It can be divided into two ways of thinking about it. One way is to say: what are the common principles that are operating everywhere, and what are the variations on them? There are variations; not all cortical regions are identical. There are variations on the theme going on here. The way I view it is that you want to understand those common principles first, and then you can ask: how do I deviate from that? Why does a rat have a barrel cortex? Why did it...
A
Why does it form that for the whisking sense? Why do we see, you know, a striped layer 4 in V1 in certain mammals but not in other mammals? Those kinds of questions. Why do we see certain cell densities? Nothing I presented here, I believe, was incorrect.
A
There are further variations on a theme that evolution has discovered, and so it's a perfectly good line of research to say: what are the differences between these areas? Most people focus on that. It's almost like saying: well, this is a vision area, it must be different; or, there's a language area, it must be different; let's try to find some magic cell over here.
A
That cell may exist, but the point is, I want to try to find those common principles, and once you find the common principles, then you can do variations on a theme with them. From a theorist's point of view, I believe that's the way to go. The evidence for common principles is unequivocal.
A
The evidence for variations on them is also unequivocal. But we choose to find the common principles first and understand those in detail before we go and ask why this species or this region is slightly different. Slightly different; we're not talking about radically different. That, I think, would be a mischaracterization of it. So again, I try to stick to themes which I can justify across all species and basically across all areas, and I didn't get into variations like: oh yeah, well, why is the striate cortex like that, you know?
A
You know, that's a perfect example. In humans and certain mammals we have layer 4 subdivided in V1, but there are other mammals that don't have a subdivided layer 4, and they see too; they may just not see as well as we do. So let's not worry about that detail yet; we'll come back to that later. That's my basic answer to that question.
E
Jeff, great work, and thank you for sharing. My question is along the same line. You're doing your common core model here, but in your hierarchy, are you actively looking toward the differences that might be involved in, for example, a youth learning pattern versus the common adult model you're on right now, since there seem to be some differences in synapse formation there?
A
Sure. I mean, there's a lot that goes on. We could talk for a long time here about how the synapses are formed and at what level the connectivity is developed. You know, we have a lot of advantages in software that the brain doesn't have. So, for example, we know that when a mammal is born, especially a human, there's a dense over-connectivity at birth and in early life, and it gets pruned back very quickly. Now, you know, we can speculate about
A
why that is. We can say: look, the neurons don't know where they're supposed to connect. I didn't explain the mechanisms here, but they're basically trying to find other cells that predict their own activity, other cells that are active before they become active; that's the sequence memory. And they don't know where to look for that, right? And we do know in the brain that there are certain directions they do need to look in if they're going to find the right pattern. And so you can start at birth:
A
You can have this overabundance of connections, and then you see which ones get established, and then you forget the other ones, right? As an adult it's harder to learn new things, because you don't have that ability. A neuron doesn't say: I need to go over there, about half a millimeter away, and find that cell. They can't do that; they can only connect nearby. So we don't have to deal with that issue in our models. We can start off by saying: hey, look.
A
We can give it, in software, and software is really good at this stuff, this huge connectivity matrix. We still have topology; a cell is still going to connect to some other cells nearby. But I can just essentially say: you have potential synapses everywhere around here, and you'll find the ones that you need to connect, which axons to connect to, where brains, real biology, have to grow these things. And, you know, at birth you have one type of thing going on; later in life,
A
we know that if you want to learn new things, you have to actually progressively get closer to that, so the dendrites and the axons can grow. So it's a complex field your question relates to, but again, from a technology point of view, we don't have to deal with that. We can just step back and say: okay, you know, we can skip that part. We don't have to say: at birth it looks like this.
A
We can just have a large connectivity matrix that's very sparsely connected; it doesn't cost as much, and so we just have that all the time. Our systems are like brains at birth: they never have to prune back; they just have potential connections everywhere. Yes, Guru.
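The "potential connections everywhere, never prune" setup can be sketched as a fixed, sparse potential pool per cell, sampled from its topological neighborhood. Pool size and neighborhood radius below are illustrative assumptions, not values from the talk.

```python
import random

def potential_pool(cell, n_cells, radius=50, pool_size=10, seed=0):
    # each cell gets a fixed random sample of potential partners drawn
    # only from its topological neighborhood, like a brain at birth;
    # learning selects among these and never grows new entries
    rng = random.Random(seed * 1_000_003 + cell)
    lo, hi = max(0, cell - radius), min(n_cells - 1, cell + radius)
    neighbors = [c for c in range(lo, hi + 1) if c != cell]
    return set(rng.sample(neighbors, min(pool_size, len(neighbors))))

# a sparse "connectivity matrix": each cell can ever reach only ~1%
# of the population, all within its local neighborhood
pools = {c: potential_pool(c, 1000) for c in range(1000)}
```

Because the pools are deterministic and fixed, the model never pays for dense infant-style over-connectivity or for pruning; the permanence mechanism described earlier does all the selecting within each pool.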
F
Jeff, this is a classic question; I'm sure you've heard it many times before. So: we invented flying machines that don't look like birds, right? So why do you think computing machines are going to look like the brain? And, more importantly for me right now, what is an alternative architecture that you may have seen in your research that could get to a similar intelligence?
A
Yeah, I do hear this a lot. You know, if you go back and look at the history of the Wright brothers, this is a misapplied analogy; it's like saying: well, planes don't flap their wings. The Wright brothers knew they had to understand the principles of flight, and they studied birds to understand the principles of flight. They knew that the principles of flight had to do with wing design.
A
They did wind tunnel tests, and so on; they knew they had to get the principles of flight right. They knew that propulsion was something completely different. So airplanes share the principles of flight that birds use, but the principle of propulsion wasn't important; that wasn't the thing they were trying to copy. It doesn't matter if you have a propeller or a jet engine. But the principles of flight are the same, and they knew that. The same thing applies here: the principles of intelligence are important; the actual implementations are not. Now, this could be seen as
A
you know, conjecture, or subjective. Why do I think these are the principles? Are there other ones like them? My working assumption is this. Now, I've been at this for a long time; over 30 years I've been working on this, and, you know, I originally started by doing a literature search of AI, a larger search of linguistics, and a literature search of neural networks. I read thousands of papers on all these things, and I watched the world evolve. I watched artificial neural network experts come by.
A
I observed the 1980s, back-propagation and so on, and I kept saying: you know what, these guys aren't getting any closer. And it seemed obvious to me that you ought to look at a brain if you want to build a cognitive system. What's the only example we've got? A brain. Now, why would I look anywhere else? Now, we might have some hubris and say: well, we're smart enough, we don't need to look at brains, we'll figure it out on our own. Well, maybe that's true; that could have been true.
A
It doesn't appear to have happened; at least I haven't seen it happening. So then, when you go look in brains, you find surprising principles. You find it's an amazing, surprising thing that there is a common architecture across all these different modalities. And then we learn about sparse distributed representations; those are amazing. These are things I would never have thought of. I wouldn't have thought of a hierarchy of similar regions; I would never guess that stuff. So at some point we can throw away the brain; we'll know enough.
A
We'll just do our own thing. But that hasn't happened yet. And if you think you can do it some other way, that's great; I just don't know how, and I don't know what it is. I've never seen anything else like it. So to me this is the way to go until we know better. All right.
G
Very interesting talk, and I agree with the question there, but I want to bring this to a higher level, rather than the level of neurons and things like that. Because one of the points here is, you know, there's a lot of evidence in machine learning around sparse representation; it's not a new concept.
G
One of the questions I have for you is: there's a lot of infrastructure that you're building here, a lot of things. Is there any evidence that this is buying you anything that doesn't currently exist? If you just threw random forests, you know, machine learning techniques, at the data, would you get anything different, or any better? That's the first question.
G
The second question is, you know: fundamentally, the cognitive system works at a symbolic representation, and there's clearly a grounding problem between the data input and that symbolic level. You know, we have one of the foremost researchers here, Anne Treisman, who's done work on this since the 80s: the binding problem, being able to differentiate between different pieces of information and understand how they're different. So the next time I see you, probably 90 percent of the optical input will be different; that is, you're probably going to be wearing different clothes.
G
I'll still recognize you, because I understand where the important information is. But, more importantly, I can tell you what's different between those two things. I can say his shirt is different, not which pixels are different or which neurons firing are different. So I want to make sure of that as we're moving forward in this. I agree with the existence proof, right? The human brain does things that nothing else can do, so we have an existence proof.
A
Okay, those are two very different questions, and so I'm going to take them both, even though you said only one question. So let's get back to the first one: can we do a better job? Is this better than applying random forests or some other type of learning technique? The analogy I made at the very beginning is that, you know, I'll never say it's better than, you know, taking three PhDs, sticking them in a room, and having them try to solve a problem. Can I be better than them?
A
I don't know; probably not. The point, the argument of my analogy at the beginning, is flexibility. That's what drives platforms. And what we've found so far in our testing and other people's testing, not just us, other people's testing, is that these networks very quickly get up to parity, or very close to parity, with the best solutions out there, and then people get into these benchmarks
A
where they're saying: well, I can get three percent, one percent, half a percent better, blah blah blah. But we got there in half an hour, and they spent, you know, three months; literally, that's what it's been like for us going through this. Also, most of the existing machine learning techniques do not handle time very well at all; they're just not about time-based patterns, and so it's sometimes hard to actually make equivalent comparisons on these things. But again, it's really the flexibility that matters, and that's the key there for that piece.
A
A sparse distributed representation basically represents, in a distributed fashion, all the attributes of something, semantically, and you're really just doing a bit comparison between patterns to understand what is semantically similar and what is semantically different, and you can still recognize them as the same thing. It really is the key to solving the representation problem in AI.
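The bit-comparison point can be made concrete with a toy example. The tiny hand-built bit sets below are illustrative assumptions, not real SDR encodings: similarity is just the overlap between active-bit sets, and the differing bits name what changed, echoing the "same person, different shirt" case from the question.

```python
# Toy SDRs: shared "identity" bits plus attribute-specific bits.
# These sets are hand-made for illustration, not learned encodings.

def overlap(sdr_a, sdr_b):
    # semantic similarity is simply the number of shared active bits
    return len(sdr_a & sdr_b)

person_blue_shirt = {1, 2, 3, 4, 5, 20, 21}  # person bits + blue-shirt bits
person_red_shirt  = {1, 2, 3, 4, 5, 30, 31}  # same person bits, new shirt
stranger          = {7, 8, 9, 40, 41}

same_person  = overlap(person_blue_shirt, person_red_shirt)  # large overlap
unrelated    = overlap(person_blue_shirt, stranger)          # no overlap
what_changed = person_blue_shirt ^ person_red_shirt          # the shirt bits
```

The symmetric difference reports the change at the attribute level ("the shirt bits flipped") rather than as raw pixel or neuron indices, which is the kind of answer the questioner asked for.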
A
You know, an AI researcher once came to the Redwood Neuroscience Institute, and he said to me, and he had just retired after, I mean, a lifetime in AI, he said: you know, the problem of representation is the biggest problem in AI. And then he said: no, it's the only problem in AI, the problem of representation. And I didn't understand what he meant by it at the time. I now understand what he meant by it, and I now understand the solution to it.