Numenta Numenta Talks, 21 Nov 2014

Previous Meeting Next Meeting

⏯

youtube image

►

From YouTube: What the Brain says about Machine Intelligence

Description

"What the Brain says about Machine Intelligence"

Jeff Hawkins
Co-founder, Numenta

21 Nov 2014

A

Hi, I'm jeff hawkins and my talk today is about machine intelligence, and the title of the talk is what the brain says about machine intelligence. So what nomento? That's what we do we're working on machine intelligence and we study the brain, and this talk is about how the brain informs us, how we might go about building intelligent machines.

A

Let me um start with an analogy: this goes back to the 1940s and in the 1940s we were witnessing the birth of programmable computers, and so at that time they were just starting to build computers, and uh it was very confusing time. There was a lot of there wasn't a lot of agreement about how to go about doing this so different approaches.

A

Some people are building dedicated machines that were designed to solve a particular problem, and they generally did a better job than people who are designing more universal machines, but they can be applied to many problems. Some people built analog computers because they were better at certain problems and other people built digital computers because they were better at certain problems. There was a debate between whether digital computers should be decimal or binary.

A

There was debates about how the programming should work and what kind of memory architectures there should be, and so this is a very confusing time, but, as you left the 1940s and we entered the 1950s, we settled on a single dominant paradigm for computing, the one we still use today and so that dominant paradigm. We have universal machines, they're not dedicated they're digital they're binary, they have a memory based programming, they have a two-tier memory system and so on, and this is what we built all of the computing uh for the last 70 years. Now.

A

Why did we settle on one paradigm? Why did one paradigm win one because of network effects? Network effects are, is a term that refers to when, when you have a leader in some area, more people invest in that leading solution and it gains momentum and it moves away from the other solutions, and so we tend to end up with a single solution or single type of paradigm in technology.

A

Now, why did this particular paradigm win? And the answer is because it was the most flexible and the most scalable it isn't. Computers are not the best solution for all problems. You can almost always design a better solution to a particular problem than programming computer, but they are they're universal, so we can apply them in to many many different problems extremely flexible.

A

They also can scale from the very small embedded in you know, an appliance to room size, computers and it's these attributes that led us to saw to choose one dominant paradigm and the one we did now today in the in the 2010s we're in a very similar period. We are witnessing the birth of machine intelligence right now and it's a confusing time just like it was in the 1940s, there's different ideas how this is going to play out.

A

These are some of the differences. Some people are working on a problem. They come up with a specific solution that says this is the best solution for this problem. Other people work on more universal solutions which are easier to apply, but may not be the best solution. There are different approaches. There are mathematical approaches, there's memory based learning approaches, different sort of fundamental ways to think about learning, there's different ways of training, batch versus online learning or labeled versus behavioral uh based learning.

A

So it's kind of a confusing point in time right now, but as we leave this decade, in fact, even before the 2020, I believe will have settled on one dominant paradigm and uh part of this talk is to argue for what that dominant paradigm is, and we are basing our argument essentially on the fact that the brain gives us an example for this. So what is that dominant paradigm? It's a universal algorithm for machine intelligence, it's not specific for different problems, but a universal one. It's memory based it's online learning, which means it learns continuously.

A

It's behavioral, based learning and so on. Now I'm going to talk about these details in my talk, but the question is: why are we going to settle on one paradigm, the same as before network effects once you start getting momentum behind something it'll, it'll snowball, why this particular paradigm same as before it is the most flexible.

A

We know that brains can solve all kinds of problems and in tremendous flexibility in the human brain in other brains, and it is scalable. We know from nature that you can build small brains and large brains. So how do we know this is going to happen? We have a proof case in biology, the brain, the neocortex, and we have made great progress in understanding how it works.

A

So this is not something that's going to happen 10 years from now. Something is happening right now and the rest of my talk is going to go in detail. What this is about. Okay, our company's mission is essentially who have two. We one of the first ones discover the operating principles of the neocortex. Now, just remind you, the near cortex is is a big wrinkly thing on top of your brain, it is about 75 of the volume of your human brain and it's the locus of all intelligence language, hearing, vision and so on.

A

Everything you could tell me about the world is stored in your near cortex. Our goal is to understand the principles of how it works. The second thing is we do: is we want to create technology, machine intelligence technologies based on those principles?

A

Now our goal is not to recreate a human or anything anything like a human or recreate anything like a particular brain. It's basically to say how do brains work, how's, the neurocryptics work. Let's build machines that work on those principles. It is not to pass the turing test or to build something. That's human-like all right.

A

My topics in my talk today, I'm going to give you start off with some cortical facts and some details about how the what do we know about the brain, the neurocortex I'll, give you a high-level description of the theory of how it works. I'll, tell you about our research roadmap, so you know where we are in this process and then we'll talk about applications.

A

What we can do today, what we're going to do in the future and then finally I'll end up with some thoughts on machine intelligence, all right, let's just jump right into what the cortex does. So this is a memory organ and it interfaces to the world through a set of sensors.

A

Your eyes, your ears, your skin, and, although we think of those as light sound and touch once you get beyond the side of the senses, it's just patterns, there's nerve fibers that are carrying information from the retina and the cochlea in the somatic senses into the neocortex, and those neurons are identical. They don't there's no distinction between the ones that representing light and sound and touch, and from the neocortex's point of view, there is no light, sound and touch it is just patterns, it is a pattern system and it treats them turns out.

A

Amazingly, it treats them all the same way. So this influence comes streaming into neocortex, rapidly changing over time and the neocortex has to build a model of the world from that fast. Changing sensory data, that's what it does when you're born your neocortex knows nothing about the world, but through exposure it learns how the world behaves and it builds a model of the world. This model is a predictive model, meaning it's constantly predicting. What's going to happen next, you're, not even consciously aware of this.

A

That is constantly predicting what you can see here and feel it's a predictive model. It can also tell when things are different anomalies, and it also generates all your high level. Behavior subsets of my speech right now is coming from my neocortex now, because the neocortex generates your all this behavior. When you, when you act in the world, you actually move your sensors in the world and you end up changing the inputs that are coming into the sensor.

A

So actually most of the changes that are occurring on your sensory organs are coming from the fact that you're moving you're moving your eyes, you're moving your head, you're turning your body, you're making, sounds, etc. So what we say is the neocortex learns a sensory motor model of the world. It learns how the sensory data changes when you act upon the world and from that we can do goal-oriented, behavior and all the things that humans do. So we want to know how it does this in detail.

A

So, let's just dive into the cortex a little bit more in detail, we'll start with a picture here. This is a human ear, cortex and right next to it is a rating of cortex. I show the rat neural cortex, because every principle I'm going to talk about today applies to all neurocortex, regardless of the animal. The only difference really between a rat in your cortex and a human near quartz is primarily the size of it and we'll talk about that in a second.

A

So, no matter what neocortex you're looking at it's always a sheet of cells, and it's about about two and a half millimeters thick. It's a pretty thin sheet of tissue of neurons and the difference again between a human and a rad is just how big that sheet is the area of it. Now it's remarkably uniform, you know in two ways: one is: if you look at the detail in this sheet of cells and there's a lot of detail there, we'll talk about in a second, it's remarkably uniform.

A

In fact, you know the rat and the mouse and the monkey and the dog and the cat and the humans that detail is remarkably the same they're slight differences, but almost almost all the same and it's functionally uniform.

A

What do we mean by that is that, even though there's different parts of the neocortex to do sight sound and touch it, it was been speculated for now over 30 years ago that the different parts of the aquatics are actually doing the same thing and what makes the visual part visual is because it's connected to patterns coming from the eyes and what makes the auditory part auditory is connected to patterns coming from the ears and there's a huge amount of evidence that basically verifies this and literally you can take in a newborn animal.

A

You can take the auditory nerve and the visual nerve and swap them and the auditory cortex becomes visual and the visual cortex becomes auditory. Okay, this sheet of cells is actually organized hierarchically. That is the different regions in it are projected to each other, and if you take that map you'll see that there's a hierarchy of regions- and so information comes in at the bottom of the hierarchy and moves up, and then it also flows back down. But all those regions look the same and doing something similar.

A

If we zoom in further into a particular layer, any layer anywhere any near cortex you'll see that the first level of detail under the microscope is layers of cells. There are multiple layers of cells. Typically, four depends on how you count them: we'll label them layers, two three: four: five and six four layers of cells.

A

Now, if you zoom in further you'll see that the neurons themselves, the cells are organizing, these things called mini columns and they're they're, really these very tightly little uh very close and very skinny little columns of cells with about 100 to 120 that span across all the layers.

A

If you zoom in further still you'll see the actual neurons- and this is a picture of a classic cortical neuron and the neuron is a characterized by this big sort of tree-like branching structure called the dendrites on it and they have all these connections. You've heard about called synapses so anywhere between three and ten thousand synapses. Ten thousand is probably closer to the typical number of connections on every single cell.

A

Now it turns out that most of those connections are close uh far from the cell body on these branches. Those are called the distal connections and only about ten percent are close to the cell body called proximal ones. If you zoom in further- and you make a picture, look at a picture of the of an actual dendrite one of those little branches that coming off here, you can actually see all the synapses alone and they're very tiny. Those are little spines all arranged along. Those are the acts of connections.

A

Now we now know something we didn't know even 20 years ago that these branches are active they're, not just parts of the cell body. They are. They are active processing elements and what they act, like is a little coincidence, detectors and they see if a bunch of synapses become active at the same time close together and close in time, then it actually has a large effect on the cell body.

A

Surprisingly, almost all neural networks, today, artificial neural networks and neural models do not account for all these synapses and these active dendrite properties, but this is essential part of neurons now. The other thing we've learned recently is people for many. Many years have thought that learning occurs by the strengthening and weakening of synapses.

A

We now know that new synapses form all the time and they disappear all the time quickly in the outer, sometimes minutes or even tens of seconds, and so uh learning that's a much more powerful part of learning and so learning actually is largely about the forming of new synapses, not just strengthening and wicking existing ones. So this picture here this, if you think about this, this is everything that we, this is a big picture of your brain and everything you do everything a human does.

A

Everything you've ever done in your life is operating in this structure. This is a picture of what intelligent structures in biology look like, and if we can understand how all these components work, we understand how the neural cortex works. So we have a theory about how this all works. It's a very high level theory. It starts. It's called hierarchical temporal memory, hierarchical temporal memory, htm, uh it's some very basic premises to it. First premise is that it's the neocortex is a hierarchy of identical reasons. That's pretty much a fact.

A

Okay, the next one is each region learns sequences. Now here is we start we're adding something that most people don't think about that. The memory- and this is all about memory- the memory in each of these regions is primarily a memory of time-based patterns. Time patterns are changing over time. It's like learning melodies and what happens is if a region can build a predictive model of a sequence. It then creates a more stable representation like the name of the melody in the next region up and then that learns sequences of sequences and so on.

A

As you go up the hierarchy, you see more stability. This is observed in brains. Similarly, you can take a pattern: that's fairly stable at the top of the hierarchy and unfold it in time, and you release the faster and faster changing patterns at the bottom. That's what my speech is right now, I'm taking high level concepts and unfolding the very fast changing patterns that make sounds okay, so that's the basic of hdm theory, but now we a lot. That brings a lot of questions exactly what are? What is the region doing exactly?

A

What are the different layers doing? What do the cells do this and how they implement this how's, the memory actually work, and so on. So let's dip into that a little bit deeper, deeper and give you a flavor where we are in understanding this? Okay. So here's a picture of the slice of neocortex of our four layers of cells, layers, two three, four: five and six. You can see some of the mini columns over here on the left. This is obviously a character drawing. This is not a real photograph.

A

Now two of these layers, the upper ones. Two three and four are essentially feed forward layers, information going up the hierarchy and layers. Five and six are essentially layers of information flowing down the hearkey. So when information and- and what we believe is going on here- is that each one of these layers is implementing a type of sequence memory, each one of them. So this is this is a very interesting idea. It's basically it's the same basic idea, repeated in different areas to do different things.

A

Now, when information comes in typically, this is classic neuroscience. It comes in and it goes to first to layer four and we think of this as sensory data like information from your eyes, your ears, your skin things like that. What most people don't know or don't remember, is that not only do we get sensor data, but we also get a copy of your own motor commands. So, as your body is moving, it generates these neural firings that make your muscles move, the literally those cells split into and send the same copy of.

A

What's actually going to your muscles is basically sent to the cortex, so the cortex gets to see not only what you're sensing, but also what behaviors you're actually implementing right. Now we believe layer four. Is it doing a type of inference or you could think of it as pattern recognition? But in france, it's doing sensory motor inference. It's trying to build a predictive model of what's going to come in based on the behaviors.

A

You just executed here's a picture on the right, the simplest way, understanding this is think about like a when you look at a face or get an image, your eyes are constantly moving several times a second three to five times a second, your eyes move over different parts, and yet your perception of the image is stable. It's not moving around you, don't you're, not even aware that your eyes are moving, and so how is it you create this stable representation? How is it that that you understand this pattern, while your eyes are moving? That's!

A

What's going on layer, four now notice the order in which you move your eyes over when you're looking at something is not the same. Every time it's not like a repeatable pattern. It's not like a melody. It changes all the time. So the only way you can actually make a prediction about what you're going to see next, what the brain is doing is to know exactly what you're seeing now and what motor behavior you're about to do.

A

So, if I say oh I'm up here and I move down to the down here, I'm going to see a nose that kind of thing that's what layer four is doing anything that layer four can't handle is passed on to layer, two three and layer, two three there's another type of inference, and this is what we call high order inference. This is really just like a like a melody.

A

It's a patterns that actually do repeat over time and that you can understand because they they do repeat over time, and so you can make a prediction what note's going to occur next in a melody, if you've heard the previous some number of notes- and you say- oh, I recognize this melody, and this is where I am I'm going to predict the next note. My speech is another example of a high order pattern.

A

Now, layer, 3, then becomes output and then projects up to the next higher region in the cortex layer, 5 is, is where motor behavior is generated in the cortex. So these cells there, they project someplace else below the brain below the excuse me below the cortex uh to sub-cortical motor centers and they generate behavior layer. Five cells are making my my sounds coming out of my vocal tract right now and and layer. Six is a um uh is an attentional mechanism, this projects back down to hierarchy.

A

Now before I go on, um I want to point out well, I have here in this text here each again. The point of the slide is each layer is doing a variation of sequence. Memory, you can think about motor behaviors, the type of sequences I'm playing back, and then we have two different types of inference and there's a common algorithm going on here, and the really important thing here is that these are universal functions. Everything I told you about here in a cortical region applies to any kind of modality any kind of sensory metabolism.

A

It's true for vision, it's clear for hearing it's true for touch. It's true for language. It's true for science, everything, the brain does. The cortex does can fit into this. These are very powerful universal ideas and that's what we want to find in the cortex, because it's a universal learning, algorithm.

A

Okay, let's just talk a little bit about. How does this sequence memory work? And you know I say the sequence remember going on in these layers? Well, exactly how does that work? We have a really really good idea how this is going on we're pretty certain we have this figured out. I won't go into great detail here, but I'll give you some of the flavors for it. We call this htm temporal memory, it's a pretty plain term.

A

uh Just so, it's a memory of time-based sequences and uh here's a picture of one of our our simulations. Those little cubes are represent. The neurons, we're modeling and the colors represent neurons that are active and so we're actually modeling. In this case one small layer of cells we're not modeling the whole quadratics, just one layer of cells at a time here, because this is the sequence memory now.

A

uh Let me just tell you the attributes of this and we won't tell you how to det how it works, although everything that I'm describing here is documented online and the code is available online, and so this is not a mystery. You can read about it, um but what does this thing does? Well, they learn sequences.

A

uh It recognizes and recalls sequences as new patterns come in. It says: hey. Is this something I've seen before and it's constantly predicting in fact it's making multiple predictions at the same time. So as any input comes and says, you know, these are all the things that might occur next now it does all three of these things simultaneously, not one after the other constantly saying, as the data comes in it's saying, I'm learning I'm recognizing I'm predicting I'm learning recognizing, particularly over and over again we've built a lot of these we've tested it.

A

We understand it mathematically. This is a well characterized system and I can just give you some of its attributes. It's extremely high capacity, so even a small section of cells- maybe you know, 100 000 neurons- can learn many millions of transitions over time. It is a distributed system and it uses local learning rules. So there's nobody in charge of the whole thing and one of the advantages of this. It makes the system extremely fault tolerant.

A

You can lose neurons and synapses and columns it's just it's amazing the robot bust it degrades gracefully in almost all situations. There are no sensitive parameters, it's not hard to get this stuff work and it actually generalizes. It can take new sequences and new patterns and say these are similar to ones I've seen before semantically and make predictions about new things in case you're wondering this is not a typical, artificial, neural network. You know there's a lot of talk these days about deep learning and other artificial neural networks.

A

This is very, very different from that, and just give you three attributes just to give you a clue as to how it's different than other type of artificial neural networks. One is. It pays a lot of adherence to cortical anatomy. We do this because we need to not because we just want to uh so. We have this idea of many columns and inhibitory cells and various connectivity patterns that you see in real brains that almost nobody else models.

A

The second thing- and this is worth an entire talk on itself- uh which you can see online- um it's built on a type of representation, called sparse, distributed representations and just to give you what that is, if you think about the neurons as being one being a bit, and you can say when that renurin is active, it's a one and a neuron's inactive to zero.

A

At any point in time, most of the neurons are inactive and only a small, very small percentage in their runs are active, and so we can represent that by a whole like a vector of a whole series of mostly zeros and few ones. It turns out that this type of representation, which encodes semantic meaning, has some amazing mathematical properties, which makes the whole thing work so well.

A

I won't go into in further other than say that our sparse distributed representations are essential for how the brain works and they're going to be essential for any machine intelligence. That's a given. Finally, the neurons we model here are unlike any kind of simple neurons. You see in typical artificial neural networks. They have active dendrites, they have thousands of synapses. We have we learn by synaptic formation, we're really getting closer to what real neurons are doing, and we understand why it has these features and they're essential.

A

Okay, I'm going to leave this for now and if you want to learn more, you can go to our website. Numenta.Com, learn, slash and there's videos and papers and details about everything here. Okay, let me just talk about a research roadmap. Well, how much of this do we understand and where are we and what are we doing with it? So, let's go back to our picture of a slice of near cortex.

A

What we're trying to do is work our way through this and understand exactly what's going on in each of these regions, these layers and how they interact and so on. We started with layer two three, because it's the easiest one, it's the simplest one architecturally, it's the simplest one to understand, um and so we started with that. I would say we understand it very well. I put the theory. Is at 98: there's always some things you don't you know you could get wrong, but we've extensively tested this.

A

We use it in commercial code, we know it works very well, we understand it. We have started working on the sensory motor inference layer. This is again understanding how to make predictions based on your own movements, and I would say the theory is about 80- done- we're well over the hump on this we're in development right of it right now. So this is what we're working on right now at momentum when it comes to motor sequences. This is how the cortex generates goal oriented behavior.

A

We haven't started working on that in detail yet, but we have a good piece chunk of the theory done I'd say about half of it. So we're to my mind. We have a lot of big big pieces in place and it's a matter of stitching them together and then testing those ideas, and we haven't done as much on the layer six, which is the tension, feedback layer, it's a more complex layer.

A

So that's that's a little bit further behind, since we started here with layer, two three, um we said: okay. If this really works, let's apply to real world problems and see if it works on real world problems. So that's what we did we said. Well, what can you do with this kind of high order inference memory using this htm temporal memory? Well, then, upper right here you can see. Well, we can work on streaming data.

A

We can work on problems where the data itself is changing over time, things that are coming in in a streaming format, and we can do prediction. We can do anomaly detection. We do go into classifications, there's lots of applications here in predictive maintenance, security, natural language processing, I'm going to show you some of those in a moment.

A

So how does it do so? We we actually built this, and how do you actually go about building a system? What does it really work like? So here's how you do it? Here's, how you build a streaming data application using htm theory. You have a data stream, you run it through something called an encoder. Now your data stream is like numbers or categories or data from a database, something like that. You run it through an encoder which turns the data into a sparse distributed representation.

A

That's like like a it's like a sensory organ and then we feed it into the htm temple memory and we get out predictions anomalies and classifications. Now there are lots of streaming data sources and it's not hard to find them every application, every server, biometrics, medical devices, vehicles, industrial equipment, communication networks, etc. All these are spewing out data. In fact, you know when people talk about big data, mostly they're talking about things like this things, that you can read off every few minutes, you're getting data points.

A

Now we have a number of different coders we created so there's, there's default encoders for numbers and categories date and time. We actually have some one for gps data and textual data now, and then we feed them into the htm and we get these outputs.

A

So we've done this and we've applied it to lots of different data sources to try it out. I'm going to give you a taste of all of these. We did several in the sort of the anomaly detection range here. We did something with server.

A

Metrics human metrics financial data and then some other ones with like medical stuff and gps and natural language I'll give you a flavor for each of these, I'm going to start with the first one here: server metric data we actually built a product called grot and it's it's available on the amazon, aws marketplace, and it's for monitoring servers. So here's how that work, you've got some servers running on a server farm and we take metrics from those servers. Now you can take multiple metrics. We do this automatically for you.

A

We run each of those through an encoder and we build a model for each metric. We experimented with combining the metrics before putting them into the model, but this actually works better. So we just built a model for each metric using the hgm and what you get out of it. The end you get something called an anomaly scores. How unusual is this data stream based on previous history, based on what it's been like in the past? How unusual is it going forward?

A

We display that in a dashboard, so here's an example of a dashboard where on a mobile device, where you're showing over time how anomalous a particular server is, and the height of the bar and the color there tells you how anomalous it is, and we order this and sort it by order. So you might be monitoring hundreds or thousands of servers, but we only show you the ones at the top of the list. That are the most unusual at any point in time.

A

This is continuously updated because the data is continuously coming in the system is continuously limiting. We also have a web interface, but I'll just stick to the to the mobile one for now, so here's some uh some kind of anomalies at the system attack and I'm going to just start with the easier ones and go to the more complicated ones.

A

I just need to describe the screen to you a little bit here and what you're seeing is at the top you're seeing the in the white area you're, seeing the anomaly score for the entire server over time at the bottom, in the gray you're, seeing the anomaly score for a particular metric, uh maybe cpu utilization or networking, something like that and then the black graph in the middle, with the blue line. That is the actual metric data. So when you get an anomaly you can go and look and see what's going on.

A

These are the screenshots to show you that so now, on the left. Here you can look to look at the data and you can see the detected anomaly. You can see the red tall bars there. This is a pretty simple one. The data was going along at one level and also jumped up to another level. This- and you said, okay, it's a sudden change. It detected that the next one over is one where there's more of a gradual change, basically the same idea, but it says you know what that's too much of a change.

A

I'm going to detect the normal. The third one over is one where we have a very regular data stream. It's it has a very repeatable pattern and when it's very repeatable, the hm can say even the slightest difference is statistically, very, very rare or unusual and detects an army. So in this case every hour has this little double double little spike in activity and one time one of those spikes is a little bit different. You can see it on the right there and that and grock and hdm gets it right away.

A

The third, the last one excuse me the fourth one on the right. There is where you have a very unpredictable data stream. It's noisy it's spiking all over the place. You can't predict it completely accurately, but there is a level of predictability and if that level, predictability change the html says it's still very unusual. So these would not work with thresholds. You can't just put some kind of threshold on these things and detect these here's where it gets really interesting, and we love we've seen a lot of these kind of anomalies.

A

These are ones that humans can't see. It's not at all obvious here's some data and you can see this is actually two screenshots from the same server at the same point in time and on two different metrics determined that both of those metrics had an anomaly.

A

And if you look at the data, the blue graph, you can't see why there's an anomaly there. It's not obvious to people to say well, I would have picked that as an anomalous point in the data, but the html said I got it, I'm certain of it and it doesn't report anomalies very often, it's very precise. So what happened there was.

A

This is a server that has an automated build-up process and an engineer went on one day and at this exact moment in time, started the build process manually very similar to what happens automatically but slightly different, and not only did the htm detect that it detected in two different metrics simultaneously independently and it's a very subtle type of thing. So this this could be very useful for uh security intrusion, detection things like that. This is the kind of the power that the htm has that we don't know of any other capability.

A

The system has this capability. Okay, we took that same basic idea and we said: can we apply to human matrix? You know people sitting at computers, typing away and accessing files. Can we tell if something's unusual, going on on a particular person's computer, and the answer is we? Can uh we put these like keystrokes and file access and we can detect when people do unusual things? In this case, someone created a very large zip file. They hadn't done before. If they did that, every day it wouldn't be anonymous.

A

We then we're actually building a product right now, that's built on detecting anomalies and company data and we're looking at both stock volume, data and social media aspects of that company, and we can actually look at the stock volume of a company and we can detect unusual patterns in it and we can do the same thing with twitter. So that's something we're in process of doing right. Now, then here's some other applications are quite different.

A

This is uh one that was done by a researcher at berkeley and they're, trying to take eeg readings off a scalp, and you and decipher them do classification on those edg readings to control things like prosthetic arms or in this case quadcopters- and this is a they got. Initial results were very promising. I don't want to overplay this because it was just um it hasn't been very far, but it's the kind of problem that the hm should be good at you're.

A

Getting these complex temple data streams coming off the the sensor, and we should be able to model that and classify it and so early results. Look good, uh here's another one where uh this is a company in europe and amsterdam that is, building an application to detect anomalous behaviors in ships moving through harbors, so ships come in and out of harbors all the time. The idea is here: we don't know what it's supposed to be like, but they just the html models, these behaviors using gps data and it detects one.

A

A ship is behaving unusual, whether it's going too fast or too slow or turning in the wrong place, or something like that. Very good for security applications and then I'll finally end up on natural language. This is work that we did in conjunction with a company called cortical. I o they're in austria and cortical io said hey. They like this theory that we're working on and they um they develop a tool which is really cool.

A

It's it can take a word and create an sdr or sparse distributed representation from it and that's what these pictures on the right are representing. These are those are sdrs of words and those dots are the one bits and the white areas the zero bits. Now they did this by feeding it's a complex technique.

A

They did it, but they basically train the system on like a corpus of text like wikipedia and once they've done, that you can give it any word, and it gives you back an sdr and it has all the right properties that sdrs are supposed to have, which I didn't get into in this talk. But what you can do is something really interesting. You can take the sdr for a word like apple and the sdr, for a word like fruit and now apple could mean multiple things.

A

Apple can be the fruit, it could be a computer company, it could be music company and other things, and so, if you subtract out the bits that are on in fruit that are also on an apple, you end up with bits that are essentially semantic, meaning of apple minus fruit.

A

This is the power of these, these representations of the brain jews and when the answer you get is, if you ask what is apple minus fruit, you get a new sdr. The most closest pattern is to computer, and so- and it says, okay, that's what that is, and the next one's down the list would be macintosh, microsoft, max lens and so on.

A

We can then take these sdrs and feed them into the htm, a sequence of words, just like a sequence of data coming off a server, a sequence of words going into the hdm, and so we trained a system on this. We started there's a very first simple test. We trained on three word sentences that it was an animal, either eats or likes something. So elephant likes water and elephant eats grass and we didn't train on a lot of sentences. It's just a 50 or 60.

A

Something like that, and so what what we can then do is we can say: okay after we trained it on this series of words, the sentence is we can ask it a question, we can say: well what does the fox eat and the way we do that is we feed in the word fox, the word eats and hcm makes a prediction now. The word fox had never been seen before by the system.

A

um It's it's something that we have an sdr for it, but the html was never trained on it that that pattern, and but the word fox, would share semantic similarity to other animals, and so we can then basically feed in a new animal and say what do we think the fox eats and the answer you get out. You get an sdr, you look it up. The ansi gap is rodent, and this is really incredible. It's it's showing a level of generalization.

A

uh That's pretty cool and it's working on the same principles that your brain works on. This is a very simple example, but it's the same basic principles that are going on in your brain when you're understanding language, we think there's a lot of applications for this. Of course, the whole thing is unsupervised, it does semantic generalization, it can work across multiple languages and we think there are applications in search, sentiment, analysis and so on. So this is a whole area of research. That's going to be built around htms. Okay. Now I just showed you quickly.

A

Six of these applications- and the point I want to make about them here- is that every one of these was was built using the exact same code base, not similar, not a reconfiguration, not a recompilation, the exact same code base. We literally took data from these different sources, fed them through an encoder, turned them into sdr and fed them within the exact same htm model, and of course it learned different things in each case and and we got useful results in all of them. This is really getting to the universal nature of cortical algorithms.

A

There's no other machine learning technology that can could do this, but brains in nature figured out how to do this. Okay, let's go back to our research roadmap. We start with. I just talked about how we modeled layer, three, which is this high order, inference and the applications we can build with that.

A

We're now working on the sensory motor inferences again trying to model how the brain understands the world when you move in the world- um and we are now- and we can ask ourselves we're in the process of doing that, but we can answer those. What kind of applications could we build using that technique and there's lots of them? So essentially, this now works on static data.

A

Instead of streaming data, it would be static data and yet we're moving through the data we actually move like it's like you're, looking at a picture and you're moving your eyes over it. Now it's static data, but it's with active learning and you can do classification and you can do prediction we're working on vision right now, because it's a classic problem, but I actually think the more interesting applications will be non-vision. There will be things like network classification or anywhere you're, trying to understand uh and and recognize or or predict complex, connected graph data.

A

I think it's going to be really cool out here. Eventually, we're going to go on to not too far in the future, we're going to go on to working on layer, 5 and motor sequences, and now this is where it really gets interesting, because now you can start adding a goal or any behavior. So the system not only can understand what happens when you behave but actually starts directing behavior, which is what we think about when we think about human level.

A

Behavior and, of course, that introduces the idea of robotics and and other things, but I don't think most of the applications are going to be in physical robotics. I think they're going to be in sort of more virtual worlds, where we have agents that are moving through data in an intelligent way or trying to do proactive, defense and so on. The same principles can be applied to lots of different things: okay and then finally, layer six allows you to build bigger hierarchies in multi-century modalities.

A

That's when we really want to scale this up, just a word about how we go about our research, we're very transparent everything we do is documented and available to inspection and open source. So all our algorithms are documented. uh The documentation could be better, but it's sufficient because we have many independent implementations around the world, so other people have built this off and it works for them too. We have an open source project called new pic.

A

You can find it at nomento.org and all of our software is open source there and we actually put even our daily research code if we're working on things and experimenting and building hacky code as we go along, we put that up there as well. There's active discussion groups for theories and implementations, and so on.

A

We also have collaborations we have a collaboration with ibm's research group in amaden, san jose, california, and- um and we also, we have a cooperating with darpa in washington, d.c who's, trying to do a project on building hardware for cortical, html algorithms, and then little companies like cortical. I o, which I mentioned with the words this is just a a chart about some of the metrics at our open source community. We created this about 15 months ago and it's been growing steadily since so we're very pleased about this.

A

It's a it's a good active community and I would encourage anyone to join in at whatever level you can. I'm going to end with this slide here. uh This is sort of my my uh the way. I view the machine intelligence landscape. What's going on right now, and there are. This is a rather simplistic perspective, but I'll offer it. Anyway. I see there's three basic um ways: people think about machine intelligence on the left.

A

You have people who are interested in brains, cortical modeling, that's what we do and I would argue that the furthest advanced technology in this area is htms. So that's a good example there, the cortical modeling, we say we're going to understand how the brain works. Then we have in the middle. We have artificial neural networks, part of the machine learning community and this the premise there is that it's really mathematically based it's not biological, based at all.

A

In fact, artificial neural networks have almost nothing to do with neurons and and brains, or anything like that, but they're they're, mathematically, derived uh features and and uh systems and the the prominent one these days is deep learning. So uh you hear that in the news a lot deep learning is just a type of artificial neural network, very mathematically, oriented and so on. Then we have the more classic ai. Where these are this good example, this would be watson which is ibm's a recent, a really impressive machine.

A

So again, the premise under these is different for each one in the cortical modeling, it's biological in the artificial neural networks, it's mathematical and I would say in the ai world, it's sort of an engineered solution like hey, whatever works, to get this to work, we're going to do it they're all valuable, I'm not trying to put value judgment on them. uh They do different things. They work on different types of data, so the htms we work on spatial temple data.

A

That's I already showed that here uh we showed we can actually do some language stuff. We're gonna be doing more than that and we're starting to integrate behavioral data, meaning the data from your own body, artificial neural networks or deep learning is primarily spatial data. That's what they do today, but they're working on adding temporal. So I put that in gray there, because that's an extra, that's a dimension, they're working on and then watson is really all about language and documents.

A

That's the that's the data type it works with. They do slightly different things. Overlapping things. uh Htms are really good at classification and prediction and we're starting there. We have a path to go on behavior deep learning, networks are primarily classification, networks and um watson is sort of a natural language query type of system.

A

Now the main point of the slide is to say well, which are these on the path machine intelligence which one of these are going to actually take us out of this decade in into the to the next 100 years of computing, which is about machine intelligence.

A

I'm going to start on the right here, um I'm going to say for watson and ai, probably not now before anyone gets mad at me. This is the actual self-assessment of the people who built watson. They were asked this question. Do you think this is on the path, machine, intelligence and they said no they'd have to do a lot of things to get that to sort of be a more general purpose, something beyond natural language. It can be very, very useful for what it does, but is it on a path to building truly machine intelligence?

A

I probably not on the deep learning side, I would say: probably not there as well, they have to. They have to start incorporating time. They have to have a understanding how you can implement behavior attention, etc. So they have a lot of things that they're not doing today, but they could get there if they really start taking those other features seriously, um and this is not like a you know. This is not it right or wrong. It's just like. What do you need to do to get the machine intelligence?

A

I would argue that the htms are on the path machine intelligence. I've shown you an outline of our research agenda, how it fits with cortical algorithms, the process that you're going to do from pure sensory to sensory, motor and motor behavior and so on, and we have. We have a roadmap to get there.

A

I showed a road map to get the machine intelligence here, and so I believe that that is going to be the fastest path by far to get there and that's the end of my talk, it's an exciting time to be working in this field. It is it is that crazy time when there's lots of ideas- and it's not very clear to everybody, just like the 1940s worth of computing, but it's going to be only a few years now before this settles out.

A

I don't think it's going to be five or ten years it's going to be two to four years before it really becomes clear which are the right paradigms here and which are the winning solutions which are the ones we're going to carry forward for the next 50 to 100 years of computing. Thank you very much.