From YouTube: Deep Learning London Meetup: Brains, Data, Machine Intelligence & Cortical Learning with Jeff Haw...
Description
This is a live-stream of an upcoming Meetup for the Deep Learning London Meetup group. Jeff will be presenting from the Numenta offices in Redwood City to an audience in London. See the details here: http://www.meetup.com/Deep-Learning-London/events/171081112/
Thank you very much for showing up. We've really had a lot of interest in this event — it's been amazing seeing everybody sign up, and it's amazing seeing everybody turn up. So thank you very much. Thank you very much to Skills Matter for hosting us, and for streaming and recording this, and of course a big thank you to Jeff and his team for agreeing to do this. This is organized as part of the London Deep Learning meetup, which Ali and myself have been organizing.
Back to the meetup. Today we're really honored to have Jeff Hawkins present to us, live from California, and maybe we can convince him to come over in person next time. Many of you already know his name, but I'll just go through his bio. Jeff Hawkins is an engineer, serial entrepreneur, scientist, inventor and author. He's the founder of two mobile computing companies, Palm and Handspring, and was the architect of many computing products such as the PalmPilot and the Treo.
He also founded a scientific institute focused on understanding how the neocortex processes information; the institute is still there, and it's located at UC Berkeley. In 2004 he wrote the book On Intelligence, which some of you may have read, which describes progress on understanding the neocortex, and in 2005 he founded Numenta — for a time known as Grok — a startup company building a technology based on neocortical theory. The hope is that Grok will play a catalytic role in the emerging field of machine intelligence, and I think so far it's doing pretty well.
Jeff originally earned his bachelor's in electrical engineering from Cornell, and he was elected to the National Academy of Engineering in 2003. So, everybody, again: thank you very much for coming, thank you to Skills Matter for hosting, and thank you very much to Jeff for being willing to talk to us. Over to you — oh, by the way, one more thing: we'll take questions at the end, and there'll be a microphone that goes around.
Right, we're all set. Hi, I'm Jeff. I'm sorry I'm not there in London with you in person — I do get over there occasionally, I have family in Britain, but I'm not there today, unfortunately. So hopefully it's going to work out all right, and I trust Olli and Matt here on the A/V side; if something goes screwy or weird, hopefully they'll let us know so we can fix it. So I'm going to talk about the work we're doing in our company and in our open source project, and, as was said, we'll take questions at the end. So hopefully it will go well. I unfortunately cannot see you very clearly, and I can't judge how things are going — fast or slow, or whether you want more information or less information on a particular topic.
E
So
if
there's
something
really
going
wrong
and
you
really
feel
like
I'm
missing
something
well,
let
all
you
know
and
I'll
be
happy
to
try
to
go
and
address
them
all
right.
So,
let's
start
actually
Ali
was
cause.
We
all
right.
We
started
this
company,
the
name
Numenta.
We
briefly
changed
it
to
grok,
but
we're
actually
back
to
new
mentis
I'm.
Sorry
about
the
confusion
of
that
the
name
of
a
company
is
momentum,
and
our
mission
is
to
be
a
catalyst
for
intelligence.
So
you
know
the
catalyst
is
something
that
speeds
reaction.
that's happening anyway, but really slowly. We're going to build intelligent machines — we as a society are compelled to build intelligent machines. I have a different view than most people of what those will look like, but we're going to build them, and Numenta is just trying to accelerate that, so that it can be a positive force in that transition. That's kind of our role. We do three things at Numenta: we have a research group where we study neocortical theory, and we do algorithm development.
We also have a product called Grok, which has been less than a month in the market. This product uses our cortical algorithms, which we've applied to streaming analytics. So I'll talk about the research first in this talk, then I'll talk about the Grok product to show you what we're doing with these cortical algorithms, and then we'll end with a little discussion about NuPIC. That's my email address — don't be afraid to email me if you need to; I try to stay involved.
I'm also active on the NuPIC email list as well. OK, I'm going to start off with a story, and this is a story about Bill Gates. When Bill Gates was still CEO of Microsoft, a number of years ago, he was speaking to a group of young students — what we call grade school — and one of those students asked Bill Gates: do you think it would ever be possible to build a company as large as Microsoft — another company as large as Microsoft?
Of course this was before Google, but Bill had a very quick answer. He didn't hesitate. He said yes, and he said: if you could build a company that invents a breakthrough so that computers can learn, that is worth ten Microsofts. When he said that, I thought it was a very smart answer. He was saying, basically: we've built computers on the same principles for 70 years.
These are the principles laid down by von Neumann and Turing — programming principles — and he was saying, you know, computers really don't learn, and if I could tell you how to make machines that learn, that would actually be a bigger revolution than the first one. I agree with that; I thought it was a very astute thing, and he said it right away: computers that learn. I talk about machine intelligence — I prefer that term. They're kind of the same, but one has a bit more vision to it, and a lot of people are becoming interested in machine intelligence.
These days there are a lot of people moving in this direction, and I've been to lots of conferences: computer manufacturers, applications people trying to figure it out, people doing big data. You know: we need machines that can learn, that can adapt, and so on. Now, if you're going to build machine intelligence, you might ask a couple of questions. What are the principles we can use to build intelligent machines?
How are we going to do that? How will these things be structured? There's a lot of disagreement about this. The second question you might ask is: what applications will drive adoption in the near and short term? Well, you know my answers to these questions. On the first one: I'm interested in brains, and I believe that machine intelligence is going to be built on the principles of the neocortex. Now, just to remind you, the neocortex is about 80% of the volume of a human brain.
It's what makes you intelligent. Language, high-level vision, planning, motor behavior and so on are all in the neocortex, and I believe we need to understand those principles before we can build machines that are intelligent. The goal — my goal, and the goal of Numenta — is not to build machines that are like humans. It's not to replicate the human neocortex; it's to understand the principles by which it works and then apply them to other problems that may not be human-like at all.
So this is not, you know, a robot company building human-like robots. This is a company that aims to understand how the brain works and then build intelligent machines that work on those same principles. So, a couple of reasons that recommend the brain for this — why we should think about the neocortex. One: it's an incredibly flexible organ. When you're born, you know very little about the world.
Almost nothing. Your neocortex has structure, but no knowledge about the world, and it's extremely flexible. You can learn to program computers and design computers. You can learn to drive a car, or submarines and airplanes. You can learn spoken language — any one of thousands of languages — mathematics, physics, and so on. It's an incredibly flexible tool. These are things you can learn to do that you never evolved to do — we were under no evolutionary pressure to do them.
It's as close as we know to a universal learning machine: in the same way that Turing talked about universal computing, the cortex is the closest thing we know to a universal learning machine. It's not proven mathematically that that's the case, but that's one reason. The second is that the neocortex is very robust. It's built out of very simple elements that are slow and unreliable — the neurons and the synapses in the brain are fairly unreliable elements; none of them works particularly well — yet together they make a very, very robust system.
There are no single points of failure, and as we think about going forward — in terms of building new computer hardware appropriate for machine intelligence — this is a very desirable property: you could build memories and so on that are naturally fault-tolerant. But there are still quite a few people who don't believe that machine intelligence is going to be built on the principles of the brain — people who just do not care about the brain — and, in fact, I think mine
might still be a minority opinion. But I can say the following: if we did know how the neocortex works, there would be a race to build these machines. If we had a theory written down of exactly how the neocortex works, I don't think anyone would be sitting around arguing about it — we'd be off building the things. And honestly,
that's where we're starting to be. We're starting to really deeply understand how the neocortex works, and we're starting to build these systems, and I think this debate about whether brains are relevant or not will disappear in the coming years. OK, so now we're going to dip into some neuroscience, just to tell you a bit more about how brains work. Here's an overall picture of what the cortex does. It receives information from your senses — and the retina is really like an array of senses; it's not one sense.
It's not batched in any way. And the cortex, as I said, starts out early on not really knowing anything, and it has to learn a model of the world. So from the sensory stream it builds a model of the world, and from that model it makes predictions, detects changes or anomalies, and takes actions. And because it's taking actions, most of the changes that occur on your sensory organs — most of the changes coming in on your sensory streams — are because of your own behavior.
Every time you move your eyes — which is three to five times a second — you get a completely new pattern coming in on your optic nerve, and when you feel things, it's your own body moving through space and touching things, and so on. So a good portion — the vast majority — of the changes on your sensors come from your own behavior as you move and manipulate the world.
So what we say is that the cortex learns a sensory-motor model of the world, and what we want to know is: how does it do that? What does that model look like, and how is it learned? We know a lot about this now. OK, let's start with the basic high-level theory about what the cortex does. We call it hierarchical temporal memory, or HTM, and the three terms are quite descriptive. The first is that it's a hierarchy.
When you look at the neocortex, it's actually a sheet of cells. It's about two and a half millimeters thick and about a thousand square centimeters in area, so you can imagine it's a very thin sheet — like a large dinner napkin. That sheet is divided into regions, and those regions are connected together in a hierarchy. This is very well documented, and the amazing thing is that
the cortical sheet looks nearly identical no matter where you look — no matter what region, where you are in the hierarchy, or what information it's receiving. In fact these regions are common across species and modalities: the visual regions in a human and in a mouse are almost the same, and the higher-level and motor regions are almost identical. It's been known for over 30 years that everywhere the neocortex is basically implementing the same learning algorithm — there are variations on a theme, but it's basically the same thing.
This is a wonderful discovery, because if you understand how one area of the neocortex works, you're going to understand how most of the neocortex works. There's nothing inherently visual about the visual areas of the neocortex, and nothing auditory about the auditory areas — the brain hears and sees using the same methods.
The second part is that the memory in the cortex — and this really is a memory system — is mostly sequence memory. You might be surprised by that, but think of a sequence memory as something like the memory of a melody. Most inference we do relies on sequence memory. As you listen to my voice right now, you're understanding what I'm saying from the pattern through time — what pattern follows what in time is very, very important, and if I mixed up the order, it would be different.
The same is true of touch: when you touch things, your hands move in a particular pattern over surfaces, and so on, as you move through the world — these are sequences. And in vision, most people are confused about this: they think vision is a spatial inference problem, as if there's a static picture in front of you. That's not true. As I mentioned earlier, the input from the eyes is changing three to five times a second.
Every time your eyes move — a saccade — and your head is moving, and you move through the world, and things in the world are moving. So vision, too, is mostly a temporal inference problem. And finally, motor behavior — the high-level motor behavior generated in part by the cortex — is also a time-based pattern. My speech right now is being generated by my neocortex; it's a very complex pattern of muscle innervations that's going to continue for forty-five minutes or so, and I'm playing back patterns that I've stored in the past.
You learn sequences of sequences of sequences going up the hierarchy, and when you come back down, those sequences are unfolded. So I just said, in effect, "give a brain talk about this," and I unfold this very complex pattern going back down the hierarchy. That's the overall theory of how the cortex works. It's a very simplistic view of it, but I think it's correct. Now we can jump in a little bit further — we're going to dive a little deeper.
A little more theory. Imagine you have a sheet of the cortex, and you're looking at any area — it doesn't really matter what area; it's about two and a half millimeters thick — and then you jump in one level deeper and zoom into one spot. (It looks like Matt's in control here, giving the slides crazy directions.)
You jump down one more level and you'll see that in the cortex — no matter where you look, no matter what species of mammal — you're going to see layers of cells. The exact count is not that important, but basically there are layers of cells labeled two, three, four, five and six, and you see this everywhere. Then you jump in one more level of detail.
You zoom in a little bit further and you'll see the second organizing principle — and this is as deep as we're going to go for a while — which is that the cells are arranged in little columns, these little mini-columns as I call them. All the cells are packed vertically in these very tiny vertical columns that run across the layers, and there might be 100 cells in a mini-column. So this is the basic organization; this is the structure in which the brain works.
All your memories are stored in this kind of structure, and the question is: what's going on here? We have a theory about this. We believe that each layer in the neocortex is learning sequences of patterns, and the layers are doing it for different purposes under different conditions, but it's all about sequence memory. Layer four and layer three are both inference layers, layer five is the motor output layer, and layer six is the attention layer.
We have studied this quite a bit, and we think we know in detail how layer three works — and it's very similar to the other layers. We call this the cortical learning algorithm, or CLA. It's basically a model, or a theory, of how a layer of cells in the neocortex learns sequences of patterns and does prediction and inference with them. It's a basic learning algorithm, and we've been testing it for quite some time — we have software we work with every day — so we're pretty confident we understand, to a fairly deep level,
exactly what's going on here. Now, it's important to understand that the CLA is not just another neural network. Some people say "artificial neural network" — not really, no. It's a cortical model. It's got neurons in it, but the neurons are unlike anything you've seen before — I'm going to tell you a bit about them. They're like real neurons, not artificial at all in terms of how they operate, and so you just have
a different architecture than you would in a typical artificial neural network that you might be familiar with. I'm going to give you some flavor for that. I'm going to go a little deep here for a while, and then I'll come back out and make it easier in a few moments — hopefully I won't lose everybody here. So let's talk about these layers a little bit more. I've already mentioned this briefly, but here's the way this works in the real brain,
in the cortex. Layer four is the basic input layer, and it gets your sensory data — the term neuroscientists use is "afferent", which means feed-forward data. So it gets a copy of what's going on in your senses. But from somewhere else in the brain it also gets a copy of motor commands — meaning there are parts of your body that generate your behavior; the cortex controls them, but it also gets a copy of what's going on. So when your eyes move,
layer four gets a copy of the motor command that made your eyes move. So the cortex knows what behaviors you're performing, and it also knows what you're sensing — sensory plus motor. What we believe is going on is that the layer four cells learn, essentially, sensory-motor transitions. What do I mean by that? Think about a saccade in vision. Here's a face, and as you look at the face, your eyes saccade over the different parts — you look at the hair, the eyes, the nose, the mouth. You don't realize your eyes are doing this.
Your perception is stable — you just see this face — but the reality is that the input coming into your brain is changing dramatically, completely, every few hundred milliseconds. Now, what layer four tries to do is predict what you're going to see next, or hear next, or feel next, and it's going to do that based on your own behavior. So in this case — let me go back a second — the order in which I look at a face is not fixed. Sometimes I'll go to the eyes and then the nose,
sometimes I go to the hair, or whatever — it's not a fixed order, not a sequence. But if I knew what behavior you were about to perform, and I knew what you're looking at now, I could predict what you're going to see next. That's what's going on in layer four. Now, what happens is, if layer four can make a correct prediction about what's coming next, it creates a stable pattern in the next layer up in the cortex, which is layer two/three.
This is the standard neuroscience of how information flows through the cortex. If it can't predict what's going to happen, it essentially says: I have no idea what's going on here, I can't model this — and it passes the changes through. This comes back to the temporal stability I talked about earlier. The basic idea is that layers two/three are learning high-order transitions — a high-order transition being something like the memory of a melody.
This is the problem the brain has to solve, in language and music and motion and hearing — you name it, everything, vision. These are the two major ways of doing inference, and you see them everywhere. Then in your cortex, layer two/three projects to the next higher region and the process repeats. I'm going to argue that these two basic ideas of inference — for sensory-motor transitions and for high-order transitions — apply to every modality: to vision, hearing, touch; it doesn't really matter.
These are universal inference steps, and they apply to anything: if you've got some sensory organs and you have some behavior, this will work for them. The other two layers: layer 5 is where motor generation is created in the brain — the cells there actually project to the areas of your body that generate behavior — and layer 6 handles attention. What I'm going to claim here is that we understand layers 2/3 very well; we've been building these for years.
We've tested them in our commercial product. We're starting to understand layer four pretty well, and when I say 90% understood, I mean I know 90% of what I need to do to build this in software and test it — it's not "look, interesting, I understand 90% of the biology"; we're talking about actually building something practical with it. On the motor side we're only at about 50%, and on the attention side it's even less, about 10%. I have another talk online
that covers the state of the world right now — you kind of need to know this stuff if you're going to work on our open source project. OK, let's jump back in; I'll bring up this slide again. OK. I already said that the cortical learning algorithm, the way we first implemented it, is really modeling high-order sequences — high-order transitions. So what's going on there? Before I can tell you exactly how it works,
I want to give you a few more things to think about. I'm going to talk about sparse distributed representations — what they look like — and then we'll put it all together and show you how this thing works. So let's jump into sparse distributed representations. These are the language of the brain. SDRs, as we call them, are how the brain works, and they're not optional.
This is not something you could choose — it's just there in the biology; it's part of the theory of how all information is represented. So how do we understand them? The easiest way is to first contrast them with what we do in computers. In computers we use what are called dense representations. A dense representation is like a binary word: it might take 8, 32 or 64 bits, and it's dense because we use all combinations of ones and zeros — we use all possible representations.
An example is the ASCII code. There's the 8-bit character for the letter "m". Notice that in a representation like this, the bits themselves don't mean anything — the third bit of an ASCII code has no meaning on its own — and if I change one bit, I have a completely different representation. These are, in some sense, arbitrarily assigned representations, and this is pretty much what we use in computers all the time; we have to assign meaning to them.
The computer itself doesn't know what these things mean. In a brain it's quite different. Now, when I talk about bits, you can think of ones and zeros, but you can also think of cells. If I say I have thousands of bits, I'm talking about thousands of neurons; if I say most of them are zeros, I mean most of them are inactive. So here is what we typically see in an SDR.
You need at least several thousand bits to do something, and most of them are zeros — meaning most of the cells are inactive; there are very few ones and mostly zeros. In the brain, what we always see is that very few cells are active and most of the cells are relatively inactive. We'll use this example of a 2,000-bit SDR, and we'll say 2% of the bits are active. We're always going to have 2% of the bits active — that's static; that's what an SDR is. It's always sparse.
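The numbers in this example are concrete enough to sketch in code. Here is a minimal illustration — my own sketch, not NuPIC code — of a 2,000-bit SDR with exactly 2% of its bits active, represented simply as the set of indices of its one-bits (the per-bit semantics described below are learned in a real system, not random as here):

```python
import random

N, W = 2000, 40  # 2,000 bits total, 40 of them active => 2% sparsity

def make_sdr(rng):
    """An SDR as the set of indices of its one-bits (the other 1,960 are zero)."""
    return set(rng.sample(range(N), W))

rng = random.Random(42)
sdr = make_sdr(rng)
assert len(sdr) == W                   # always exactly 40 one-bits
print(f"sparsity = {W / N:.1%}")       # prints: sparsity = 2.0%
```

Storing only the active indices is also how an SDR is held compactly in practice: 40 small integers instead of 2,000 bits.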
It always has a certain number of bits active. So I might have 40 one-bits and 1,960 zero-bits. Now, the difference here is that the bits mean something — they have semantic meaning; you can actually say what each bit means. In the brain this is learned, not assigned, but to make it concrete, imagine what a bit might mean if I were to
E
craft a representation for letters — which I wouldn't actually do, but if I did, I could say one bit means it's a vowel, another that it's a consonant, another that it sounds like an "e" or "i" or "o" sound, another that it has a soft or fricative sound. I could have bits describing how the letter is written: is it closed or open, does it have ascenders or descenders. I could have bits saying where it is in the alphabet,
what's next to it, and so on. And what I would do, if I wanted to represent a letter, is pick the top 40 attributes that match that letter; and if I pick another letter, I pick the top 40 attributes that match that one. That's how the brain forms representations. Now, this has some really great properties, so let's go through a few of them — and I'll tell you the key fact now.
If you want to remember only one thing from my talk about the future of intelligent machines, remember that they're going to be built on sparse distributed representations. That's the key. OK, so the first property: if I take two SDRs — two representations — and they have a bit in common, meaning they share the same bit being one, I can say they have semantic similarity; they're sharing some semantic meaning. This will not happen by chance.
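This overlap property can be sketched in a few lines (again an illustrative sketch, not the actual implementation). The overlap of two SDRs is the count of one-bits they share; for two unrelated random SDRs of this size the expected overlap is only W·W/N = 40·40/2000 = 0.8 bits, which is why a sizeable overlap essentially never happens by chance:

```python
import random

N, W = 2000, 40
rng = random.Random(0)

a = set(rng.sample(range(N), W))   # two unrelated random SDRs
b = set(rng.sample(range(N), W))

# Overlap = number of shared one-bits. When the bits carry meaning, shared
# bits imply shared semantics; for random SDRs the expected overlap is only
# ~0.8 bits, so any large overlap is a strong signal, not coincidence.
overlap = len(a & b)
print("overlap of two random SDRs:", overlap)
assert overlap < 10   # a large chance overlap is astronomically unlikely
```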
So if I see a couple of shared bits like this, I can say these two are similar, and here's how they're similar, because those bits represent something semantic. Next, suppose I want to remember a pattern and then ask: did this pattern occur again later? A store-and-compare operation: store this pattern and see if it occurs again. We're not going to save all 2,000 bits — that's what we would do in a computer.
What we're going to do is just save the locations of the one-bits. So I have a list of 40 indices — OK, I have 40 ones, let's remember where they are — and now, if I see a new pattern coming in, I just look at those locations, and if there are ones there, I know I have the same pattern. That's practically guaranteed. By the way, in a brain these stored connections are the synapses between cells.
Now, what if I couldn't store the locations of all the one-bits — what if I could only store the locations of a few of them, a subsample? Let's say I can only store 10 of them, picked at random. Well, we can do the same operation: a new pattern comes in, I look at those 10 locations, and if there are 10 ones there, I'll say it's the same pattern. But you might say: wait, that could be an error — what about the other 30? They could be different. And that's true, but the chance of such a false match turns out to be extremely small.
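This subsampled store-and-compare can be sketched as follows (an illustrative sketch; the names `fingerprint` and `matches` are mine, not from any library):

```python
import random

N, W, S = 2000, 40, 10                 # store only S of the W one-bit locations
rng = random.Random(1)

stored = set(rng.sample(range(N), W))              # the pattern we saw
fingerprint = set(rng.sample(sorted(stored), S))   # remember 10 random locations

def matches(candidate):
    """Declare a match if every remembered location holds a one-bit."""
    return fingerprint <= candidate

assert matches(stored)                 # the original pattern always matches
other = set(rng.sample(range(N), W))   # an unrelated random SDR
print("unrelated pattern matches:", matches(other))   # almost surely False
```

The point of the sketch: matching ten remembered locations against a random 2%-sparse pattern requires all ten to land on that pattern's 40 one-bits, which is vanishingly improbable.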
Also, even if you do make a mistake, you're making a mistake with something that's semantically similar to the thing you stored — it has a lot of semantic structure in common, and is therefore a good substitute. And this is the basis of generalization in the brain: we don't need to recognize things completely; we can subsample, we can look at a partial pattern and say this is semantically close. And finally, there's another property that we use a lot in our algorithms, and this is the most complicated one.
I can form a union of these things — I can OR them together. So in this case, I may take ten of these SDRs, each with two percent of its bits active, and if I OR them together — literally just do that — I'll end up with a new representation, two thousand bits, that has about twenty percent of its bits active. Now, I can't undo this: you can't ask it what the original ten were; that's not possible. But you can do something almost as good. You can say: here's a new one —
is it one of the original ten? And I'm going to claim that if you just look and ask whether the union has one-bits in the same locations as the one I'm checking for, then I can say it's a fit — a good match. Now again, you could point out that this could be an error, because I could be matching some bits from one of the original ten and some bits from another of the ten. But again, by the same logic, this is very unlikely to occur.
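The union property can also be sketched directly (illustrative only — in the brain the "union" is carried by synapses, not Python sets):

```python
import random

N, W = 2000, 40
rng = random.Random(2)

# OR ten 2%-sparse SDRs together into a single union representation.
members = [frozenset(rng.sample(range(N), W)) for _ in range(10)]
union = set().union(*members)
print(f"union has {len(union)} one-bits (~{len(union) / N:.0%} of {N})")

def in_union(sdr):
    """Membership test: every one-bit of the candidate appears in the union."""
    return set(sdr) <= union

assert all(in_union(m) for m in members)   # every stored member still matches
novel = set(rng.sample(range(N), W))       # an unrelated random SDR
print("novel SDR matches:", in_union(novel))  # a false positive is astronomically unlikely
```

For a novel random SDR to falsely match, all 40 of its one-bits would have to land inside the ~20%-dense union, with probability roughly 0.2^40 — which is why the "wrong member" error the talk mentions can be safely ignored in practice.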
Extremely unlikely — almost astronomically unlikely. It could happen, but if you do make a mistake — if things don't quite line up — you're making a mistake with something semantically similar to the things you stored. So this is the kind of logic that the brain uses, the kind of logic that intelligent machines will use, and the kind of logic that we're using in our algorithms. OK, now just a couple of words about neurons, and I promise this is the last picture of a biological thing.
This is a picture of a real neuron in a brain. About 80% of the neurons in your brain and your cortex look like this; it's called a pyramidal cell. Now, these cells have lots and lots of synapses on them — these are the connections to other cells — and typically there are 10,000 synapses on a neuron. However, only a few hundred of them are close to the cell body, and those define what you might consider the classic receptive field of the cell. When a neuroscientist asks why a cell responds,
they look at those few hundred and say: that defines it — and all the other 9,800 don't seem to be doing much. But the vast majority of the synapses are on those distant connections, the distal dendrites — the dendrite trees that come out from the neuron — and what we now know, which we didn't know 15-20 years ago,
is that these dendrites are active computing elements. They do something interesting: if you have, say, 10 to 15 to 20 connections on a dendrite, and these connections are close to each other — very close, right next to each other on the branch — and they become active at the same time, then the dendrite generates what is called a dendritic spike, which has a large effect on the cell body. If those synapses become active at different times, or in different locations, nothing occurs.
What they do is recognize a pattern and then put the cell into what's called a depolarized state, or what we call a predicted state. So the cell can say: look, I'm going to be able to predict my own activity if I see one of these many patterns out there. This is a picture of the artificial neuron that we use in our simulations, and it captures this.
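The dendrite-as-coincidence-detector idea can be sketched like this. It is a toy illustration, not the CLA implementation, and the threshold of 15 co-active synapses is an assumed, illustrative number:

```python
# A distal dendritic segment "spikes" when enough of its synapses see
# active inputs at once; any spiking segment puts the cell into the
# predicted (depolarized) state described in the talk.
THRESHOLD = 15

def segment_active(segment_synapses, active_cells):
    """A segment spikes if >= THRESHOLD of its synapses connect to active cells."""
    return len(segment_synapses & active_cells) >= THRESHOLD

def is_predicted(cell_segments, active_cells):
    """The cell is depolarized if ANY of its distal segments spikes."""
    return any(segment_active(seg, active_cells) for seg in cell_segments)

# One cell with two learned patterns (two segments of 20 synapses each).
seg_a = set(range(0, 20))
seg_b = set(range(100, 120))
cell = [seg_a, seg_b]

assert is_predicted(cell, set(range(0, 16)))      # 16 >= 15 hits on seg_a: spike
assert not is_predicted(cell, set(range(0, 10)))  # only 10 co-active: no spike
```

Each segment is one stored coincidence; a cell with many segments can predict its own activity from many different contexts, which is exactly the "one of these many patterns" behavior described above.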
The
green
dots
are
the
proximal
or
the
synapses
that
are
that
are
near
the
cell
body
and
then
the
blue
dots
and
this
one
are
those
ones
on
the
far
distance
and
those
are
the
ones
like
coincidence,
detector.
So
that's
what
we're
showing
here,
and
so,
when
we
build
these
models,
this
is
inherent
in
how
our
models
work.
Okay,
now
we're
gonna
tell
you
how
the
basics,
how
you
learn
transitions?
I'm
not
going
to
go
through
all
of
it.
E
Imagine I have a bunch of cells in an array like this. Each of those little cubes is one of those cells, and they're all receiving some input — they get different amounts of input from the input space. Imagine the input coming from your eyes: it's this big array of bits, it's not like one thing, and each cell is getting a different amount of input — we won't talk about exactly how — and that's what the color represents.
E
You'll end up with just a few of the cells being active and most of the cells being inactive. There's a picture of a small section of one of my simulations, just showing you a few cells and what a sparse representation might look like in a brain — or in a simulation, if you want to put a visualization on it. This is a sparse cell activation, or sparse representation.
E
Now, this is at one point in time: you have this pattern, but at another time you'll have a different pattern. Let me go back one — forward — back one. Okay. This is what the brain has to learn sequences of: when it's learning sequences, it's learning sequences of these sparse patterns, and the way it does that is pretty cool.
E
As input comes into the system, what will happen is some of the cells become active — those are the red cells in this picture — and some of them will be predicted — those are the yellow cells in this picture; they're depolarized. There are more yellows than reds in this case, because typically the system will have learned many transitions.
E
So if I learned A followed by B, and A followed by C, and A followed by D, and I show it A, it's going to predict B, C and D — the union of those patterns at the same time. That's what's typically going on. Now, there's a problem with this: this is not a high-order memory. It can only learn one-step state transitions, so this is a first-order sequence memory.
E
It could not distinguish between A-B-C-D and X-B-C-Y; it doesn't have the ability to do that. The way the brain solves this — and the way our algorithm, the CLA, solves it — is by going back to those mini-columns I mentioned earlier, and I have one slide on that, which is a little bit of a build. So here's what we want to understand now: how can cells learn higher-order sequences, and how are we going to use mini-columns for that? We're trying to solve the A-B-C-D prediction versus the X-B-C-Y prediction.
E
You can see there are six cells in each column, and maybe a dozen columns in each of these little pictures. The sparse activity due to the letter A — that input A — is represented by three columns, then B is represented by three columns, C is another three columns, and so on. These are sparse representations of the patterns A, B, C, D. Of course we would be doing this with a much larger number of cells, but this is to illustrate. Now, what if I showed it the next pattern, X-B-C-Y?
E
Well, you see, X is different than A, but the B is the same as the B, and the C is the same as that C, and the Y is different than the D. So this isn't going to work. What happens after training is interesting. We start off with the same A, but when you go to the next pattern — this is after training on the sequence — you end up with a new pattern: B-prime. And what does B-prime have?
E
It has the same columns active — you can see they're the same three columns — but now only an individual cell in each column is activating. It's much sparser: instead of having 18 cells active, I have three in this picture. The same thing would happen with C-prime, and with D-prime. So after training, after learning the sequence, I'd go A, then B-prime, then C-prime, then D-prime. The columns are the same as before, but now we have different cellular representations.
E
If I did the same thing for the X-B-C-Y sequence — I'm now labeling it B-double-prime and C-double-prime — you'll see that B-prime, B-double-prime and B all have the same columns, but they have different cells. This allows the system to represent B or C in many different contexts. It allows us to say: this is a C, but it's uniquely the C in this sequence — and it allows us to predict D instead of Y.
E
There would be 10 to the 40th ways to represent the same pattern in different contexts — I could learn 10 to the 40th different ways of representing B in different sequences — and that's a very, very large number, of course. So the brain has this ability to learn an almost unlimited number of contextual ways of representing the same thing in different contexts.
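A number like "10 to the 40th" is just combinatorics. As a hedged illustration (the parameter values below are assumed for the arithmetic, not quoted from the talk): if an input is coded by M active mini-columns and each column holds N cells, then choosing one cell per column gives N to the power M distinct context-specific codes for the same input.

```python
# Illustrative arithmetic behind a "10^40 contexts" style claim
# (assumed parameters: 32 cells per column, 27 active columns).

def num_contexts(cells_per_column, active_columns):
    """Distinct one-cell-per-column codes for one column-level pattern."""
    return cells_per_column ** active_columns

codes = num_contexts(32, 27)
assert codes > 10 ** 40   # comfortably more than 10^40 representations
```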
All right — that's all about the CLA, and I know it's a lot to absorb; you can't absorb this in one session, and I hope I didn't
E
lose you too much on that, but I wanted to give you a flavor for it. In the end you have to take my word for it: the thing really works, and it works well. It converts an input into a sparse distributed representation in columns, it learns transitions — these high-order transitions — and it's able to make predictions and detect anomalies. The brain uses it, and we can use it in machine intelligence for inference — higher-order inference, sensory inference — and motor behavior. That's how, I'd tell you, your brain
E
does it. And it has some really nice capabilities. It's an online learning system, meaning you don't batch up the data — you just learn as it goes; you throw in your data. I didn't tell you how this happens, but you can read about it. It's very high capacity: even a very small region of the CLA can learn millions of transitions. It works with simple learning rules. It's naturally fault tolerant, so no cell, no neuron, no synapse, no column is essential. And there are no sensitive parameters.
E
It's not like this thing has to be really tweaked to get it to work. For performance you can make it work differently in different ways, but nothing is really sensitive. So these are great attributes, and these are the kinds of things we'd like to see in a machine intelligence system. Okay — I argue that this is the basic building block of machine intelligence. It's the basic building block in your cortex, and there are people out there now trying to figure out how to build this stuff in hardware.
E
Okay, let me make sure you all still know I'm here. Let me just talk briefly now — I'm going to switch to applications. Does this stuff really work? What do you do with it, and what can you do? In this case I'm going to talk about how we used it in our company for anomaly detection, and then I'll talk about another application in natural language processing.
E
We were trying to find a commercial application for the CLA, and so we said: let's do anomaly detection in streaming metric data. We can take a server, and we can take a data point off that server every minute, or every five minutes, bringing it in one point at a time. We run it through an encoder, which turns it into a sparse distributed representation. We then feed that to the CLA. The CLA builds a model of how this metric — this value — changes over time, and from that we get a prediction.
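The encoder step in that pipeline can be illustrated with a toy scalar encoder. This is a hedged sketch, not NuPIC's encoder implementation — all widths and ranges below are assumed: a metric value is mapped to a fixed-size SDR by turning on a small contiguous window of bits whose position tracks the value, so nearby values share bits.

```python
# Toy scalar encoder in the spirit of the pipeline described above
# (assumed parameters; not Numenta's implementation).

def encode_scalar(value, vmin=0.0, vmax=100.0, n_bits=400, n_active=21):
    """Encode value in [vmin, vmax] as a set of n_active contiguous ON bits."""
    value = min(max(value, vmin), vmax)          # clamp out-of-range values
    span = n_bits - n_active                     # highest possible start bit
    start = int(round((value - vmin) / (vmax - vmin) * span))
    return set(range(start, start + n_active))

# Nearby values share many bits (semantic overlap); distant values share few.
a, b, c = encode_scalar(50.0), encode_scalar(51.0), encode_scalar(90.0)
assert len(a) == 21
assert len(a & b) > len(a & c)
```

That overlap property is what makes the SDR meaningful to the CLA: similar metric readings produce similar representations.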
E
We can detect that there's a prediction error, and we do some interesting statistical processing on the other side, which you can read about in a white paper. What you end up with is an anomaly score. You can end up saying, about the state of the system: how unlikely is this, given what I've seen in the recent past — given what I've learned about how this thing changes over time over, say, the last three weeks or a month? How unlikely are these patterns I'm seeing now? These are temporal patterns.
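The raw score feeding that statistical post-processing can be sketched simply. This is a simplified illustration (the white-paper statistics on top are omitted): score each time step by the fraction of the input's active columns that the model had not predicted from the previous step.

```python
# Sketch of a raw anomaly score (simplified; the product described above
# adds statistical post-processing on top of something like this).

def anomaly_score(active_columns, predicted_columns):
    """0.0 = everything was predicted, 1.0 = nothing was predicted."""
    if not active_columns:
        return 0.0
    unpredicted = active_columns - predicted_columns
    return len(unpredicted) / len(active_columns)

assert anomaly_score({1, 2, 3, 4}, {1, 2, 3, 4}) == 0.0   # fully expected
assert anomaly_score({1, 2, 3, 4}, set()) == 1.0          # total surprise
assert anomaly_score({1, 2, 3, 4}, {1, 2}) == 0.5         # partly expected
```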
E
It's like listening to someone play music and saying: hey, have they gotten better or worse? Are they making more mistakes? Did they change their style? Something like that. We can do this for lots of metrics off the same machine, and we can put this in a product — which we did. So I'm going to talk about our product. I'm not trying to sell you on it; I'm just going to illustrate how it worked. The product is called Grok. It's about a month old — it's been on the market for about that long.
E
It's designed for the Amazon marketplace, for monitoring things that run on AWS — this is their cloud service that much of the internet runs on — and we run on top of AWS. The point of this is that we get the data from a device through something called CloudWatch, and we get data from the servers themselves. We feed this into Grok; Grok builds models of this data, and it does this in an automated way, because the whole thing is learning online continuously. Then it sends the results to a mobile client. This is not brain-related,
E
but just to give you a sense of how this whole thing works: what we do is show you all the different instances, all the different servers you're monitoring. We show you how anomalous they have been over the last week, the last day, the last hour. It's all sorted, so very quickly you can look at something and say: hey, is everything working well? What's unusual? And then you can drill down to see what's going on. I need to tell you a little bit about the interface for this,
E
just so you know why I'm showing you these pictures — they're interesting. We start off, on the left, with a sorted list of how anomalous your servers are. Then, on any particular server, you can drill down and see which metrics are involved and how anomalous they are. The rows in the middle picture are three different models, and each one is a cortical
model.
E
Every one of these things is running — we're running hundreds of cortical models on this data. Then in the final picture, someone can actually look in to see what the actual metric data is, and see why Grok determined it was unusual or not. So we sort this by anomaly score, and it's continuously learning and continuously updating. Now here come some of the interesting things. I want to give you a sense of the kinds of sensitivities in the CLA — what does it see? What does it bring to this?
When something was running along at one level and then jumped to another one — that's pretty obvious. Here's one that's a little bit less obvious: it's a slow change. These things were creeping up slowly over a matter of days, and eventually Grok says: that's enough, that's a change. And here's a very subtle change — the two pictures on the right. Again, we're talking about how the cortical model is modeling this data: we're feeding the data into the cortical model and it's trying to make predictions about it.
E
In this case the data is very predictable — it's a very regular pattern — and Grok did not flag those two spikes in the third picture on the right, because it had seen them before. But if we zoom in on the big block of blue there — going from looking at a week's worth of data to looking at a day's worth — you'll see that Grok detected a very subtle change in a very repetitive pattern.
E
Normally, every hour there's a little tick up in the behavior here, and on one day, in one hour, it changed slightly. So this is a very regular data set; Grok built a very, very highly predictive model, and it could detect when something was very subtly different. Okay, here's a case where we could detect changes in a noisy data stream.
E
It's not obvious why Grok flagged that area — what was it about it? Why was the data less predictable at that point in time? It's not clear a human would have been able to pick that out, but Grok and the CLA said: there's something unusual about this. I know the patterns of this data; I couldn't predict this; it's less predictable than it was before. And it turns out, in this particular case, these two models both detected a very sudden, subtle change.
E
One of our engineers had gone onto a server that is normally an automated build server — if you know what that means: basically, it builds our product over and over again on an automated basis — and he had started a manual build process. So it detected when a human did something slightly different, because the human was doing what is normally done automatically. This shows the power of our approach: we're using the CLA, these cortical models, to do very sophisticated anomaly detection in streaming data.
E
You might not think of that as a machine intelligence application, but that's the kind of thing we can do today, and it's very powerful and very valuable. Okay, let's switch now and talk about a different application. This one is from a company called Cept, in Austria. A guy there named Francisco Weber read our papers about the CLA and its sparse distributed representations, and he said: hey, this solves a problem. He was a natural language expert.
E
He said: I've been working on these problems of representation in natural languages, and my gosh, these sparse distributed representations are the key to solving natural language problems. So he did something very clever; let's take a look at that. First of all, he built a tool that builds sparse distributed representations of words — and proper nouns and so on. He started with a hundred thousand; I think it's more now, I don't remember, but he did.
E
Why do I call this a sparse distributed representation? In this case there are 16,000 bits. They're sparse, meaning only a few of the bits are ones and most are zeros, but they have the properties we talked about earlier, where bits have semantic meaning. He doesn't assign that meaning; it's learned. When two representations share bits, they're sharing some sort of semantic meaning, and you can try to tease out what that semantic meaning is — it's not so easy, just like in the brain — but they are true sparse distributed representations.
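Word SDRs like these are naturally represented as sets of ON-bit indices, with overlap as the similarity measure. The bit positions below are invented purely for illustration — this is not Cept's data:

```python
# Toy illustration of word SDRs: each word is a small set of ON bits out of
# a large bit space, and shared bits carry shared semantic meaning.
# (Made-up bit positions, not real Cept/cortical.io representations.)

N_BITS = 16_000  # the talk mentions roughly 16,000 bits per word SDR

dog = {12, 407, 2210, 5120, 9001, 15890}
cat = {12, 407, 2210, 7777, 9001, 14002}   # overlaps "dog" on 4 bits
car = {33, 901, 4001, 8080, 11111, 15000}  # no overlap with either

def overlap(a, b):
    """Shared ON bits = shared semantics."""
    return len(a & b)

assert overlap(dog, cat) > overlap(dog, car)  # dog is more cat-like than car-like
assert all(bit < N_BITS for bit in dog | cat | car)
```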
E
So now he did some very clever things with them. You can do something like the following. You take the SDR for the word "apple" and the sparse distributed representation for the word "fruit". They're going to share some bits, and they're going to have some bits that are different — there are some bits, some semantic meaning, common between those two.
E
If you subtract out the bits in "apple" that are in "fruit" — meaning we remove the fruit-ness from "apple" — you get another sparse distributed representation. It's a subset of what the "apple" representation was, and you can look that up. It's a new representation we've never seen before — a novel one — but it's going to be close to, and overlap with, other ones. So we can ask: what does it overlap most with? And the answer you get is "computer". You take apple minus fruit, and the best match you get is computer.
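That "apple minus fruit" operation is plain set arithmetic over SDRs. Here is a hand-made toy vocabulary illustrating the mechanic — the real system derives its bits from text, and every bit position here is invented:

```python
# Sketch of "apple - fruit ≈ computer" with an invented toy vocabulary.
fruitness = {1, 2, 3}           # bits shared by fruity words (assumed)
computerness = {10, 11, 12}     # bits shared by computer-y words (assumed)

vocab = {
    "apple":    fruitness | computerness | {20},  # "apple" has both senses
    "fruit":    fruitness | {21},
    "computer": computerness | {22},
}

def best_match(query, vocab, exclude=()):
    """Return the word whose SDR overlaps the query SDR the most."""
    return max((w for w in vocab if w not in exclude),
               key=lambda w: len(vocab[w] & query))

residue = vocab["apple"] - vocab["fruit"]   # strip the fruit-ness out of apple
assert best_match(residue, vocab, exclude=("apple",)) == "computer"
```

Removing the shared fruit bits leaves mostly the computer-sense bits, so the nearest stored SDR is "computer" — the same shape of result the talk describes.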
E
The next best matches are shown here: Macintosh, Microsoft, Mac and so on. There's semantic processing being done with SDRs — that's very, very cool, and there's a lot of stuff you can do with this. At our last hackathon in the fall, our VP of engineering, Subutai Ahmad, took this and used it with the CLA. Here's what he did. He said: okay, let's train the CLA on sequences of words — like sentences. Very simple: we did three-word sentences, and the sentences he picked had this structure to them.
E
They began with an animal; the second word was either "eats" or "likes"; and the last word was what the animal eats or likes. So: "elephant eats leaves", "elephant likes water". He made up just, you know, 50 or 70 sentences like this, fed them into the CLA, and trained it on these sequences. And then he said: okay, let's ask the CLA a question. He fed in the pattern for "fox" and "eats". Now, the CLA had never been trained on the word "fox" — never. It had been trained on other animals, but not the word
E
"fox". This was the first time it saw the word "fox" — but "fox" obviously is going to have semantic similarity to other animals in the list. So we can ask: okay, what does this say? Let's predict what the fox eats. It knows what an elephant eats, and what a cat eats, and what a wolf eats, and what a frog eats — so what would a fox eat? You get a prediction from the CLA, and you can look up what the prediction is closest to, and the answer he got was "rodent" — which is really cool.
E
I don't want to oversell this. We haven't built a natural language machine that's very capable yet, but we're doing it the way the brain does it — exactly the way the brain does it. We're using the same type of representations, we're using the same type of memory systems, we're using the same type of predictions, and I think this was a really, really beautiful demonstration of both SDRs and the CLA — and there's a lot of application for this.
E
So, to review: this whole thing was done without supervision, and it exhibits semantic generalization both at the word level and at the grammatical level, and we think there are going to be a lot of interesting applications — commercial applications — of this. Now, let me just tell you that in these two cases — Grok, which I talked about using for streaming analytics, and Cept, using it for natural language processing — that's the exact same CLA code base. In fact, it's the exact same code.
E
We didn't modify the code to get it to work in these two different examples. The code wasn't written to do either one of these; the code was written to emulate the generic — the universal — process of how brains learn higher-order sequences, and we were able to apply it to different problems with really very little effort. Well, I don't want to make it sound like it's from the lottery. Okay, so now, getting close to the end here, I'm going to switch to our NuPIC open source project — I just wanted to tell you about it for a few minutes.
E
The guy on our end, who you can't see — bring me back up here, sure — is Matt Taylor. He's the guy who runs our NuPIC open source project. We started this last summer, and it seems to be going very well. In there we have the source code for the cortical learning algorithm, for the encoders, and support libraries. You should know that this is a single source tree, meaning the code that we use in Grok is in this repository. So when we make an update to the Grok algorithms, it's there right away.
E
We have a hackathon coming up on the third and fourth; that's going to be here in San Jose, California. And if we have enough interest — and we can somehow manage to do it — maybe we'll do one with the guys in New York; we'll see. You can find that at numenta.org. Now let me look to Matt — if you want to add a few more comments about NuPIC at this time?
B
There we go — hopefully you can hear me. This is the numenta.org network, and we've got a community of 92 contributors now; it's growing very quickly. We've got over 700 people on our mailing lists. So if you're interested in the theory that Jeff is talking about, we have a mailing list for theory. We've got a discussion list if you want to try and get NuPIC working for yourself, and then a list for those of us who are trying to develop NuPIC itself.
B
We've also got a wiki with a nice path if you want to learn more about how to use NuPIC, how to get things started, and more stuff about the theory. Jeff mentioned a video about sensory-motor integration, so there's a link there. Please feel free to go to numenta.org — there's a link to our wiki from there.
E
I hope you could all hear that. Matt's doing a great job of running the NuPIC project. Let me just go to the next slide here, which is our goals for 2014. We have some research goals: we want to finish implementing the layer 4 sensory-motor inference, and then we can get back to introducing hierarchy. We have a goal to publish in peer-reviewed papers — we haven't done enough of that, so we have to do that. For NuPIC, we're going to grow the community.
E
We have some commercial partners, like Cept, which I mentioned; we're also working with IBM and some others, and so we have to support them. There's a lot of cool stuff going on in NuPIC — a lot of projects; talk to Matt about that. And then we're trying to show the commercial value of the CLA with our Grok product, which will help attract developers and help attract commercial dollars and so on — and we're also looking at new applications.
E
Why do I say it's like the 1940s? Because as we entered the 1940s, people had the theory of computers — Turing had written his seminal papers in 1935 — but we really hadn't built any commercial computers yet, and by the time we left, in 1948 to 1950, the computing industry was actually going. I feel that's a lot like where we are right now, where we're getting
E
these theories: we're understanding how the brain works, we're starting to build machines that work on those theories, and we're starting to show commercial value. And, you know, it's hard — I won't beat around the bush about this. There are a lot of new concepts here, a lot of challenging things to understand, and it'll take a while to really deeply understand how the CLA works. You can get there — trust me, it's beautiful when you get the whole thing; it's not that hard — but
E
it's not so simple either. We're on the forefront of what's going to be many decades of advances in machine intelligence, and these are really the formative years. So this is the summary of my talk, of what I've covered so far. I argued that the neocortex is as close to a universal learning machine as we can imagine, and therefore machine intelligence will be built on the principles of the neocortex — and not some other principles.
E
We need to understand these principles. And again, it's not about building human-like machines — machines that have bodies and are going to talk — it's about building machines that learn on these principles, that can build sensory-motor models of the world and so on. Next: we have an overall, overarching theory about what's going on — HTM, the hierarchical temporal memory theory — and we know in detail one particular building block.
E
That's the cortical learning algorithm. We've been exploring and testing it extensively for years, and we've chosen two near-term applications: anomaly detection — and prediction, which I didn't talk about — and natural language processing. It's very hard to know where this is going to go in the long term, but that's what we're doing right now, and I've invited you to participate. You can go to numenta.org, and there are a bunch of papers and lots more talks by me and other people.
G
Hello, my name's Jack Kelly. I'm a computer science PhD student at Imperial, but before that I did an undergraduate degree in neuroscience. So what you're talking about here is extremely exciting — you know, a lot of machine learning is quite dry, and this is really cool. I just wanted to ask: can the CLA do classification? In terms of a lot of conventional machine learning, you show it an image and it says "that's a cat", yeah?
E
So it's all the knowledge applied to the current input. The pattern you have coming out — the activation pattern, which cells are active — is a fairly unique state at any point in time, and you can classify it. Now, in the brain that's classified by just feeding it associatively to a bunch of other cells, but you can literally just feed it into a classic classifier. We've done this extensively.
E
You can take any favorite classifier you have — nearest neighbor, or whatever you want — and we've done a bunch of these; you can classify that state, and it works really well. Now, that's assuming you've had the right input and you've trained the system properly. So the general answer is: yes, you can do classification. We've also done clustering with it — I don't know if we've described that anywhere, but we could talk to you about that if you wanted to.
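The "feed the state into any classifier you like" point can be made concrete with a trivial nearest-neighbor classifier over active-cell sets. The labeled states below are invented toy data, not the output of a real CLA:

```python
# Sketch of classifying a CLA-like state: treat the set of active cells as
# the feature vector and pick the stored label with the greatest overlap.
# (Toy hand-made states; any off-the-shelf classifier would also do.)

def classify(state, labeled_states):
    """Return the label whose stored state overlaps the query the most."""
    return max(labeled_states,
               key=lambda lbl: len(labeled_states[lbl] & state))

labeled = {
    "cat": {3, 14, 15, 92, 65},
    "dog": {2, 8, 71, 81, 82},
}
noisy_cat = {3, 14, 15, 92, 100}   # mostly cat-like activation, one stray bit
assert classify(noisy_cat, labeled) == "cat"
```

Because the representation is sparse and distributed, overlap-based matching degrades gracefully with noise, which is part of why this works well in practice.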
E
Now, the question was also: can it recognize the image of a cat? Well, we haven't done anything with vision. When we first created the CLA, we started working on vision, and we abandoned it for two reasons. One is that in the brain, a huge amount of your neocortex is assigned to vision — something like 40% of the neocortex is only vision, and about 60% is primarily vision. It turned out, when you understand
E
what's going on, that it takes a lot of memory to get human-level visual performance, and we were finding, even on very simple problems, that our simulations were slow. So the long story is that we just wouldn't be able to do those simulations the way the brain does it. We also made some mistakes back then — this was four or five years ago. We think we could do a better job now, but we haven't.
E
We haven't attacked that again yet; we're still kind of fearful of the amount of memory and the resources that would be required, and we would go about it in a very, very different way than other people would. Now, that's probably where some of the advances have been made recently in deep learning. Deep learning is just a hierarchical artificial neural network, so they've got some of the same principles that we talk about in HTM, and I
E
think these two fields may move together. But the deep learning people jumped right in in a way where they use no time whatsoever — there's no time element to it. That's not how humans learn; we learn through time. We know how to do that, we think, but we're not really there yet computationally.
E
And don't be surprised by this. I mentioned that in the human neocortex, about 60% is working primarily on visual problems — 40-some percent almost exclusively — while the areas associated with language are tiny compared to that; they're very small. The evidence suggests that language is a much easier problem than vision; it certainly takes a lot less resources. Now, we could debate the intricacies of this, but, you know, once you get to it,
E
we can do a lot more interesting stuff in language processing than we can in vision, given the software constraints we have today. So I think we know how to do things like recognition — doing it the way the brain would do it, a little bit different from the way the deep learning guys are doing it — but we're not really there yet from a simulation point of view. We can do classification on lots of other problems, though, and that works really well. I can't see your eyes, so I have no idea if I answered your question.
D
I'm Jon Drummond; I work for a spread betting company, in research. I was just wondering — some of this you've already answered — having seen the success Hinton and others have had with deep learning, whether you could expand a bit more on the similarities and differences in the way you're working. Yeah.
E
My approach — our approach — is starting from the biology, starting from the neuroscience. Most of the deep learning people are not taking that approach; they're starting from more mathematical premises. But, as I mentioned earlier, we both believe in hierarchy — it's all about hierarchy — and we're not there yet, because we decided to model the individual layers and understand how those processes work before we model the hierarchy. But most deep learning
E
people will admit that the big thing they're missing is time. They have no concept of time in most deep learning and convolutional networks, so they can't possibly model or understand what a saccade does, or how things move through time. There's some talk about that — there are some primitive attempts — but really nothing inherently going on in the time space. So they know they have to move in the time direction, and I know we have to move more in the hierarchy direction.
E
I also know we have to introduce motor behavior, which they have no concept of, and we're working on that. So I see these two fields as ones that should be merging together. They're not really contrary approaches; they're actually working on different aspects of the same problem. They're focusing more on non-time hierarchy; I'm focusing more on time and motor behavior, while knowing we have to reintroduce the hierarchy. So as
E
long as we don't get stuck in some local minimum — which happened in the past with AI and with early artificial neural networks — we have to keep going. We have to reintroduce time into these hierarchical models, we have to introduce behavior into these hierarchical models, and we have to introduce attention. And when we do that, then we'll really achieve something.
J
Hello there, my name is Antonio. It's the second time I have the honor to speak with you; the first time was about three years ago, when we had a meeting right there in your office — oh yeah — so even back then I was convinced that you were going in the right direction toward, you know, biological intelligence. So I asked you:
J
how could I be helpful? And you said I should become a programmer, because you were trying to build the product — the Grok project. I thought that wasn't really exciting for me, so instead I tried to explore the theoretical foundation of your work, and I would like to summarize your theory in about ten words. I would put it this way: we live in space-time, so we use time to understand space. I think this is the core of the universality we are looking for in a learning system.
E
Right — and I didn't talk about spatial patterns much here, but if you actually read the white paper on the CLA, we talk about the spatial pooler and the temporal pooler, the sequence memory and so on. But I think you're right: we're looking for these universal properties of space and time.
E
What I argued back then, and I still believe is true, is that time has been the single most largely ignored component of AI and artificial neural networks. And the reason people ignored it is that they focused on spatial vision problems. They said: well, look, we can recognize a pattern, a picture, flashed in front of your eyes — there's no time involved — so let's just take time out of the picture. That was a big mistake, because time turns out to be the most important part of the whole memory.
E
I argue that about 90% of the memory in the cortex is time-based transition memory, and maybe 10% is spatial memory. So time is a critical component, but pairing the two together is really the powerful system — and they are universal. I didn't speculate about the future here, but there are a bazillion types of problems that we don't even think about today that could be dealt with using these universal principles.
J
One last sentence. However powerful this universality might sound, I think it still has a fundamental limitation. I guess it is a bit far-fetched, but the theory you described, and recorded as well, relies on the separation of space and time — and if we think about the theory of relativity, space and time are ultimately connected, in a sort of way. So perhaps one day we will discover an even more general theory of intelligence. Maybe.
E
Start with the mouse! You know, I didn't tell you this, but the CLA we implement today — which is 2,048 columns and 64,000 neurons; that's pretty much every model we build these days — is about one one-thousandth the size of a mouse cortex, and about one one-millionth the size of a human cortex. One one-thousandth of a mouse cortex doesn't sound very big, but that little slice is actually really powerful. So we've got a long way to go — the point I'm making is that we can do a lot with a little.
H
I'm Greg, I'm just a programmer with an interest in neural networks. I just kind of wanted to know how the system compares on the common sort of pitfalls of general machine learning — things such as overfitting, such as local minima. And one of my questions is also: does the CLA rely on the vast representation space to avoid overfitting?
E
There's a lot of concepts in that question — it's a great question — so let's try to tease it apart a little bit. Let me just tell you a little more about how the CLA learns. First of all, it has very large capacity: we can learn millions of transitions in a small section of it. But how does it fail, and what happens when you continue to overtrain it? Well, the way we implement it is what we call fixed resources.
E
We don't increase the number of synapses or the number of cells with the amount of training — so, like a brain, it's relatively fixed. And when you train it more and more and more, two things can happen. You can set it up so that it forgets, which it does anyway, but you can change the ratio of learning to forgetting. That's one of the parameters of the system: how much do I want to bias toward previous learning?
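The "fixed resources" idea, with a tunable learning-to-forgetting ratio, can be sketched roughly as follows. This is a simplified illustration with made-up parameter values and a single segment of synapses — not Numenta's actual implementation:

```python
# Fixed-resource learning sketch: a fixed pool of synapse permanences is
# reinforced for active inputs and decayed for inactive ones. No new synapses
# or cells are ever allocated, however long training runs.
import random

N_SYNAPSES = 64          # fixed resource: this pool never grows
CONNECTED = 0.5          # permanence threshold for a "connected" synapse
INC, DEC = 0.10, 0.02    # the learning-vs-forgetting ratio (here 5:1)

random.seed(0)
permanence = [random.uniform(0.3, 0.7) for _ in range(N_SYNAPSES)]

def train(active_bits):
    # Reinforce synapses aligned with the input; decay the rest (forgetting).
    for i in range(N_SYNAPSES):
        if i in active_bits:
            permanence[i] = min(1.0, permanence[i] + INC)
        else:
            permanence[i] = max(0.0, permanence[i] - DEC)

pattern = set(range(16))          # one recurring input pattern
for _ in range(50):
    train(pattern)

connected = [i for i, p in enumerate(permanence) if p >= CONNECTED]
print(connected)  # the connected set converges to the trained pattern's bits
```

Raising DEC relative to INC biases the system toward forgetting old structure faster; lowering it biases toward preserving previous learning — the trade-off described above.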
E
The failures are almost always failures of generalization. It never fails catastrophically — it's not like I train it up too much and start getting garbage results. As you start over-generalizing, it starts predicting things in a more generic way, and so, you know, you may no longer be able to detect very, very subtle changes. So I don't know if that gets at your question — it was kind of open-ended, and there's a lot to it.
E
I just want to say it's not magic — I'm not claiming it is — but it's got a lot of nice properties to it, and you have some control over them. When we're using it in our product, or in our research and our tests and so on, we generally don't run into a lot of those issues of overfitting and things like that.
E
We mostly look for and find the bugs, and we have more problems getting the right data and figuring out some of the other parameters and things like that. But because the system fails so nicely, even the failures are sometimes hard to find — precisely because it fails nicely. So I hope I answered that a little bit. It's a very deep topic, and we'd have to get into lots of details to talk about specific types of problems.
F
Hi Jeff, it's Peter Morgan here. I have a little bit of a left-field question, more of a business-related one. You've been going a while, and, you know, I love your product — I'd like to see it accelerated to market. You mentioned you're working with IBM. Has anyone approached you, like Google or Facebook, to kind of work with you?
E
A lot of big companies are interested in machine intelligence, and there are different opinions about how to go about it. We represent one end of the spectrum of approaches — I tend to think it's going to be the one that carries the weight at the end of the day, but we'll see. So, you know, we've had lots of discussions with people. Yeah, there's a talk online: I gave a Google Tech Talk.
E
Last year I talked with some of the senior people at Google about these approaches, and we discussed the relative merits and so on. So there's a lot of interest in different quarters. I mentioned IBM because we could talk about them, but I can't mention other people that we're talking to. There is lots and lots of interest in this. There's a program being put together at DARPA — the United States Defense Advanced Research Projects Agency — built largely around our work; these are people interested in hardware implementations of these algorithms.
E
So there's a lot going on. It's fun. It's a very noisy field at the moment, though — you see a lot of companies claiming different things and making lots of different investments. Google bought some British company just last year, and Qualcomm bought another one. You know, who knows — our approach is just to keep our heads down.
E
Alright, great question. So yes, it is deterministic, and we use that in some of our tests. There are a bunch of random initializations in it, and if you start with the same random seeds, under a controlled environment, you'll get the exact same answers. From a practical point of view, though, it's not always deterministic, because if I actually run it on a real machine someplace, timing delays could change things and the results end up different.
E
But I can transfer the knowledge from one model to another, no problem, and under the right environment it's deterministic. From a practical point of view, if I take some data off of a server and run it through Grok, and then take the same data off the server and run it through Grok a day later, I'll probably get slightly different results, because the data comes in in slightly different orders and the servers have different queues and all this kind of weird stuff going on — it's too hard to figure out.
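The distinction being drawn — seeded determinism versus order-dependent results in practice — can be sketched with a toy stand-in model (not Grok's actual pipeline; the model and parameters here are invented for illustration):

```python
# Seeded determinism vs. order sensitivity: the same seed and same input
# order reproduce results exactly; reordering the arriving data (as real
# server queues do) changes the outcome of any order-dependent model.
import random

def run_model(data, seed):
    rng = random.Random(seed)                  # controlled initialization
    weights = [rng.random() for _ in range(8)]
    state = 0.0
    for x in data:
        # state carries history, so the result depends on arrival order
        state = 0.9 * state + weights[x % len(weights)] * x
    return round(state, 6)

data = list(range(20))
assert run_model(data, 42) == run_model(data, 42)   # fully repeatable

shuffled = data[:]
random.Random(1).shuffle(shuffled)                  # same items, new order
print(run_model(data, 42), run_model(shuffled, 42)) # different final states
```

Running the same experiment twice on one machine reproduces bit-for-bit; feeding the same records in a different order does not — which is the practical nondeterminism described above.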
E
So if you run on your laptop and you run the same experiment twice, you get the exact same results. Now, on the hardware side — this is a big question: how can things be accelerated here? I don't know if you guys know this, but there are a bunch of companies right now, very large companies, trying to figure out what the next substrate for computing is going to be over the next—
E
—you know, decades. They're all looking at machine learning algorithms, they're all trying to figure out neural models, and a lot of them are interested in what we're doing — and they have very different approaches. There's a guy I know at Sandia National Labs, which is a United States national laboratory, and he's engaged in doing photonics — on-chip photonics: they're trying to, you know, use light and waveguides and so on. Other people are trying to use different kinds of memory.
E
The key answer to your question, specifically, is that the bottleneck is essentially memory and memory transfer. There's a lot of memory, we have a lot of connectivity, and the bits have to get places. If you look at a human brain, the white matter — which is the connectivity, the wiring if you will — the big volume of the brain is white matter. It's wiring.
E
If you've ever seen the back of old computers — trays of them, tons of wires — well, that's what the brain is like. And this is the problem, because it's basically a memory architecture which is distributed. How do you build a distributed memory architecture with lots of memory? I'm not an expert in this field, but in the end, people are trying to figure out what memory architecture will work best. Is it various bus structures?
E
There are several problems — it's just like regular computers. There's a need to make them faster: we run up against that all the time, and we'd love our model to be much faster. There's a need to make them embedded, like the kind of stuff we're doing with Grok. You know, that's a fairly hefty model, and what if I want to embed that in every, you know, disk drive, every I/O controller, everything on the Internet, right — a refrigerator or something like that?
E
Well, that's a pretty heavy thing to run there, so we want to go towards embedded things and low power. So power, size, and speed are all going to be critical here, in the same way that we've struggled with all of those in computers over the decades. We're going to be doing the same thing here, and I haven't a clue which architectures are going to win out — at this point in time, I don't know.
C
Hello — I'm doing computer science and currently working as a data scientist, and I have a bit of a practical question. I recently got interested in natural language processing, and you showed an amazing example of how your networks might be used in that area. I've recently been looking through publications by Tomas Mikolov — he's from Google — and he also used neural networks on a problem different from yours, sure, but similar: word representation in vector space. What you did is strikingly similar, and what I found very different and amazing—
C
—is your example of the fox eating a rodent. You said that this network hadn't been trained on "fox", and this is actually my question, if I can ask it: how have you been able to present "fox" to the network if it has never seen the word? Because in all of the existing research, if we didn't train on a word, we don't know its representation in vector space. How did yours do it?
E
Let me just walk you through it again. So the company CEPT, which is in Austria — they came up with this, and you can talk to them or read about it. They came up with a way of creating these sparse representations for words. So although the CLA was never trained on that word, there was training involved in figuring out how to represent the word "fox". There was a representation for the word "fox", and it was not an arbitrary representation — it was a sparse distributed representation where the bits meant something.
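That core SDR property — shared bits imply shared meaning — is easy to sketch. The bit assignments below are made up purely for illustration; they are not CEPT's actual word encodings:

```python
# Semantic overlap between sparse distributed representations: each SDR is a
# set of active bit indices, and counting shared bits measures similarity
# directly. The specific bit assignments here are hypothetical.
def overlap(a, b):
    return len(a & b)

sdr = {
    "fox":    {3, 17, 42, 77, 104, 200},
    "coyote": {3, 17, 42, 90, 104, 311},   # shares many "animal/predator" bits
    "rodent": {5, 17, 61, 104, 150, 243},  # shares only generic animal bits
    "car":    {8, 33, 129, 256, 301, 388}, # shares nothing with the animals
}

for word in ("coyote", "rodent", "car"):
    print(word, overlap(sdr["fox"], sdr[word]))
# coyote overlaps fox the most, car not at all — so a never-before-seen
# "fox" SDR can still be interpreted through the bits it shares with
# words the system was trained on.
```

Because the meaning lives in the bits rather than in an arbitrary index, a new word's SDR is automatically "partly familiar" to a system that has seen semantically related words.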
E
So if you compared "fox" to all the other animals — if I took the representations for all the other animals in this dictionary — you would see that it shares semantic bits with all of them, in different ways. The representation for "fox" in some sense encodes how it is similar to other animals. So literally, when you feed that pattern into the CLA, even though it has never seen the word "fox" before, it sees some of the structure of "fox": it has seen the bits that are on in the word "fox" be on—
E
—in other representations that it has been trained on. It goes back to that first property of SDRs: two patterns that share bits have semantic similarity. So the trick of that demo — which blew me away the first time I saw it — was that the representation for "fox", although it's unique, also overlaps in bits with other animal representations, and so the CLA picks—
E
—it up. It has never seen this exact pattern, but parts of it are very similar to ones it has seen before — the same bits are on. So the things that a fox does that it shares with, I don't know, a coyote or whatever, who knows, were on. The trick to making that work was that the representations themselves encoded the semantic meaning of the word, and the CLA was just generalizing to something similar to what it had—
E
—seen before. It says, essentially: "fox" is closest to these things I've seen before, in this way. I don't know if I answered that — I don't know if you understood that answer, but are you satisfied with it? The trick was in the representations themselves. It's not an arbitrary representation: "fox" was already similar to other things the system had seen. But that was cool, and it worked because CEPT figured out how to do that.
E
Thank you. Hey, I don't know if there are any more procedural things here, but I do want to thank the organizers, and Ali, for helping arrange this, and the guys at Skills Matter, and Matt Taylor from our end, who put this together. And again, I apologize for not being there in person, but maybe we'll have some future events where I will be over there. And I appreciate you guys all coming out — I guess it's evening for you — and spending the time. I'm very excited and happy we were able to do this. So—