Description
In the first part of the meeting, Jeff discusses grid cells formed via oscillatory systems, the Bush & Burgess model of ring attractors, and how this idea might be overlaid onto cortical columns.
Starting at 34:00, Subutai switches gears and discusses a new paradigm for achieving AGI via meta-meta-learning, reviewing Jeff Clune's 2019 paper "AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence" (https://arxiv.org/abs/1905.10985). We discuss the prospects for meta-learning AGI, and for meta-learning Numenta's neuroscience-based approach.
A: All right, we are recording. I've been working on a basic task: really trying to understand, in detail, how a metric space like grid cells works and how it's implemented in the cortex, specifically the real anatomy of it. How would you create these things? I haven't been able to focus as much time on preparing anything, but I think I've made some interesting observations, so I thought today I would just present some of them. You know, I've been working on the book so much that I haven't really been able to present this material. So what I thought I would do today is very briefly explain what I'm working on, not my results. This is just sort of a toe in the water to say: here's the thing I'm working on, and if that's of interest, maybe some of you can join me on it later. But hopefully, a couple of weeks from when the book is out, I'll be able to put it together.
A: I've been writing this up, but I'm not going to go through my write-up here. So, in that context, here's what I'm working on when I'm not writing the book. I showed this figure last time. It's a nice illustration of the basic idea of how a grid cell is created using the oscillatory interference model. Hopefully everyone will remember this. The basic idea is that you need two oscillators. You need a baseline oscillator, which is in red; that's your baseline theta. Almost all the oscillatory interference models work on this basic idea, but this is only one cell. It does not map the entire arena, it doesn't address 2D; there's a whole bunch of questions that sit on top of this. But this model is also well regarded because it illustrates the precession idea, where the cell fires later in the theta cycle as it approaches its peak and then fires earlier in the theta cycle as it recedes from it.
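The two-oscillator mechanism described here can be sketched in a few lines. This is an illustrative toy, not any specific published model; the theta frequency, running speed, and velocity gain (`beta`) are assumed values chosen for the example:

```python
import numpy as np

# Toy 1D oscillatory-interference model: a baseline theta oscillator
# interferes with a second oscillator whose frequency grows with running
# speed (a velocity-controlled oscillator). Their beat pattern produces
# periodically spaced "firing fields" along the track.
dt = 0.001                         # time step, seconds
t = np.arange(0, 10, dt)           # a 10-second run
f_theta = 8.0                      # baseline theta frequency, Hz (assumed)
speed = 0.2                        # constant running speed, m/s (assumed)
beta = 10.0                        # assumed gain, Hz per (m/s)
f_vco = f_theta + beta * speed     # active oscillator runs slightly faster

interference = np.cos(2 * np.pi * f_theta * t) + np.cos(2 * np.pi * f_vco * t)
firing = interference > 1.5        # cell "fires" where the oscillators align

# The envelope repeats at (f_vco - f_theta) Hz, so in space the firing
# fields repeat every speed / (f_vco - f_theta) metres.
field_spacing = speed / (f_vco - f_theta)
print(f"predicted field spacing: {field_spacing:.3f} m")
```

Note that the spacing works out to 1/beta, independent of speed, which is the usual appeal of velocity-controlled oscillators: the temporal beat rate changes with running speed, but the spatial period of the fields stays fixed.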
A: Instead of having one green guy, you have multiple green guys here, each one at a different phase. If you did that, you'd have one cell firing here, another cell firing here, and another cell firing here, and this has been postulated as a ring attractor. I took this picture from the paper we reviewed a while back, the Bush and Burgess hybrid model paper. Unfortunately, I find it a confusing drawing, but it does illustrate the issue. Here they're proposing there's a ring oscillator: all these cells are firing at the same frequency, but they're slightly in and out of phase, so the peak of their firing travels around in a circle like this. And they're all at a velocity-controlled frequency, depending on movement in a particular direction.
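The ring-attractor reading of this (cells sharing a frequency but at evenly spaced phases, with the activity peak traveling around the ring) can be sketched as follows. The cell count and frequency are assumed for illustration; this is not the Bush & Burgess implementation:

```python
import numpy as np

# Toy ring of phase-shifted oscillators: N cells fire at a common
# frequency but at evenly spaced phases, so the peak of activity
# travels around the ring once per cycle.
N = 6                                     # cells in the ring (assumed)
f = 8.0                                   # shared frequency, Hz (assumed)
phases = 2 * np.pi * np.arange(N) / N     # evenly spaced phase offsets

def active_cell(t):
    """Index of the cell whose oscillation peaks closest to time t."""
    activity = np.cos(2 * np.pi * f * t - phases)
    return int(np.argmax(activity))

# Sample one full cycle: the activity bump visits each cell in order.
cycle = 1.0 / f
order = [active_cell(k * cycle / N) for k in range(N)]
print(order)  # → [0, 1, 2, 3, 4, 5]
```

Reading the phase of the currently active cell against the baseline theta is what turns this traveling bump into a position signal.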
A: So if I just had these cells here (sorry about that), these cells here: if I looked at when they peak relative to the phase of theta, they would implement a 1D grid cell module in this direction of movement, and these would represent another 1D grid cell module in this direction, and so on. So these are a bunch of 1D grid cell modules, if you compare when these cells peak against the base theta. And then, if you wanted to get a two-dimensional grid cell, that would be this lower row of cells. But if you want a really nice one, you should use a bunch of them, so here they're showing six of them at 60-degree angles. If you had a nice, perfect arrangement like this, then this cell, which is looking at these 1D grid cell modules, would basically be saying, "I'm going to create a 2D grid." So that's one 2D grid cell. So the question I've been asking...
A: No, they would all be going at different frequencies based on movement in a particular direction. Think of every cell as having a base theta, and then the theta increases a little bit depending on how fast you're moving in that cell's particular direction. So if the animal was moving very rapidly in this direction, this one would have the fastest frequency, and this one would be increased only a little bit, because it's moving only a little bit in that direction. Imagine it's going from left to right. Well, in this case, this one would be incrementing a little bit. It's not clear what would happen over here, because these oscillations don't slow down; they only go faster. I worried about that, but anyway: either this one is discounted, or it runs at base theta in this case. But the ones where you have a positive movement in their direction would be running faster than the other ones.
A: Yeah, I think that's right. I don't think that's likely going to turn out to be the case, Kevin, but it's possible. These models don't account for any kind of intermediary or propagation delay or anything like that, and I'm not thinking about that either. I think it's all local enough, and slow enough, unless we're going to insert extra intermediaries into it ourselves for some reason. I'm not thinking about that.
A: That's okay. Yes, actually, I agree with you. The question I'm going to ask is: where are the ring attractors within a cortical column? That's the question I'm getting at, and so there has to be a physical mechanism to make cells do this, right? If that's what you're saying, I'm with you. In essence, yeah: these models, as proposed... I don't believe they try to map this onto the biology in any sense. They're just saying this is a theoretical model, and I'm trying to map it onto the biology, where you get to the physical structure that has to implement it. If that's what you're asking, then you're right; that's what I'm trying to figure out. I've come to believe that this basic model of ring attractors is correct, and there are a lot of things which say it's probably the right thing, but how is it actually implemented in the neurons?
A: You know, in insects and so on there are some ring-like cell structures, but nothing like this in the cortex. My first assumption is that these are not in a ring, that they're actually a linear array of cells, and that the phase progression is moving along in a linear direction. So that's what I write down here: ring attractors are most likely linear arrays, not rings. That's the first thing. It seems that if there are going to be ring attractors, they aren't rings.
A: Exactly, exactly. So let's get down to the next thing. So then I asked myself: okay, I'm going with this idea of ring attractors; it makes a lot of sense, and I mentioned earlier how mini-columns look like they're velocity-modulated movement vectors and so on. So I've got all the right information there. So where are these? I've basically been pursuing two basic ideas.
A: We know it has these orientation preferences, and we used to say they were orientation columns, but that's not really true; they're orientation slabs. If you move in one direction, you see the cells respond to orientations that are changing, but if you move in the other direction, it's basically a common orientation. It's often drawn like this, but we know that these things are kind of messy; they come to these points, pinwheels, and so on. But this isn't...
A: This was a very well-established idea: in V1, if you move your probe across the columns in one direction, you'll find changing orientations, which to me are changing movement vectors; they're sensing the different directions this thing can be moving. And then in this direction you have a commonality, and it's really not clear what's going on in that direction. And of course there are mini-columns throughout this; I'll get to that in a second. So one idea I've been pursuing (this is not the one I'm currently favoring, but I put it in for completeness) is a simple one: the ring could be a set of cells in a layer in a mini-column.
A: So all the cells in this mini-column are basically representing the same movement vector, and then, if I can get the cells to fire at slightly different phases, and as long as there are enough cells to cover a whole cycle, you would have a ring attractor. That was one idea, and it seemed really convenient, because I've been arguing the mini-column is a movement vector, and I have this bunch of cells in it, and they can just progress like this. And it's not every cell in the mini-column; it's just one layer, even. So that was really nice, and it would result in many 1D grid cell modules. Essentially, every mini-column would be a 1D grid cell module, and there are a lot of advantages...
A: Having a lot of 1D grid cell modules makes for a very high-dimensional space. The other possibility I pursued (this is the one I mentioned when Florian was here) is that your ring is not vertical; instead the progression runs across the orientation slab. In this case, each mini-column is in essence a phase-shifted element of the ring, so each slab becomes a ring attractor, and you have a progression of phases moving in one direction like this. This seems a really odd idea; could that happen, and why would it happen that way, and so on? It means that going in one direction I get changing phase, and in the other direction I get changing orientation, or changing movement vectors. What's going to force that to happen? But I've actually come to like this idea a lot. I think there's something real there, and I'm not going to go through...
A: ...why at the moment, but one thing you do is give up a lot: you end up with many fewer 1D grid cell modules. What you gain is the possibility of using the pyramidal cells in each unit to identify a sort of temporal-memory-like context. So now I could select... you know, if there are 10,000 cells in layer 3 or layer 5, for example, I could selectively activate one or two to basically pick out a context, the way I see temporal memory working. I cringed at first, thinking this could really be true, because it's a very complicated system, but it's starting to look pretty good, actually, and I'm trying to work my way through the different issues associated with it. But I think the basic idea is that, well, we know there are going to be grid cells, and I'm really convinced that the oscillatory interference models are the way to go. That leads essentially to someplace...
A: You have to have 1D grid cell modules, which can be combined into 2D grid cell modules. You need ring oscillators to do that. So where are the ring oscillators? There are only so many places you can have them: one way is to do it vertically, the other way is to do it laterally, assuming there aren't actually rings anywhere in the cortex. And then you can sort of tease apart the different attributes: what happens when you do it one way versus the other way? I have a lot of notes on this, but I haven't structured them yet. So that's the basic idea of what I'm working on right now. I haven't reached a conclusion, but right now this one looks promising. The only downside to this method... and by the way, this can even lead to the two-dimensional grid cell models...
A: ...a two-dimensional array like we saw in the Tank paper. Because if you assume that mini-columns inhibit a surrounding set of mini-columns, you end up with a set of active cells, spaced apart, that look like the grid cells we saw in the Tank paper. That's all I want to say about this. I want to get back to it; I probably won't be able to get serious about it for about two weeks, but I just thought...
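The surround-inhibition point (active mini-columns forced apart into a grid-like arrangement) can be sketched with a toy winner-take-all loop; the sheet size and inhibition radius are assumed values for illustration:

```python
import numpy as np

# Toy surround inhibition on a 2D sheet of mini-columns: repeatedly pick
# the most excited unit and suppress everything within radius r of it.
# The surviving winners are forced apart, giving the evenly spaced
# activity pattern the discussion compares to grid-cell firing maps.
rng = np.random.default_rng(0)
side, r = 40, 6.0                       # sheet size and inhibition radius
excitation = rng.random((side, side))   # random drive to each mini-column
ys, xs = np.mgrid[0:side, 0:side]

active = []
exc = excitation.copy()
while np.isfinite(exc).any():
    i = np.unravel_index(np.argmax(exc), exc.shape)      # strongest unit
    active.append((int(i[0]), int(i[1])))
    # Suppress it and its surround so no later winner lands within r.
    exc[(ys - i[0]) ** 2 + (xs - i[1]) ** 2 <= r ** 2] = -np.inf

# Every pair of surviving winners ends up more than r apart.
d2 = min((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2
         for n, a in enumerate(active) for b in active[n + 1:])
print(f"{len(active)} active mini-columns, min separation {d2 ** 0.5:.1f}")
```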
C: One of the things you're doing here: the previous models assumed... like, one of the papers assumed there's a synchronous clock frequency, the way you'd do it in a lot of chips right now. But there's an alternative form of logic, which is self-timed, where basically things propagate when they're ready to go and activate the next thing.
A: Yeah, although here I think it's a bit of a hybrid, because remember, these models all rely on the idea that there's a base theta frequency, and that is being shipped to everybody. I didn't show it here, but these are pyramidal cells in these mini-columns, and pyramidal cells always have an apical dendrite, so I'm assuming this is either layer 2/3 or layer 5. So in some sense this is clocked and unclocked at once: these guys propagate at their own rate, but they're also comparing it to a base theta. I think all of these cells will be getting the base theta, maybe on the apical dendrites, while they're being driven locally, at the theta-plus frequency, down in their particular layer.
C: I agree. If you have two different mechanisms, one of which is inherently low-latency, which allows you to distribute a reference signal, and the other of which inherently has a higher latency, so it can propagate across and produce those phase differences, then it makes sense to me. Because if you have mechanisms in there for both low-latency and high-latency propagation of signals within the cortex, then it falls out, you know.
A: Yeah, it is a little odd to imagine. If we're going to have these swim lanes, and they're running at slightly different frequencies, there has to be tissue structure to support that. And why, when I go in one direction, do I get a swim lane of phase shift, and when I go in the other direction, an orientation shift? Well, at least the orientation shift is there...
A: As far as I can tell, there are no theories at all for why there are slabs. I've been looking; I can't find anything that says what they do functionally. So it's appealing that maybe this is the purpose for them, and you've got these two things. But then there are questions like: why is it structured like this? How come it ends up this way?
A: There's the context thing we think about all the time: you have a bunch of cells in a layer that all represent the same thing, but in different contexts one cell becomes active. So I started thinking about what's going to cause all the cells in a mini-column to represent the same thing. It's going to be a bipolar cell; we've already determined that with the temporal memory. Now, why would we get phase in this direction and orientation in that direction?
A: Well, there are those other weird cells, the chandelier cells and these other interneurons. I've come to believe that all this calculation is being done at the interneuron level, and then you just assign a bunch of pyramidal cells to each of the elements created by the interneurons, and these allow you to create context.
A: So I need to really understand the mechanism for this propagation. We have to look very carefully at the interneurons, and there is a lot of literature on them. I think I mentioned once before (I can't recall) that I read an article saying one of these interneurons really has this little planar sort of aspect to its projections; someone else looked for it and couldn't find it, but I think it's there.
A: Another interesting question: this would imply that this sort of slab behavior would exist in every cortical region. I looked briefly (I mean, I spent like 20 minutes searching) trying to find papers that talk about slab-type receptive fields in other cortical regions. I didn't see anything, and I didn't see anything that contradicted it either. It's just that people haven't done this kind of analysis; they don't know what to look for, as far as...
A: ...if all these mini-columns were firing the same way... they're not firing the way you might imagine. You can imagine there are bipolar cells that represent the mini-column; okay, that means a set of bipolar cells. Those bipolar cells are firing at the same theta-plus-delta frequency, and they're phase-shifted, so this is the ring attractor. These would be six elements in a ring: this guy fires first, and within this theta-plus-delta cycle they go bing, bing, bing, bing, back to the beginning, bing-bing-bing-bing, and so on. And actually I've drawn these wrong; they're much longer than this, and you can have multiple peaks going along. So these mini-columns are now the elements of the ring attractor.
A: No, I'm not aware of it. There are so many papers; I just need to spend a quality day or two doing research on them. But as far as I know, I have not found anything which says what happens along the slab: if you go far enough, do you switch, say, from a left-eye preference to a right-eye preference?
A: You know, I looked at some of those papers, and they really didn't talk about this at all. There might be more; I'm sure I didn't see everything this morning, Kevin, but I haven't seen it yet. I saw some papers with titles like "traveling waves in the cortex", but when you read them, they're really not about this at all. So, you know, the slabs are kind of wiggly and they get wider; they go all over the place, and then they come to these points of singularity here. So it's not clear. What we're proposing is that there's a propagation of a phase along these contour lines, which is perfectly understandable. It's not like there's one wave of something going across everything here; it's not like that. But I don't think...
A: ...even to do it, you'd want to see a consistent phase relationship in a recording. In this case, if you were doing this in V1, it would require the animal to move; remember, we're talking about flow fields here. You have to activate these complex cells, and to do it properly the animal really should be moving. Then, depending on how fast the animal is moving, that's how much the frequency would change, and then you'd have to be able to see the phase shift between the cells. I don't know; I think I'm getting ahead of myself. It would be hard; you'd have to make sure you're designing the experiment properly to record it. And I'm with you: it could very easily have been missed. Also, I think it's not the pyramidal cells you have to be looking for; I think it's actually the bipolar cells.
A: That gets to be a subtle distinction, because it's the bipolar cells that are really doing this; the pyramidal cells themselves are coming along for the ride, and they may not be firing all the time. It's the bipolar cells that would be firing all the time. So I bet there's a lot of literature out there that touches on this in one way or another; it's just going to be difficult and time-consuming to find it and go through it and search it. But yeah.
A: As I said earlier, it's exciting to work on, and I think this can be solved. I think we probably have enough information, collectively, in the world's neuroscience literature to piece together the answer, though maybe not to really know for certain. Okay, I want to make sure we have time for Subutai.
A: No, that's just... yeah, you can't see it. Basically, the best ideas out there take another modality, ocular dominance, and just drop it on top of this base map. But again, you don't need two eyes to see, and so vision and hearing are not dependent on ocular dominance at all. So I tend to ignore that component, because it's a very specific thing to vision; it wouldn't apply to a fingertip or something like that.
A: I mean, again, I'm arguing that almost everything people thought about orientation in V1 is wrong. It applies correctly to layer 4 only, but for all the other layers, which have complex cells, I'm arguing those aren't even feature detectors; those are movement detectors, flow detectors. And remember, I talked about that research with the random-dot images, and they kind of concluded... yeah, right.
A: I just don't know if that applies to movement; it seems like it applies to trying to detect edges, as opposed to trying to pick up flow and movement. Maybe it does, I don't know. But almost all the literature about how these orientation preferences come about... well, a good portion of it will be incorrect if I'm right about this, because the complex cells are not doing what people think, right?
C: Something else: the one place I would argue it might be relevant, not to the somatosensory case, but to the extent that vision has a contributory signal to support whatever the motion is. That's a way of basically getting salience out of a very confusing set of things moving around and shifting and so on.
A: Yeah, well, I guess the question is how you determine flow movement; there's the question of how that happens. But remember again, those random-dot images work really well for complex cells, and there are no spatial frequencies in them, zero spatial frequencies in those images, and they actually activated complex cells better than edges did. Okay, so, if I understand spatial frequency, that would argue that those...
B: So this is a paper I read over the weekend that I thought was very thought-provoking. I was talking with Jeff about whether we had research meeting topics or not, and I just decided to put this together in the last hour before 10:00, so it might be a little disjointed; I apologize for that. But I did think this was a very thought-provoking paper. The paper is called "AI-GAs: AI-generating algorithms, an alternate paradigm for producing general artificial intelligence", by Jeff Clune. Some of you may have read this paper already.
B: He released it, I think, last year, and there's a bunch of stuff coming out along these lines that is quite interesting. He critiques the state of the machine-learning world as well as, in part of his paper, some of the neuroscience-inspired approaches, and he proposes a particular way that he thinks is going to be the fastest way to get to machine intelligence. Whether or not we agree with it... I'll have a take on that.
B: And if anyone else has read this paper, feel free to jump in as well. This is the paper that's up on arXiv, released last year. I found a couple of talks by him on YouTube, so I put together a short set of slides taken from his slides; I only created one slide of my own, which helped me put this together very quickly. I wanted to go through that and see what you guys think of it. Okay.
B: So, let's see. First, he talks about the manual path to AI. This is the dominant machine-learning paradigm, which is basically: you identify key building blocks, whether it's backpropagation and ReLUs and sigmoids and all of that, and then you try to put these together into more and more complex networks, trying to solve more and more complex tasks, and it's all hand-designed.
B: In machine learning it's sort of bottom-up mathematical principles, and he's asking: is this even possible? It's a Herculean task: a huge space of possible algorithms you might be trying to create, a huge space of possible networks; debugging and optimizing these things is a nightmare, and you need huge teams of people working on it.
B: You know, such as OpenAI and Google and so on. You put more and more compute resources and more and more data into it, and you try to build up, bottom-up, from that. This is the dominant paradigm today. In some sense, what we're doing is also on this path, except our key building blocks are neuroscience-based things, whether it's dendrites or sparsity or grid cells.
B: We're reading a lot of stuff from the neuroscience, but then we're trying to manually create these networks and figure out how to put it all together into working systems. Okay. So here, in machine learning, is an example of the types of building blocks that are out there.
B: Convolutional networks, attention mechanisms, different loss functions, models, Bayesian methods, active learning: there's a huge list of things, and literally a million people around the world are taking examples of these and trying to create a network that solves some specific problem. And the question is: is this even an efficient approach? Are we able to find all these building blocks? Can we create systems based on that?
B: One basic lesson the community has learned over the years is that hand-designed pipelines are ultimately outperformed by learned solutions as you get more data and more compute. Computer vision and reinforcement learning are really good examples of this. In the beginning, people used to design features by hand; HOG and SIFT are examples of features that people used a lot in the 90s. Now, in deep learning, you can pretty much learn these end to end.
B: You can just give it pixels and data; you don't need to hand-design features, because the learning algorithms are powerful enough to do a better job than any hand-designed thing. Same thing now with architectures: they used to be hand-designed, but now, through hyperparameter search and meta-learning algorithms, you can figure out which architectures are best. And hyperparameters: rather than manually tuning the learning rates and so on, you can, through various mechanisms, some of which we've used as well...
B: ...learn which parameters are best. Same thing with data augmentation: it used to be the case that you would hand-design specific data-augmentation techniques, but now you can run these huge meta-learning algorithms and they figure out the best data augmentation, which can be better than any manually designed data-augmentation scheme. So there are tons and tons of examples where, through learning and optimization, you can figure things out better than with manually hand-tuned networks.
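The "learned beats hand-designed" point can be made concrete with a minimal sketch. The search space and scoring function here are invented stand-ins for illustration, not anything from Clune's paper:

```python
import random

# Toy "search instead of hand-tune" loop: rather than picking a
# hyperparameter by hand, let a simple random search find it. The
# pipeline_score function is a stand-in for training a model and
# returning validation accuracy; here it peaks at lr = 0.01, which
# we pretend not to know.
def pipeline_score(lr):
    return 1.0 - (lr - 0.01) ** 2

random.seed(0)
hand_designed = pipeline_score(0.1)   # a plausible manual guess

# 200 random trials over an assumed range of learning rates.
best = max(pipeline_score(random.uniform(0, 0.2)) for _ in range(200))
print(best >= hand_designed)          # the searched setting matches or beats the guess
```

Real systems replace random sampling with smarter search (Bayesian optimization, evolutionary methods, learned optimizers), but the division of labor is the same: humans supply the building blocks and the objective, and the search supplies the configuration.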
B: So what he's proposing is: you want to learn as much as possible. You don't want to take all of these manual building blocks and hand-tune them; it's a very experimental, expensive process. Now that computing is becoming cheaper and cheaper, you should generate the algorithms themselves; that process itself can be automated. And this is slightly different, so he makes this point: it's not just evolutionary algorithms. The analogy is evolution; that's his existence proof: on Earth, evolution, with very, very simple rules, led to intelligent systems. But the machine-learning algorithms today that do this are much more efficient than evolution, because the problem with evolution is that it's not very efficient.
B: Yeah, we have much better search techniques now than evolution did. And so he proposes these three pillars that we need. The first one is to meta-learn the architectures, and that's fairly well known in the literature, I believe; I'm not as familiar with all of these.
B: Why not let the computer figure a lot of these things out? His hypothesis is that, by doing this, you're going to need fewer building blocks. You still need to come up with building blocks; it's not like there's no manual process in this. But it's a much smaller set, and all the work of combining them is much easier. This definitely resonates with me, because we've spent a lot of time coming up with neuroscience-based...
B: ...building blocks, but then, to actually get things working, there's a huge amount of tweaking: figuring out the data set, figuring out what the right learning rules are, the exact learning rules, what the right parameters are, and all of that. You spend a ton of time on this stuff. So the problem, at least, resonates with me; whether the solution is his solution is a separate question. But the basic idea is that you need fewer building blocks if you were to do...
A: ...this. By the way, the way I've always felt about that: think about when we did the temporal memory and anomaly detection and all that tweaking we had to do there. I always assumed all that tweaking was because we didn't have the algorithms right. They were close, but they weren't quite right, and we didn't really know how to make them right. I always felt that if we had really understood them better, it wouldn't have been hard. But maybe I'm wrong about that. Yeah.
B: He would say those would be the building blocks that you'd put in, but exactly how you implement reference frames so they work on real-world tasks and such is really difficult. It's one thing to have the cortical column roughly laid out; the gap between that and actual working systems is really, really hard. Well, I guess...
A: Maybe there are, like, ten things: you know, the temporal memory, the context-in-mini-columns hypothesis, and now we have reference frames. To me, you first discover those in biology, and until you know all of them, it's going to be really hard to do anything; but once you do know all of them, it won't be so hard. That's...
A: No, because that implies there's more neuroscience work to do. It's always been my take that the quickest way to get there is to understand those building blocks, and the only way you can figure out those building blocks is by studying the brain, right? And we don't know them all yet, so don't...
B
...give up on the brain, yeah. But if I were to represent his point of view, it would be: yeah, that's fine, but then, once you have the building blocks, or some set of building blocks, combining them and figuring out their details so that they actually work well is quite a big task. I...
A
I know, because we didn't have enough building blocks. There are so many things I know are wrong about it, but it was pretty good, and it highlighted a few neuroscience principles that no one knew about. So I guess, since we're talking about a hypothetical here, the question is always: hey, do we have enough building blocks today to turn on the automated meta-learning systems and build something, or is that a hopeless task and we need more building blocks? And to me it's...
B
...always. Well, I think the part you probably didn't see as much is all the detailed neuroscience: until you have working code where every single detail of it actually works (and I still don't know whether what we have is the best actual implementation or not, right?), there's still a ton of manual work that goes into it.
A
I understand that, and that's the part that could be, I know. But my point is, I think that manual work exists because we were missing all these other major pieces, and therefore even our models were wrong. They were better than other models, they were getting at some truth, but they basically had missing components, and because we missed those components, any implementation we do is going to be difficult. That's...
A
I don't know about that; I think that's unclear. How would we know? We've never had the right set of building blocks, so we've always been trying to make things work with a partial set that isn't correct, where the pieces don't fit together, right, and so on. So I think it's undetermined: if we knew all the correct principles by which the brain works, would it inherently be very, very difficult to put something together? I don't know. I don't think we know the answer to that.
B
So the problem with this is that it's going to be extremely computationally expensive. But his take is: well, the availability of computation is just increasing exponentially, so this is going to be a solved problem. And let's take a look at why this is likely to win. Well, it's fairly obvious.
B
Generating algorithms: he's arguing for AI-generating algorithms, and the thing is, the amount of human ingenuity you need gets smaller and smaller. You still need it, but the work of parameter tuning, of trying out lots of different variations and stuff like that: if you could do this, you'd need much less manual work there, and you could automate a lot more of that stuff.
B
Okay, so if we go do these AI-generating algorithms, he gives one example that I thought was quite interesting for us, and that's in the realm of catastrophic forgetting, or continuous learning. We've talked about this in the past; it's a big problem with today's machine learning. The basic framework here is: you learn task A, then you learn task B, but when you learn task B, you basically tend to forget everything about task A. And so, you know, humans and animals...
B
...don't have this problem. So how do we create systems that can solve the catastrophic forgetting problem, that can learn a continuous set of tasks without forgetting the past ones? He has a list of all of these different proposed solutions, which he would all basically put in the manual category. The more interesting one is this.
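The failure mode being described can be seen even in a one-parameter toy (my own illustration, not an example from the paper): fit a single scalar to task A, then to task B, and the task-A loss collapses.

```python
# Toy illustration of catastrophic forgetting: one parameter, two "tasks".
# Training on task B erases what was learned for task A.

def train(x, target, lr=0.5, steps=20):
    """Plain gradient descent on the loss (x - target)**2."""
    for _ in range(steps):
        x -= lr * 2 * (x - target)
    return x

x = train(0.0, target=1.0)        # learn task A: x ends up at 1.0
loss_a_before = (x - 1.0) ** 2    # essentially zero
x = train(x, target=-1.0)         # now learn task B: x is dragged to -1.0
loss_a_after = (x - 1.0) ** 2     # task A is "forgotten": its loss is now large
```

Nothing in plain gradient descent protects the old solution; that is the gap the proposed meta-learning machinery is meant to close.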
B
So you get theta one-one, and you keep going, and at the end of it you evaluate how well your sequence of this inner loop worked on this meta-loss function. Then, based on that, you differentiate: you use backprop through this entire process in the top row and come up with a better set of parameters, theta two. Then you go through the entire process again, evaluate the loss, and do backpropagation, or some other optimization method, that goes all the way back and generates a new set of parameters.
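That two-level structure can be sketched in a deliberately tiny toy (my own simplification, assuming scalar tasks and a scan over candidate values in place of the paper's differentiation through the whole training run): the inner loop trains sequentially through the tasks, the meta-loss scores the result on every task, and the outer loop adjusts a meta-parameter, here the inner learning rate.

```python
# Inner loop: train theta sequentially on each task (target t, loss (x - t)^2).
def inner_loop(theta, tasks, lr, steps=10):
    x = theta
    for t in tasks:
        for _ in range(steps):
            x -= lr * 2 * (x - t)       # ordinary gradient step
    return x

# Meta-loss: after the whole sequence, evaluate on ALL tasks at once.
def meta_loss(theta, tasks, lr):
    x = inner_loop(theta, tasks, lr)
    return sum((x - t) ** 2 for t in tasks) / len(tasks)

tasks = [0.0, 1.0, 2.0]                  # stand-ins for task 1 .. task T

# Outer loop: instead of backpropagating through the entire training run,
# just scan candidate values of the meta-parameter and keep the best one
# (the real scheme would take a gradient step from theta_1 to theta_2 here).
candidates = [0.001, 0.005, 0.01, 0.05, 0.1, 0.5]
best_lr = min(candidates, key=lambda lr: meta_loss(0.0, tasks, lr))
```

With the largest rate the final parameter snaps to the last task and forgets the rest; the scan settles on a smaller rate that balances all three targets, which is exactly the behavior the meta-objective rewards.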
B
I think, you know, you could think of evolutionary algorithms as an example of something like this, but we now have much better mechanisms, he claims, for going from theta one to theta two and so on. And if you were to apply this to catastrophic forgetting, what you get is something like this: first you start with a set of parameters, you learn task 1, then you learn task 2, and you keep going until you learn task T, and then you evaluate on all of the tasks.
B
Okay, what's one of the tasks? In a continuous learning case, a task might be learning, let's say, a couple of categories. Say you take ImageNet, which has a thousand categories. You learn two categories first, then the next two categories, and the next two, and you don't want to forget the first two.
B
So you learn the categories in sequence in this particular case, but then at the end you evaluate how well it learned all the categories. Of course, with typical backprop it would do horribly on all the categories; it would only learn the last two. But you evaluate on all of them, and then, as he proposes, you backpropagate through the whole thing, generate a new set of parameters, and repeat this entire process.
A
There's an optimization process that's running, and a meta-optimization process that's running, which is what goes from theta 1 to theta 2. So inside here you might have something like backpropagation running. Where it says task 1 here, that is an entire process of training one network on two categories, say; then you train that same network on the next two categories, and so on. That's what we all do. But then the new thing is this meta-objective.
B
You say: okay, how well did this whole thing do? You can look at the area under the curve, or some metric that says how well it learned all of the tasks, all of the categories. And based on how well it did, the optimization process, in this case backpropagation through this entire learning process, generates a new set of hyperparameters, and in this case actual starting weights, and goes through this entire process again. So this entire slide is automated, yeah.
B
No, no, the loss function is computed just normally. So let's say this is ImageNet. In the first task you learn categories 1 and 2, right? In task 2 you learn categories 3 and 4 only. In task 3 you learn categories 5 and 6 only, and then in task T you would learn categories 999 and 1000. No, no, I understand.
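Concretely, that split of the thousand labels into two-class tasks takes a couple of lines (a sketch; the labels here are just the integers 1 through 1000):

```python
# Carve 1000 class labels into 500 sequential tasks of two classes each.
def make_tasks(num_classes=1000, classes_per_task=2):
    labels = list(range(1, num_classes + 1))
    return [labels[i:i + classes_per_task]
            for i in range(0, num_classes, classes_per_task)]

tasks = make_tasks()
# tasks[0] == [1, 2], tasks[1] == [3, 4], ..., tasks[-1] == [999, 1000]
```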
C
What I was saying, and that's exactly what I was trying to feed back to you, was that, basically, you could look at what is being forgotten across all these tasks as it moves through, and by moving backwards it can come up with a better notion of how to deal with the loss as it's coming through, right? I mean, that's...
B
Right, yes, yes, yeah. The details of that I don't know right now; I would need to go through this paper in detail, which I'd like to do at some point, to understand it. But basically he does have a mechanism for doing that, they do have that, and I'll give the rough idea of it later. Just...
B
So once you've done all of this stuff, then what do you do? What's called meta-testing. Now you take a new set of categories, a new thing, and you go through a training process (this is meta-test training, oops), and then you evaluate how well it retained all of the different categories.
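The meta-train / meta-test separation described here amounts to holding out classes that the meta-optimization never saw (a minimal sketch; the 800/200 split is an arbitrary choice of mine, not from the paper):

```python
import random

random.seed(0)
classes = list(range(1000))
random.shuffle(classes)

meta_train_classes = classes[:800]   # used inside the meta-optimization loop
meta_test_classes = classes[800:]    # unseen: train on these sequentially with
                                     # the meta-learned settings, then measure
                                     # how much of each task was retained
```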
B
Then there was one interesting piece: the new building block they put in was a neuromodulatory network. Here he says: well, if you look at neuroscience, there are these neuromodulatory pathways that can affect plasticity and the learning system; whether it's dopamine or others, they can affect how fast you learn. So let's just put that in. So he puts in a network that basically acts like an attentional type of network.
B
Okay, so this is an example of a manual building block that he throws into this uber-optimization process. The red path is your standard convolutional neural network. The blue path is a separate thing, and basically all it's doing is updating how fast these things learn, or don't learn, in a kind of precise way, along this path back here. So...
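One way to picture that blue path (my own toy, not the paper's actual architecture) is as a sigmoid gate that scales each weight's effective learning rate, so the modulatory output decides, element by element, how plastic the prediction path is:

```python
import math

def modulation(scores):
    """Hypothetical modulatory output per weight: 1 = fully plastic, 0 = frozen."""
    return [1 / (1 + math.exp(-s)) for s in scores]

def gated_update(weights, grads, base_lr, gate):
    # Each weight's step is scaled by its own gate value, which is how the
    # modulatory path controls "how fast these things learn or don't learn".
    return [w - base_lr * g * m for w, g, m in zip(weights, grads, gate)]

gate = modulation([10.0, -10.0, 0.0])        # roughly [1, 0, 0.5]
new_w = gated_update([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], base_lr=0.1, gate=gate)
# new_w[0] moved a full step, new_w[1] barely moved, new_w[2] moved a half step
```

In the meta-learning setup, the parameters producing the gate would themselves be tuned by the outer loop rather than set by hand.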
B
He set up meta-learning of the learning algorithm, where you're basically learning how to train the system, or how fast learning should go when you present new categories. And the third pillar, he says, which has had hardly any work on it yet, is to learn new kinds of data sets and benchmarks: the learning environment itself can be learned. Again he uses evolution as an analogy. Here are a couple of analogies. One is evolution, where, you know...
B
...let's say you have trees. I don't actually remember the exact example, but the environment itself is changing during evolution. So first you have trees, and you may have tons of leaves, maybe too many trees. And then, I think in his example, you have caterpillars that start eating the leaves, and then you have other things that eat the caterpillars, and then maybe giraffes that are eating the leaves.
B
And
then
you
have
predators
any
you
know,
I
know
they
giraffes,
but
they
you
know,
herbivores
that
stuff.
So
the
basic
idea
is
that
the
environment
itself
is
changing
and
as
the
environment
changes
you
get
more
and
more
intelligent
animals
emerging.
So
you
want
to
be
able
to
having
the
right
set
of
environments
and
data
sets
and
benchmarks
is
very
critical
to
this
entire
process.
B
I
think
another
example:
he
used,
as
you
know,
what
they
call
curriculum
learning
where
you
might
need
to
train
a
system
on
simpler
tasks
first
and
then
gradually
build
up
and
train
it
on
more
and
more
complex
tasks
and
that's
sort
of
similar
to
how
humans
learn.
You
know,
as
babies,
we
might
be
exposed
to
much
much
simpler
environments
and
our
parents
are
teaching
us
in
a
particular
way
and
and
then,
as
we
grow
up,
we
gradually
learn
more
and
more
complex
at
s
and
that
curriculum
itself
is
very
might
be
important.
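At its simplest, the curriculum idea reduces to ordering the training tasks by some difficulty measure before handing them to the learner (a trivial sketch; the task names and difficulty scores are invented):

```python
# Hypothetical tasks tagged with a difficulty score; train on the simplest first.
tasks = [("shapes", 3), ("edges", 1), ("scenes", 5), ("textures", 2)]
curriculum = sorted(tasks, key=lambda task: task[1])
order = [name for name, _ in curriculum]
# order == ["edges", "textures", "shapes", "scenes"]
```

In the paper's framing, even this ordering, and the tasks themselves, would be generated and tuned by the outer optimization rather than written down by hand.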
C
It sounds like, in a nutshell, he's found a way to automate generalization of a network. And if you don't take the lesson from curriculum learning, it could learn the wrong lessons for quite a while, because it might have simplified a complex problem down to something simpler, which would handicap it from that point on. So you kind of want to feed it a set of examples.
B
That's an example of what he's proposing, but it's not the only thing. We've talked about some other things too: you know, maybe we need to learn simple primitives, like curvature and smoothness and so on, before we can learn more complex objects and more complex shapes. But it's not the only thing. The basic idea is that the whole environment the system is embedded in, the details of how that environment works, should be learned.
B
Okay, so that's his overall conclusion: he thinks that working on these three pillars is going to be the fastest path to reaching AGI, and the manual path is not the way to go. You know, maybe you need the manual work to figure out what these building blocks are, but then, at the end of the day, you need these other meta-learning algorithms in there too, to really do well. And then there's one other part of his paper that I thought was relevant to us and wanted to bring up.
B
He
does
discuss
sort
of
neuroscience
based
approaches,
so
this
is
not
what
we're
doing
here.
I
just
want
to
be
fair,
but
he
sort
of
critiques
the
neuroscience
based
approach
in
the
following
way.
He
calls
it
the
mimic
path
and
that
it's
that
that
involves
neuroscience
the
studying
animal
brains
and
attempting
to
reverse-engineer
how
engine
intelligent
works
and
in
his
example
the
mimic
path
tries
to
recreate
brains
in
excruciating
detail.
B
You
know
as
much
faithful
detail
as
possible
and
the
Blue
Brain
Project
is
an
example
of
this,
and
you
know
he
says
these
are
worthwhile
independently,
just
because
it
he's
not
saying
they
shouldn't
exist,
but
he
does
nothing.
That's
gonna
be
the
fastest
way
to
intelligent
systems,
get
to
intelligent
systems
and
his
to
get
to
critiques
of
this,
which
I
thought
were
work
relevant
and
we've
come
across
this
scenario
too.
A
I don't know if everyone understands this sort of thing, so it's worth pointing out: the idea behind the mimic approach is that you don't actually understand what's going on; you're just recreating the details and hoping it's going to work. Which I think is nearly impossible. It's so remotely possible that that would work, because there are so many parameters, you don't know which ones are important, and therefore you just can't get it right. So I think that approach is never going to work. That doesn't mean it's not valuable, but right.
B
I totally agree. However, I think these two issues are issues for us too. The fact that neuroscience is very slow to produce new data and run new experiments, that's one fundamental kind of friction. The other critique (these are my words, not exactly his) is that the neuroscience community itself is a source of friction, because their goals are not necessarily to create machine intelligence. In my experience, 98% of them are not computer scientists, and a lot have trouble understanding computation and algorithms.
B
It's just something new; that's why it's interesting. And this makes it really hard for us to go through papers, communicate with this community, and get the data that we need. So I think this is a valid source of friction that we face, in Jeff Clune's terms.
A
Your loss function: knowing what it is, is critical. If your loss function is to do continuous learning on data sets like vision, well, that's what you're going to optimize for. And I think one of the problems with AI is that they haven't had the right loss function, or anything close to it. I would say the loss function is...
A
...you need to learn a sensory-motor model of the world that is able to predict its inputs continuously, and it's not, you know, doing better on this type of training and so on. So I guess my critique of this approach is that I don't think it's a path to get to AGI, because to get to AGI you need a very clear loss function, and I...
A
...don't think they're close to that yet, and neuroscience can tell us what that loss function is. The second thing is: you've got a set of parameters on the left, and that set of parameters is much more complicated than what people are thinking about today. Those parameters include things like attentional mechanisms with the thalamus, and modulatory mechanisms, and the hierarchical construction of columns, and so on.
A
These are the parameters the brain works with, and if you don't have the right set of parameters and the right loss function, you're not going to get to AGI. You may optimize the particular problem you're working on, but you're not going to get to AGI. So I think it's a wonderful approach if you knew exactly what your loss function should be (and it's a very complex loss function) and you knew exactly what your parameters should be; and my argument is that you can only figure those out by studying the brain.
A
If you knew those things, then yeah. But if you don't know those things, you're not going to get the correct loss functions or the correct parameters using this approach; AGI is just not going to fall out of this thing on its own. So that's my critique: it's great if you know those two things, and then you can solve a lot of machine learning tasks with it, but it's not a path to AGI until you can crack those two very complex things, the loss function and the parameter sets. Yes.
B
I completely agree with that, and I think with neuroscience we can come up with these building blocks and a lot of these details. You know, on our research team, a large part of what we end up doing is going from theta 1 to theta 2, and that might take us several weeks or months. And so, you know, can we automate our research team with this approach? Yeah.
A
It would, but don't fool yourself that that's the path to AGI. That's the path to getting better machine learning: better continuous learning, or some other thing you're trying to achieve. That's my critique. If we're trying to do continuous learning on certain types of well-established problems, yeah, I think that would work. But to call it a path to AGI, I think that's...
A
...pieces. But I've always felt, and I'm sticking with it, that you have to understand the complete framework of what a brain does and how it does it before you can do AGI, and that there is no shortcut to getting that framework without studying the brain. It just seems that people weren't able to intuit what the brain is doing; they can't even get that basic thing correct, or the basic mechanics of how it works.
A
So nothing has changed my mind that this was still necessary, and we're not done yet. Some pieces, like, oh yeah, okay, that's one of 25 pieces, or 15, or whatever it is, that you need to know, but on its own each one is insufficient. I mean, we have neuron models now, we have neuron models with dendrites, we can have, you know, oscillatory models.
A
That's a very specific goal, right? The goal should be, you know, and we're talking about not just a robot navigating: it's a system that learns continuously, working in any sensory modality, building a model of the world, where you interact with that model not just by moving your body but by moving your senses.
D
Someone proposes a data set, and then everybody just rushes in and tries to beat that data set. It seems to me that he's proposing that the data set is part of learning, so we should also learn better data sets as part of our learning algorithm. So how would that work with what we have today? I mean, how would people compare different approaches or algorithms if we move away from this benchmarking? There's...
B
Probably... yeah, you're right, I think that is what he's proposing. There's got to be some uber-benchmark, some end goal that is objective, and exactly what data you use to achieve that objective is irrelevant; it's not the issue. So maybe, you know, maybe you want to have an autonomous agent that can solve...
B
You
know
really
complex
tasks
in
3d.
You
know
you
where
you
can
put
it
into
any
3d
environment
and
it
can
figure
out
how
to
navigate
in
and
learn
about
the
environment.
Let's
let's
just
say,
and
then
maybe
in
order
to
get
there.
The
system
can
generate
simple
environments
as
a
starting
point
and
simple
actions
and
add
in
more
complex
actions
and
more
complex
environments
automatically
as
a
part
of
training
it
but
you're
right.
It
sort
of
begs
the
question
there.
You
know:
how
do
you
evaluate
it?
C
One of the things that's kind of, you know, driving some of the research I see Jeff doing is looking at animal behaviors and how to explain them, in terms of: they've got to be able to do this, they've got to be able to do that, they must have that capability. Those are, by definition, tasks. And the interesting thing to me is that, you know, I mentioned the robotics community.
A
That's, you know, how does the animal solve this particular sniff task in a rat, or something. But you may not be getting at the core elements of what it means to be intelligent at all; you're just asking how a brain, as a whole, solves a problem when the animal is motivated to get something to drink. And so I've always been cautious about that. What you really want, you know, humans would be the ideal subject, because you could have humans do cognitive tasks, but we can't probe human brains anyway.
A
I
think
focusing
on
animal
behavior
is
also
wrong.
You
need
to
it's
gonna,
get
you
down
the
wrong
path.
You
need
to
have
a
solid
framework
of
what
it
means
to
be
intelligent
and
then,
which
is
what
we've
been
doing
and
we've
made
a
lot
of
progress
on,
and
then
you
can
back
off
from
that
and
say:
okay,
if
I
wanted
to
design
an
animal
experiment,
how
do
I
make
sure
that
that
determines
is
testing
that
concept
versus
just
no
animals?
Just
trying
to
remember
you
know
get
through
water.
A
...ways, but I believe it leads you down the wrong path. I've been to these research labs where all these animals are tested, and you can see that they test what they can test. They can't test what we're trying to get at, which is what it means to be intelligent. I just think it's misleading; it takes you down the wrong path. That's...
A
We have a lot of data, we have neuroscience data. The problem, as Subutai pointed out, is that the neuroscience data we have, which is literally tens and tens of thousands of neuroscience papers, is not well written for our purposes. That's why it's difficult to sort through all the data. But I believe we have sufficient data in the neuroscience community. Just like I was talking about earlier, with these different problems: I said, well, what do we know about this? Well, you know, it should be distributed.
A
...where we can sort through it. I've found over and over again that if you spend enough time at it, look at enough papers, and then occasionally contact the scientist and say, you wrote this, but what did you mean, why did you put that in there, you can generally get to the answers you're looking for. It just takes a lot of time. I think that's what's unique about Numenta. I mean, you could argue...
A
If
you
don't
like
this
approach,
it's
fine
but
I'm
convinced
it's
the
only
way
it's
going
to
work
and
it
has
been
working,
which
is
you
have
to
stick
to
the
neuroscience
it's
difficult
as
it
is
combining
it
with
into
and
observation
and
psychology.
You
know
cycler
observations
and
so
on
and
and
then
you
can
start
matching
these
pieces
up
together
and
so
I
know
brain
has
to
do
this,
I
think
there's
no
tissue.
It
looks
like
this.
That's
what
I
was
doing
earlier
with
the
grid
cells
and
now
oscillators
like.
A
Where does this fit into the neural data? For the oscillatory interference model, I think I concluded it's like 99% certain it's right, and then we said, okay, if it's right, then it has to be somewhere in the neuroscience; where do we find it? Almost nobody does that. Very, very few people think about this at all. They might make neural models, but they're not constrained neural models.
B
I think so. I think the issue is the following: if we want to use these principles, once you get across the hurdle and say, okay, now these have to be applied to practical problems to create actual working AGI systems, there's a big gap. Maybe Jeff disagrees, but even once you have all of these principles, to the point where you have code that actually works and can demonstrate the principles on some non-trivial thing...
B
There's
a
lot
of
engineering
work
involved
in
that
process,
and
so
that's
the
and
there
we
might
be
able
to
use
approaches
like
this
to
kind
of
make
that
piece
more
efficient.
It's
not
sort
of
it
doesn't
remove
the
work
of
figuring
out
what
those
principles
are
in
detail.
It's
just
sort
of
an
engineering
step
of
taking
starting
from
that
and
getting
to
something.
That's
actually,
you
know
solve
demonstrably
short,
showing
something
pretty
difficult
or
showing
intelligence
of
some
sort.
So.
B
And-
and
maybe
you
know
what
Jeff
said
is
once
we
know
the
right
set
of
principles
that
could
be
around
that
could
be
much
more
efficient
than
then
a
purely
you
know,
machine
learning
the
principles
themselves
can
lead
to
a
faster
way
of
implementation.
Okay-
and
we
saw
this
with-
you-
know
some
of
the
sparsity
stuff,
with
temporal
memory
and
so
on,
but.
A
Right, that's the right summary, I think. Given that what we're doing is not AGI, that we're building partial solutions, like we did with temporal memory and are now doing with sparsity applied to neural networks, given that we are taking one principle at a time, or a couple of principles at a time, and applying them to problems that are not quite real brain problems, then this approach could be very, very helpful. I think we can all agree on that, and I think that's one of your main points. I think it remains to be seen when we have a full cortical model framework.
B
Possibly, yeah. I haven't gone that far. It may not be just the continuous-learning thing; it could be anything, any place where we are trying to create something that works in practice. The better we can optimize away the more rote parts of what we're doing, the better.
D
And to be fair, I don't want to diminish Jeff Clune's work or proposal. There is already a growing community, especially the meta-learning community, trying to solve a similar problem. I think Jeff is taking it further: he's talking about including the data sets as well in the optimization loop. But there are a lot of people who have already been thinking about that for the past few years, yeah.
B
Yeah, I think if you look at the three pillars he had: the first one is becoming a lot more popular; the second one, the meta-learning of learning algorithms, is still small but reasonable; but the third one, where you generate the data and environments, almost no one is working on that right now. There's...