From YouTube: Discussion of Active Dendrites
But it didn't assume that there was a delay there. Maybe I'm reading too much into something, but that's my question: where are these ubiquitous predictions occurring in the cortex? And is one of the reasons you have separate integration zones that, for the distal dendrites, you don't want them to make the cell fire? Why not have it all at the soma? So even if you could put all the synapses on one dendrite, it wouldn't serve the purpose.
You know, just to extend on that: I would like to understand whether the number of units that you need in a single representation is somehow linear in the number of patterns that you want to recognize. I understand there will be biological constraints, but just mathematically, does it have to grow that way?
In fact, many of these staining techniques work well precisely because they don't actually stain every neuron; they stain maybe one in a thousand neurons. When you do that, it actually looks really nice, these beautiful arbors, but of course in reality it's packed tight: there's no free space, there are no empty holes between the neurons. It's all packed with dendrites and axons and glial cells and fibers passing through, so that's important to keep in mind. It's really a struggle.
They do this thing where they slice the cortical sheet into, you know, micrometer-thin slices, and then they use computer vision technology to track whether there's an axon going through each sheet. You have all these slices and you try to track how it goes and where it branches, to build these reconstructions.
So one question — and I'm not going to draw all the arrows coming into it. The first question is: well, why not just have only one dendrite? Okay, so one answer there, the one that Jeff gave, is that you need at least a second dendrite, because these ones will do prediction and these ones will do, you know, activation.
And this one has an impact at time T plus 1, while this one has an impact right at time T. So the function of this one on the cell is quite different from the function of that one. Just for that, you need at least two. And then you can ask: okay, so this is detecting some patterns that form the prediction — why don't you just put all the patterns that cause a prediction here and treat that as a union? So there's really a union limitation issue, and you could quantify it exactly. Because of that — because you can't put too many patterns on one dendrite — we end up having lots of dendrites, and because each one can recognize one or more patterns, and they can be completely segregated, you don't run into the union limitation. So you can do hundreds of patterns of contextual prediction. That's the second benefit, this capacity thing, and I think there's a third one, which is a learning benefit.
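To make the capacity point concrete, here is a minimal Python sketch — an illustration, not Numenta's code — that estimates how the false-match rate of a single segment grows as more sparse patterns are unioned onto it. All sizes and the threshold below are assumed values.

```python
# Monte Carlo estimate of the union limitation: store several sparse patterns
# as one OR'ed set of synapses on a segment, then measure how often an
# unrelated random input falsely crosses the firing threshold.
import numpy as np

rng = np.random.default_rng(0)
n = 2048        # input dimensionality (cells the segment could connect to)
w = 40          # active bits per pattern
theta = 15      # matching synapses required for the segment to fire

def false_match_rate(num_unioned, trials=2000):
    hits = 0
    for _ in range(trials):
        union = np.zeros(n, dtype=bool)
        for _ in range(num_unioned):
            union[rng.choice(n, size=w, replace=False)] = True
        probe = rng.choice(n, size=w, replace=False)   # unrelated input
        if union[probe].sum() >= theta:                # counts matching bits
            hits += 1
    return hits / trials

for m in (1, 5, 20, 50):
    print(m, false_match_rate(m))   # false matches climb as the union grows
```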
So if you recognize a pattern and it's either correct or incorrect, you want to update just the synapses for that pattern. If everything was all unioned together, even if it was within the capacity, it would be really hard to update just that pattern, because there's no real knowledge of it anymore — what is that pattern? But if it's segregated like this, you can do branch-specific learning on these things, so that you update the synapses, or weights, only for that pattern, and I think that makes learning much more efficient.
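A minimal sketch of what branch-specific learning could look like, assuming an HTM-style neuron whose segments store permanence values; the function name and parameters are illustrative, not an actual API. Only the segment that best matched the input is updated:

```python
import numpy as np

def update_matching_segment(segments, active_input, threshold=10,
                            inc=0.05, dec=0.01):
    """segments: list of (permanences, synapse_targets) pairs.
    Find the segment whose connected synapses best match the active input
    and adjust only that segment's permanences, Hebbian-style."""
    best, best_overlap = None, -1
    for seg_id, (perm, targets) in enumerate(segments):
        connected = perm >= 0.5                        # connected synapses
        overlap = np.sum(connected & active_input[targets])
        if overlap > best_overlap:
            best, best_overlap = seg_id, overlap
    if best is None or best_overlap < threshold:
        return None                                    # nothing matched
    perm, targets = segments[best]
    was_active = active_input[targets]
    perm[was_active] = np.minimum(perm[was_active] + inc, 1.0)    # reinforce
    perm[~was_active] = np.maximum(perm[~was_active] - dec, 0.0)  # punish
    return best
```

All other segments are left untouched, which is the efficiency point being made: the error never smears across patterns the segment had nothing to do with.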
I'm just sort of answering from an abstract machine learning point of view, ignoring biological considerations for a second — though obviously all these inspirations come from biology. From a continual learning standpoint, we know that if you have very sparse representations, the chance of a single pattern falsely activating one of these other ones is very small. So now you can learn new things on new dendritic segments without any interference.
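As a quick sanity check on that interference claim (with made-up sizes): for two random sparse patterns, the expected overlap is only w*w/n bits, which is why a pattern stored on a fresh segment is rarely driven by unrelated activity.

```python
import numpy as np

rng = np.random.default_rng(1)
n, w, trials = 2048, 40, 10000
overlaps = [
    np.intersect1d(rng.choice(n, w, replace=False),
                   rng.choice(n, w, replace=False)).size
    for _ in range(trials)
]
print(np.mean(overlaps), w * w / n)   # both come out near ~0.78 bits
```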
That's totally true. So, you know, in our temporal memory we have multiple cells per column, and each one has, let's say, three different active dendrites; or you could have three times as many cells with one active dendrite each, and that would be fine. It just becomes a question of resources.
See, that's what I'm wondering about — this actually gets back to my question — whether for the model it would be helpful to understand that it is actually spatially divergent, that it's a tree that's branching out. I don't mean to introduce multiple levels on the dendrite, that's not what I mean. What I mean is that we draw them here in parallel, which would kind of suggest that you could have, you know, a descending fiber that would activate all of them in practice, right?
It looks big compared to the cell body, but it's not very big overall. I'm trying to think — I'm tempted to put a number on it, I'm not sure, maybe half a millimeter or something like that, and that changes quite a bit; I believe it varies all over the place, so I'm not sure.
We haven't had much of a problem with interference; it's really just never been a big issue. I guess when we trained the sequence memory, we trained it until it started to fail, and you can observe where that happens. You know, it wasn't even close to being a critical parameter.
Yeah, I think it's worth pointing out a couple of distinctions here. This is a biological distinction and may not be important from a modeling standpoint, but just to reiterate: there are no excitatory synapses on the soma.
Right — there are inhibitory synapses on the soma. So unlike a point neuron, where everything is summed at the soma, that's not going to happen here. Basically, within some distance of the cell body, the proximal synapses are close enough that they act as if they're summing at the cell body, and so, depending on how many basal branches come off the soma, there's some number of synapses there.
Then of course all the others are further away, and I'm actually not sure how far it is to get to your first branch point — I don't know if there's something there that doesn't qualify as proximal — but clearly after the first branch point the synapses are all too far away. Those are all these sort of prediction synapses. The other thing I want to point out is that these branches on a real neuron — the lengths along these dendrites here — can be 200 microns, and that's a lot longer than the integration zones, which are about 40 microns. So to get the dendritic integration zones, you have to have some number of synapses within 40 microns of each other — 15 to 20 or something like that — that are activated within some very short period of time, so that they integrate together to produce a dendritic spike. If you stretched those out over 200 microns, it wouldn't work.
So already, what that tells you is that it's not as simple as we draw it here, where we just keep adding segments.
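A hypothetical toy model of that constraint — the numbers mirror the ones quoted above, and everything else is an assumption, not measured biology:

```python
def nmda_spike_sites(events, min_count=15, max_span_um=40.0, window_ms=5.0):
    """events: list of (position_um, time_ms) tuples for activated synapses on
    one dendritic branch. A local (NMDA) dendritic spike requires enough
    synapses co-active within both the spatial and the temporal window.
    Returns branch positions where that condition holds."""
    spikes = []
    for pos_i, t_i in events:
        cluster = [
            (p, t) for p, t in events
            if abs(p - pos_i) <= max_span_um and abs(t - t_i) <= window_ms
        ]
        if len(cluster) >= min_count:
            spikes.append(pos_i)
    return spikes
```

The point the toy makes is the one in the discussion: 20 co-active synapses spread over 200 microns would never satisfy the 40-micron window, so a long branch behaves like several independent integration zones rather than one big segment.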
You know, 20 synapses for each — so again, none of that seems to matter at the moment. And even on the point that was just made: if I'm going to mix and match patterns here on these 200 microns, they're not going to be mixed and matched over the whole distance, because that's more than 40 microns. So naturally, you could have some pattern recognized here and another pattern over there, maybe 40 microns away, that kind of thing. The pattern over here and the pattern over there aren't really combined together, and they're not really competing with one another. And somehow the learning signal comes back — we don't know if it affects all these synapses or if there's still a local effect from this little area that was depolarized when it generated the dendritic spike.
Could that have some metabolic effect or something that gates what gets trained? We don't really know. But I just want to point out that these dendrites, the distal dendrites, tend to be long. And the other thing, going back to what I mentioned a while ago — it's related to this — is that the tips of these dendrites are constantly growing and retracting.
It's as if they're just trying to find something — you know, to see if they can make a connection. And if they do make a connection, if they find something that's useful, then they stop. I don't know why; I guess presumably they don't keep going forever once they find things to connect to. Maybe it's just that the signaling efficacy goes down with distance — there's always the back-propagating action potential and the forward signal — so there might be a physical limit to the distance there.
That's the way that they use it. So in that sense, it's kind of fascinating to me that, even though it's more machine learning oriented, it is actually capturing more biological detail in some aspects — while obviously abstracting it very far, with, you know, sort of idealized segments, and not bothering with all the biophysical detail.
The synapse is a complex machine as well, in terms of evolution — like, how many different proteins do you find at a synapse? If you look at sort of the earliest forms of animals that have some kind of nervous system, you might find some 20 to 30 different proteins, whereas the human synapse is at the peak: estimates range a little bit, but there are thousands of different proteins in the small confines of, you know, a single synapse.
In one of these trained networks, you're essentially always combining all the inputs that are on, and therefore, to respond differentially under different conditions, you have to very finely tune the weights based on the specific activations of the individual inputs — you're tuning these huge matrices. So that's one answer, one possible thing that's important. The other thing is — I don't know — there's a benefit in theory: you could create more efficient representations here than you could there, because if multiple things essentially mean the same thing, you could group them here, whereas it's difficult to do that there. You know, you'd have to learn it, and we talked about this a little bit before.
So essentially, what just clicked in my head: what is interesting about these dendrites is exactly that they don't make it necessary for the learning signal to go down all the way. It doesn't need to satisfy the whole network's criteria; it doesn't need to be retrained on all the data. If you wanted to marry that to deep learning network architectures, that would be the first thing to recognize.
So you can still back-propagate the error, but only if the dendritic segment actually took part; if it didn't take part, you stop propagating the error backward. That makes training cheaper, and it allows exactly for this property that not every neuron takes part in, you know, satisfying the constraints of everything you train on. I think to some extent that would actually just magically happen if you set up this network with all the logic, with the maxing and gating and so on: those gates already have all that magic built in, so that if something gets maxed out or something is clamped, the gradient stops right there. So I think that's what someone would get accidentally — and you've just provided a justification for it.
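A minimal PyTorch sketch of that point: when a unit's output is the max over its dendritic segments, autograd already routes the gradient only to the winning segment, so the gating "magic" comes for free. Shapes are illustrative, not any specific published architecture.

```python
import torch

batch, d_in, n_segments = 4, 32, 8
x = torch.randn(batch, d_in)
W = torch.randn(n_segments, d_in, requires_grad=True)  # one row per segment

seg_acts = x @ W.T                 # (batch, n_segments): score per segment
out, winner = seg_acts.max(dim=1)  # unit output = best-matching segment
out.sum().backward()

# Only rows of W that won for some input receive gradient; the rest stay 0.
print(winner)
print(W.grad.abs().sum(dim=1) > 0)
```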
Right, yeah — the problem is that, you know, they're not sparse. If you consider normal activations, something like 60% of these units are active, so it's a large population active at the same time. Part of this generalization problem is that all the weights are important, so it's still...
I mean, it's all about how entangled the parameters and representations are. So of course, if you have a very sparse system, in terms of parameters and in terms of activations, it can work much better. This was also another question: do you think, for example, that sparsity by itself is enough for continual learning — that we don't need to take advantage of the multiple active dendrites, and that it's just about, you know, having more neurons that are sparse?
And I'm thinking about where it would be most helpful, for talking to the machine learning community, to frame this as: look, we're going to introduce a second class of unit here, which we call a dendritic unit. Then maybe they would understand it, because then you don't have to talk about the biology of it. You can just talk about the fact that these units are a bit special in the sense that they have these converging inputs onto these other classes of nodes.
There's this online chart that has, like, the zoo of classical neural networks in it, and they show the node types by different colors — green are hidden units, yellow are input units, red are output units — and there are these big memory cells, recurrent cells, different memory cells, matched input-output cells. So there are different classes of nodes, and when RNNs and these different architectures are talked about in those terms, they just put them together into these nice color diagrams.
In machine learning, from that context, you could say exactly that: well, there's one more category of node, which we call, like, a dendritic cell, and there's some special condition on how it has to be wired up with these other, you know, main cells. And then you could build an architecture that a machine learning person would understand without having to talk about the biology.
Like, we have this kind of fairly complicated four-layer network — I mean, it's like a big four-layer CNN; there are some details I'm leaving out here — and what they were optimizing for is that they want to be able to classify objects from this top layer, and even be able to classify novel objects without ever having to change that part, right.
A
That's
not
right,
and
the
interesting
observation
was
that
by
training
a
network
on
that
purpose,
they
got
something
that
where
the
top
two
layers
like
were
that
they're
representations
could
be
linear
map
onto
representations
between
these
things.
I'll
just
do
like
dotted
lines.
You could do linear regression from here to here to predict units firing. Now, the one thing that I wasn't able to respond to intelligently was the claim that this is self-evident — that this would happen.
The idea is that if you can classify something linearly from here, and from here, that almost implies that there must be a linear mapping between the two. Yeah, I think that was the crux of the issue: is that true or not? So I think I can describe that statement, over here, in a way that makes it obviously false.
So first of all, I think part of the confusion came from the fact that we were talking about both linear classifiers and linear regression, and the conversation got a little confused by that. So keep in mind what a linear classifier is: take the network output — the decision is either this class or this class — and call it X. The classifier comes from applying weights; you get a set of scores, but then you remove most of the information from them, because you take the maximum over them.
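A tiny illustration of how much the classifier throws away: these two made-up score matrices produce identical classifications (same argmax per row) even though the scores themselves are arranged completely differently.

```python
import numpy as np

scores_net1 = np.array([[3.0, 1.0], [0.2, 0.9], [5.0, 4.9]])
scores_net2 = np.array([[0.1, 0.0], [-2.0, 7.0], [9.0, 1.0]])
print(scores_net1.argmax(axis=1))   # [0 1 0]
print(scores_net2.argmax(axis=1))   # [0 1 0] -- identical decisions
```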
Yeah, so here I'm just trying to show a picture of two different network outputs. This is a schematic picture — you could think of it as two output units, or as a schematic picture for many units — and here's where the weight matrix, combined with the bias term, causes this kind of decision boundary. And basically:
A
Here
here
is
like
I
call
it
a
decision
boundary,
but
really
it's
it's
a
scoring
function.
It
assigns
scores
like
you
could.
You
might
actually
have
a
triangle
over
here,
but
anyway
the
point
is
it's
a
scoring
function
and
now
I've
shown
particular
examples:
three
distances
of
class
to
go
three
instances
of
class
one
and
I've
shown
how
two
different
networks
might
put
them
in
different
places
and
I've
intentionally
drawn
them
in
different
orders
or
where
a
b
c
b,
a
c
e
DF
d
EF
to
show
just
like
it's
pretty
obvious.
A
You
couldn't
do
a
linear
mapping
from
this
to
this.
You
can
multiply
some
matrix
by
the
student
than
be
here,
because
you
can't
read
it.
You
can't
change
the
order
of
things
and
anyway,
I
think
that
this
was
worse.
There
was
some
confusion
here
today,
so
it's
at
least
from
the
perspective
of
linear,
algebra,
linear
mappings
of
matrix
multiplication,
whatever
it's
not
the
case
that
this
is
self-evident.
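A small numpy sketch of that counter-argument, with made-up points: each network separates the classes fine, but with A and B swapped there is no linear (affine) map from one layout to the other, and least squares cannot drive the residual to zero.

```python
import numpy as np

# Network 1 places A, B, C, D, E, F here...
P1 = np.array([[0., 0], [1, 0], [2, 0], [0, 2], [1, 2], [2, 2]])
# ...network 2 separates the same classes but with A and B swapped.
P2 = np.array([[1., 0], [0, 0], [2, 0], [0, 2], [1, 2], [2, 2]])

# Best least-squares affine map (weights plus bias) from P1 to P2:
A = np.hstack([P1, np.ones((6, 1))])
M, *_ = np.linalg.lstsq(A, P2, rcond=None)
residual = np.linalg.norm(A @ M - P2)
print(residual)   # nonzero: no affine map reproduces the swap exactly
```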
Now, I think maybe some of your intuition came from something that might be true, and the paper kind of points this out: this might be evidence that the statistics of the world are such that there really aren't very many bases where this is possible. Maybe the statistics of the world are such that... I don't know if that's true or not. Again, my guess here is that if an author of the paper were standing here right now, they would have the perfect answer to that question — they would say "we intentionally only sampled a small number of units, making sure we solved for that case" — but I'm not going to be able to give a good answer.
And so, okay, the closing point on this part is that there are two possibilities. One is that there really aren't very many of these bases where this is possible, and it's pretty cool that the deep networks are able to find one — our brain seems to find it too. You could think of maybe one way to map this onto our work — I'm not positive — for example, the equivalent of the spatial pooler: the spatial pooler has a basis of sorts, and maybe it's actually quite similar to this basis, and that would be the right way to think of it. That's just me trying right now to map it onto our world.
So I guess when I say "the same basis," I'm really thinking about bases living in these kinds of families — like, this family of bases are all kind of linearly related to each other, you can kind of linearly map between them — versus this other family over here. And I'm saying they're all in the same family there, yeah.
The other possibility I was going to bring up: possibility one, there aren't very many solutions to this where you can linearly separate them; possibility two, there are a lot of them, but it just so happens that maybe convolutional neural networks' learning rules are sufficiently similar to the brain's learning rules that they find similar ones.
There are multiple levels of training here, so the conversation can get a little confused. For one, when they were doing the back propagation over here, they were using a totally different set of images — a totally different set of image classes, yeah. Later, when they were doing this testing, they were using a new set of image classes, and training — yes, a linear classifier. Sorry, I need to pause for a second. Yes, they were doing linear regression from units here to units here.
You can kind of decompose this into two parts: you want the basis to continue keeping everything linearly separable for the old images, and you probably have to update your classifiers for the old classes. I don't know — that just provides a point of view that really matches, you know, what's been presented about the feature layers and then the other ones on top. So it goes in line with that, I guess.
This is kind of a mental model of what deep networks are doing. When we talk about our compartmentalized dendrites, that's like adding memory to the system: this one is more of a generalization system, and that is more like a memory system. And it's not just going to — I imagine it's worth experimenting with, but we have our work cut out for us. It's not just going to magically work; some cleverness is needed to solve some of this.
You know, somehow you can maybe get rid of the weights that aren't needed, and in training you can preserve a lot of the connections and their relationships in the representation. That's right — and you can somehow have this nice property where, instead of back-propagating through everything, the error stops at very different points, so you can preserve a lot of these weights. Yeah, that's cool, but we don't have that here.
Yeah, without doing anything much more than sparsity. Okay, I was trying to think about this — there is a memory and computational efficiency issue, yeah. But still, we'd need a mechanism to preserve the weights at the same time, something a bit more like the weight consolidation idea or something like that. Because, you know, sometimes a neuron can start detecting a feature that is useful for recognizing an old class as well as a new class, and what that means is that somewhere up in the layers, these neurons can activate for both. So, for example, this one is useful to predict the new class as well as the previous class: this weight is going to be increased and this one is going to be decreased, and if these cut right across classes, maybe this feature was very important to that particular class.
Yes, the sparsity will reduce the chance of that, but it can all still happen — maybe even with the elastic weight consolidation or something like that. But the other thing that we do, you know: our negative error in the HTM, often, in this case where you have a misprediction — think about it in temporal memory.
The negative error is much, much smaller than the positive one, so the decrease is small. Partly for this reason: in the case of temporal memory, you can make a lot of predictions, and only one of them will actually happen — that doesn't mean the others were wrong, right? So maybe the translation here — I've no idea whether this will work — is that in the case where it correctly predicts something, you make a large increase in the weights, yeah.
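A minimal sketch of that asymmetry, assuming HTM-style permanence updates; the constants and names are illustrative, not Numenta's actual code:

```python
import numpy as np

INC_CORRECT = 0.10    # strong reward when a prediction is verified
DEC_WRONG   = 0.005   # tiny punishment when it is not, since many
                      # simultaneous predictions are legitimate

def reinforce(permanences, predicted_segments, segment_came_true):
    """permanences: dict mapping segment id -> array of permanence values.
    Verified predictions are reinforced strongly; unverified ones decay
    only slightly."""
    for seg in predicted_segments:
        if segment_came_true[seg]:
            permanences[seg] = np.minimum(permanences[seg] + INC_CORRECT, 1.0)
        else:
            permanences[seg] = np.maximum(permanences[seg] - DEC_WRONG, 0.0)
    return permanences
```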
But that's something that came to mind. I want to try one more time to articulate it. I guess what feels strange to me with this — okay, I'll try to say back what we're trying to do with these separate dendrites: we are making it so that subsequent training data doesn't stomp all over the weights, doesn't stomp all over the old weights.
A
We
want
them
to
be
kind
of
guarded
from
from
later
training
data,
but
there's
this
conflicting
interest
that
we
want
the
network
to
always
be
searching
for
this
really
good
basis,
and
we
wanted
to
continue
using
a
future
data
to
change
its
basis
and
those
two
are
at
odds
with
each
other.
Those
two
things
see
my
top:
preserving
weights
versus
having
your
bases
get
better
and
better.
As
you
know,
as
you
observed
on
internet,
they
seem
there
there's
a
contradiction.
There,
there's
something
yeah,
there's
a
dissonance.
Maybe
one
day
I
can
say
the
the
temporal
memory
sighted
on
one
extreme,
we're
literally
memorizing
everything,
yeah
and
pure
against
backdrop.
Is
that
another
extreme,
where
you
don't
care
at
all
about
the
previous
table
every
morning?
Stuff
is
trashing
everything
and
maybe
there's
a
sweet
spot
there.
Where
you
know
you
still
have
some
of
that
problem
of
trashing
all
stuff
that
you
really
introduced
it,
but.