Description
Marcus Lewis discusses how continual learning presents a dilemma between memory and generalization. He also presents an idea that quick few-shot learning (e.g. MAML) may offer a different, biologically plausible way of resolving this dilemma.
B: We are recording now. So today, just basically this morning, I spun up a quick topic regarding continual learning: neural networks that just learn all the time, without having to go over all the old training examples they saw in the past, and without losing that past knowledge, and how it overlaps with the models we've historically built here at Numenta. Just thinking about the problem space.

B: That's the thing I've wanted to present for some time. But on top of that, sometimes I'll read a paper or see a new model that just expands my imagination about how the brain might work: what is the bag of tricks the brain might be using that is actually biologically plausible? And recently we've been discussing something along those lines.

B: The thing Subutai brought up is meta-learning, and this method called MAML, M-A-M-L. I'll discuss that when I get to it, but I think it brings an interesting perspective to continual learning, in that it doesn't try to solve continual learning directly. It helps with something else, but it can help us reframe how you might think of continual learning. And just in general I think it's a cool trick... you're laughing at the cat.

B: In ML in general, it's just a cool trick that is useful to have, so I thought it was worth presenting. So I think this is going to serve two purposes. Hello, kitty; I'll put the cat away in a second. The two purposes are: one, I can frame some of our algorithms, like the temporal memory and the spatial pooler, in the context of neural networks, and then discuss this problem of continual learning and how it relates to one-shot or few-shot learning.

B: I'm not sure what to do here, so I'll just run with it. Once I'm showing my iPad it'll be less distracting. So here is what I drew this morning.

B: I'll walk through this in detail in a few seconds. One thing, one kind of old idea that I've heard stated about neural networks, and this predates deep learning.

B: The idea, which predates deep learning, is that there are kind of two fundamental ways to use a neural network. If you're just sitting with an abstract neural network and you want to hook it up to do something, the two sort of families are: you can use it to do memory, or you can use it to do generalization, which I kind of depict as that top picture up there, which is just one specific kind of generalization.

B: Everything I'm saying here is a cartoon; it's not this clear-cut. But I think it's a useful framing that you can sort of think of neural networks as doing something memory-like or something generalization-like, where you define generalization as that picture in the upper right corner: you have classes at different points in the input space and you're trying to find the boundaries of those classes.

B: In my mental model of a lot of the things we've worked on here, we've spent much more time on memory mechanisms with neural networks, and some on generalization, but more on memory.

B: So a classical example, going back to the 80s, of using neural networks for memory is auto-associative memories, or Hopfield networks. Here I've just depicted that you might memorize a representation by activating the units and forming connections between all of those active units, so that now a subset of that representation can cause the entire representation to activate. Jeff has many times drawn the association that a temporal memory is a little bit like an auto-associative memory that associates with the previous item in a sequence. So the temporal memory, I mean, we put it right there in the name, is a memory of sequences. We can use it for things other than sequences, but that's what it is at its core.
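A minimal sketch of the auto-associative recall described here, assuming sparse binary patterns, a Hebbian outer-product storage rule, and a k-winners recall step (all illustrative choices, not Numenta's or Hopfield's exact formulation): connections are strengthened among co-active units, and a partial cue then completes to the full stored pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_patterns = 200, 5

# Sparse binary patterns to memorize (sizes are illustrative).
patterns = (rng.random((n_patterns, n_units)) < 0.1).astype(float)

# Hebbian storage: strengthen connections between co-active units.
W = np.zeros((n_units, n_units))
for p in patterns:
    W += np.outer(p, p)
np.fill_diagonal(W, 0.0)

def recall(cue, steps=5):
    """Iteratively complete a partial cue into a stored pattern."""
    k = int(patterns[0].sum())          # keep roughly one pattern's worth of units
    x = cue.copy()
    for _ in range(steps):
        drive = W @ x                   # recurrent support from connected units
        winners = np.argsort(drive)[-k:]
        x = np.zeros(n_units)
        x[winners] = 1.0
    return x

# Cue with about half of pattern 0's active units; recall completes it.
full = patterns[0]
cue = full * (rng.random(n_units) < 0.5)
completed = recall(cue)
print(int(completed @ full), "of", int(full.sum()), "active units recovered")
```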
B: It's learning associations between one representation and another, and I'd say the same about some of our other stuff, like the sensorimotor work. Here I've drawn a few grid cell modules; some people here will know why I use a rhombus to depict grid cell modules. It's forming these associations between a temporal memory and a location. And I'd go further to say that even when we talked about displacements...

B: ...when we talk about using displacements for composition, the whole point of computing the displacement is so that we can learn it, so we can memorize it. So even when we're talking about learning compositional objects, it's still kind of a memory for objects.

B: So I think a lot of what we've done falls more into that column on the left than the column on the right, whereas deep learning, or really any network trained with backprop, with back-propagation, is more in the column on the right. But I will say that our spatial pooler is more on the right; it's kind of our source of this type of generalization.

B: Now, a quick aside on what I've already said: the word generalization can mean many things. We do have other kinds. We have, for example, learning an object at one orientation and then inferring it at another; we kind of have that working, and that's a form of generalization. Here, when I'm saying generalization, I'm talking about that picture on the top right. So now, the topic of continual learning, marked with the little stars.

B: With a memory, with our mechanisms, and just generally the way memories work, you can set it up so that each new piece of information, each training example, causes a local change that doesn't collide with existing knowledge. If you introduce dendrites and sparsity, you can give a neural network the ability to store a large amount of information without it interfering with other information. But in the right column here, each new piece of information is used to choose a better basis.

B: Those arrows start to mean something different: each arrow starts to represent a different type of feature, and this can be disruptive to old knowledge. And that is sort of by design. We almost want it to be; I mean, that's a weird way to say it, we don't want it to be, but we're getting something good from the fact that it does disrupt old knowledge. The instructive example I wrote on the bottom here is that often, when you train a network to do recognition tasks, it's a really good idea to pre-train it on ImageNet, because you want it to learn a bunch of reusable information that is generalizable.

B: You want it to choose a really good basis, like this one. If you treated this as a pure memory problem, where you're going through and just never disrupting your existing knowledge, pre-training on ImageNet would have less value. I wouldn't say no value, but this whole disrupting-the-basis thing provides a benefit: it gives you generalization.
F: Yeah, just a quick comment. I think there's a useful distinction to be made there between generalizing to new samples within a class, which I think fits the picture you drew with the decision boundaries on top, and generalizing to new classes that you haven't seen before. Those are two different problems. For one of them you have to improve your basis, to use the same name you used there.

F: So when new samples come, it's easier for you to locate them within that reference frame; you can think of it like that. And the other one is: you have a completely new class and you have to derive a new basis from scratch. Then how do you do that? How well do you derive this new basis for this new reference frame?

B: I would say no. First of all, you're using the word reference frame, but here I'd be more apt to say a coordinate frame; it's not like a spatial reference frame, it's more like the coordinate system. But I would say no: you could take this network that has been trained on ImageNet, freeze most of the network, introduce a whole new class, and just train the classifier using the same basis.

F: I'm not saying you can't; I'm just saying those are two different problems. When people talk about generalization in machine learning, I think the term is sometimes overloaded. One thing is just generalizing to new samples: that's what you expect your network to do, so when you show it a validation set or test set, it's going to generalize to samples it has never seen before. A completely different thing is generalizing to a new class, and that's the pre-training example you gave. Those two are different problems, and the generalization term there is...
C: Can I just make a few comments about this before we go on? (Yeah, yeah.) So, I like this very much. I mean, these are ideas we've had, but you have a pretty nice terminology for it, and it makes it more concise. Just a couple of thoughts; I'm not really adding anything other than some color to what you presented.

C: I think even the name spatial pooler suggests what you're saying: it's pooling, which is a generalization step. It's essentially saying, I'm putting multiple patterns into some bucket, and so on. And we've talked a lot about how you go back and forth between these two representations: you want to have very specific memories that you can learn quickly, and yet you want to have some more generalization, and I think there are two ways these differ.

C: The spatial pooler learns slowly, from the statistics of many inputs, whereas the temporal memory works best when it's very fast; you can learn in one step. So that's one way these things differ. And the other big way they differ is that in our spatial pooler we come up with this basis set, but on its own it's not sufficient to recognize anything of importance.

C: If we put some pattern into our spatial pooler, it's never sufficient to recognize a complete object or an image or a sequence or something like that, and so we use time to do that. As a basis set, the spatial pooler is not sufficient to recognize various images and so on, at least the way we've implemented it. But then we say, okay...

C: Well, we can only sort of classify or generalize a subset of the overall pattern, and then we move through the pattern, either through time or through physical space, sensorimotor. That's another way we get around the fact that our spatial pooler is very impoverished as a basis set for recognizing lots of different images. So I just put a little color on that.
B: And one thing I realized after thinking about this for a while: I used to have this clear line in my head, that you can either do memory or you can do generalization. But the reality is, take these axes, this x1, x2, x3. Maybe those are Gabor filters, or something resembling that. In a sense, as you learn that x1, x2, x3, you're doing something a lot like memory: you're learning that these things co-occur within a class.

C: Yeah, but you're not learning specific things, right? With those x, y and z, or x1, x2, x3, you're not learning something specific; you're basically learning a basis set by which you will then later memorize something. And in our world, our basis, the spatial pooler, is very poor. It's a very limited basis. I don't know, but I imagine traditional neural networks...

B: So, the way I frame this, it's kind of like: on the left, continual learning is the most natural thing in the world. You just do it. You just learn new stuff and put it on new dendrites, use sparsity. The way we frame the problem, it's no longer difficult, but it doesn't have so much generalization built into it.

B: Right, right, except for what we get kind of for free from the spatial pooler. And the column on the right sort of was designed for generalization, but then it's kind of horribly bad at continual learning. Continual learning, we almost didn't even see it as a problem before, and now it suddenly is a problem. So how do you get the best of both worlds?

B: This is the presentation I had composed in my head before yesterday. Now that I understand some of this approach to meta-learning and one-shot learning, few-shot learning, which I'll talk about in a second and which Subutai has also talked about, it has sort of turned the solution space around on me and made me approach this a little bit differently. So, rather than solving this directly...
B: Let's put the problem aside for a few minutes and talk about solving a different problem: few-shot learning, that is, learning a class from just a few examples. I say a class because we're talking about classification a lot, but it could be tasks, it could be some kind of regression, the ability to infer numbers; you can talk about lots of things, but here I'll just talk about classes, because that's what we're used to talking about right now.

B: So, given just a few examples, can you quickly learn to recognize a class? Let's talk about solutions to that problem, a specific solution, and see if it gives us a new perspective on continual learning. This brings me to something that Subutai has shown. Sorry, this is doing weird rotation. Something Subutai has shown us is this paper about the model MAML, model-agnostic meta-learning, and the picture on the right; I can just describe it in words.

B: What they do is, rather than training a network to perform a set of tasks, where those tasks are, say, recognizing a coffee cup or discriminating it from other objects, they allow the network, you could say almost at inference time, to learn much more often; the system is allowed to learn once it's deployed in the field.

B: So what I've shown here is that the network sits at some position, this theta, where you see it pointing at a dot, and it learns that slowly. Then, from task to task, as this network is doing different things, it very quickly learns to jump to theta one star, jump to theta two star, the ones with the asterisk. And the interesting thing is what happens after that task is complete.

B: I'll just say that again, because I wrote it down here. Imagine a neural network that sits at a position theta. When I say position, I'm talking about all the weights in the network, whatever types of parameters it has, all of them.

C: This is not an activation state; this is, like, the network connectivity.

B: Yeah. Okay, but I mean, this is how they've framed the problem, that you...

B: Yes, so it's like the network is trained over a long time, over many different tasks, to find a useful theta. And this might sound abstract or strange, but think of it, for example, as theta being a system that understands the visual statistics of the world. That could be what theta is.
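A hedged sketch of the MAML-style loop being described: the outer loop slowly moves theta (all the weights), the inner loop takes a quick gradient step from theta to a task-specific theta-i-star, and only theta is kept. The toy sine-regression tasks, network size, and step sizes here are illustrative assumptions, not the exact setup from the paper.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny regressor; theta = all of its weights.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 40), torch.nn.ReLU(),
    torch.nn.Linear(40, 40), torch.nn.ReLU(),
    torch.nn.Linear(40, 1),
)
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inner_lr = 0.01

def sample_task():
    """A 'task' here is a random sine wave (amplitude and phase)."""
    amp, phase = torch.rand(1) * 4 + 0.1, torch.rand(1) * 3.14
    def batch(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return batch

def adapted_forward(x, params):
    """Run the network with an explicit parameter list (theta or theta_i_star)."""
    h = x
    for i in range(0, len(params) - 2, 2):
        h = F.relu(F.linear(h, params[i], params[i + 1]))
    return F.linear(h, params[-2], params[-1])

for _ in range(1000):                 # outer loop: slowly move theta
    meta_opt.zero_grad()
    for _ in range(4):                # a few tasks per meta-update
        task = sample_task()
        params = list(net.parameters())
        # Inner loop: one quick, throwaway update theta -> theta_i_star.
        x, y = task()
        loss = F.mse_loss(adapted_forward(x, params), y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer objective: how well theta_i_star does on new data from the task.
        xq, yq = task()
        F.mse_loss(adapted_forward(xq, fast), yq).backward()
    meta_opt.step()                   # theta_i_star is discarded; only theta persists
```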
B: Yes, so that part's analogous to the spatial pooler. But the part where this takes a leap and does something new is that the system then, from this point, learns something temporary. I mean, one thing you'll bring up occasionally is silent synapses: you'll bring up how the cortex has the capability to quickly learn a thing that might just be temporary and is going to go away eventually. If you bring that into the picture, where you have the system sitting at this position theta, then, given a few training examples, that circuit can suddenly become a classifier for coffee cups. It can suddenly become, I don't know, a navigator to your bedroom.

B: This is the new trick. The new trick is being able to learn something quick on the fly, and the network has specifically chosen this set of weights, chosen a connectivity, where just a little bit of quick learning is enough, even if that quick learning is just a classifier up top quickly learning a new pattern.

B: That is the new trick. So the new way of framing it: you set it to a useful theta, then make these quick updates, use those updates for a little while, then throw them out when you don't need them anymore. Can you...

B: Yeah, yeah, the silent synapses are just some mechanism where the system quickly modifies itself, whether it's turning off certain dendrite segments, turning off certain synapses, enabling others; just something quick that is intentionally used on the fly, something you could rely on in the course of a couple of seconds.
C: I mean, you know, we model plastic synapses with the permanence, and that is sufficient from a modeling point of view. But from a biology point of view it's not fast enough, because synapses can grow in maybe an hour, but they can't do it in a second, and that's where the idea that the biology has silent synapses could get around that problem. From a modeling point of view you don't need that; that's more of a biological constraint. A silent synapse is just a synapse with zero permanence.

G: Yeah, yeah, that could be. Let me draw another parallel, Marcus. The way you're phrasing it, and Kevin's question, kind of reminded me: when Florian was here, he would talk a lot about short-term plasticity.

G: Like really quick, you know, within seconds or milliseconds kind of plasticity, and then there's also long-term plasticity, which happens over a longer term. So perhaps these updates here are kind of like the short-term plasticity stuff, and then this is a much slower kind of long-term plasticity.

G: So we make lots of quick updates that don't last very long, but those are kind of averaged over time, and there's some sort of memory of those changes, and then the long-term plasticity is sort of making those kind of bigger jumps, like this. Anyway, I thought that was interesting; it just occurred to me.

C: When Florian was here, when he talked about that very short-term plasticity, I think he was talking about changes at the synapse, is that right? Like metabotropic changes? Yeah, yeah.

C: That fits into the whole idea of silent synapses, I think. You've got the synapse, and a chemical change can occur at the synapse very rapidly that could make the synapse change from being nothing to something, or from something to more, or something like that. I'm just...

G: Yes, a silent synapse could be an extreme version of this, where the synapse really isn't doing anything until it learns this pattern, I think.

A: Yeah, I think what I was trying to do by bringing up plasticity was to relate it to that: you have different phenomena here, relating to different biological mechanisms, even if we might not know all the mechanisms involved. That's kind of what I was shooting for there.

C: But as long as we can say, yeah, biology can do these things, then we can sort of ignore exactly how biology does them. That was a big part, by the way. To me that was huge, going back decades; something bothered me for a long time about learning and plasticity and how we can learn really quickly, because, you know, historically, again going back decades...
B: Okay, so from here I'm going to talk about continual learning again. What I've shown here is a way of framing the problem of one-shot learning or few-shot learning.

B: This does not solve continual learning in its current state. Let's see: if you learned to classify one object from another yesterday, and then you threw away the specific weights for that, you haven't solved the continual learning problem. But I think we're just a quick stone's throw away from solving it. If you had a really good mechanism for this, it suggests a really fun trick for solving continual learning, using both the memory and generalization tricks. So, oops.

B: The rest of this was... yeah, we already talked about this: the network is getting better and better at few-shot learning over time. So if we have a quick few-shot learning mechanism, does it give us a new perspective on solving continual learning with generalization?

B: Well, I think the natural solution is: use your memory to store a small set of instructive examples of each class, and maybe tune those examples to be really useful, and then, whenever you need to go and recognize X or perform that task again, recall those few examples, train your few-shot learning network on those examples, and then suddenly you have a model again.
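A minimal sketch of that reframing, under heavy simplifying assumptions (a frozen random feature map standing in for the slowly learned basis, and a ridge-regression fit standing in for the quick few-shot update): a few instructive examples per class are kept around, and a throwaway classifier is re-learned from them on demand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the slowly learned theta: a frozen random feature basis.
D, H = 20, 64
W_basis = rng.standard_normal((D, H)) / np.sqrt(D)
def basis(x):
    return np.maximum(x @ W_basis, 0.0)

# Memory side: a few stored instructive examples per class.
memory = {}

def store(name, examples):
    memory[name] = examples

def few_shot_model(names):
    """Re-learn a throwaway classifier on demand from the stored examples."""
    feats, labels = [], []
    for i, name in enumerate(names):
        f = basis(memory[name])
        feats.append(f)
        labels.append(np.full(len(f), i))
    X, y = np.vstack(feats), np.concatenate(labels)
    Y = np.eye(len(names))[y]
    # One quick ridge-regression fit stands in for the fast inner-loop update.
    W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(H), X.T @ Y)
    return lambda x: np.array(names)[np.argmax(basis(x) @ W, axis=-1)]

# Usage: store a handful of examples now, rebuild the model much later.
store("cup",  rng.normal(0.0, 1.0, (5, D)))
store("bowl", rng.normal(3.0, 1.0, (5, D)))
classify = few_shot_model(["cup", "bowl"])        # quick, disposable model
print(classify(rng.normal(3.0, 1.0, (2, D))))     # likely ['bowl' 'bowl']
```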
G: Replay mechanisms, yeah, it's along those lines.

F: So, Marcus, I don't know if you're going to talk about learning without forgetting and prototypical networks, or...

B: Yeah, I mean, this is literally the end of the stuff I'm talking about. Let's see here: I'm coming in from the angle of discussing these in terms of, could the brain be doing this? Does this sound plausible? I'm sure people have done this in the machine learning world, and you're giving an example of that.

F: Yeah, I was just going to add it for reference. From the continual learning perspective, there is a classic approach called learning without forgetting, and that's basically the idea that you store a few instructive examples, and every time you see new examples, you see if they're better than the ones you have, and...

F: That idea very much explores the meta-learning scenario as well. The prototypical networks, or the whole field of metric learning, kind of explored that idea: maybe the best way is to just store a few examples, and then, when you find something, you just look at the examples you have stored and see which one is closer, like a k-NN kind of thing.
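For reference, a tiny sketch of the nearest-prototype idea F is pointing to (the embedding here is just the identity, as an assumption; a learned embedding network would normally go in its place): each class prototype is the mean of its few stored embedded examples, and a new sample is assigned to the closest prototype.

```python
import numpy as np

def prototypes(embed, support):
    """support: dict of class -> a few examples; returns class -> mean embedding."""
    return {c: embed(x).mean(axis=0) for c, x in support.items()}

def classify(embed, protos, x):
    """Assign each row of x to the class with the nearest prototype."""
    z = embed(x)
    names = list(protos)
    dists = np.stack([np.linalg.norm(z - protos[c], axis=-1) for c in names], axis=-1)
    return [names[i] for i in np.argmin(dists, axis=-1)]

# Illustrative embedding: identity (a trained network would go here).
embed = lambda x: np.asarray(x, dtype=float)
support = {"cat": np.array([[0.0, 1.0], [0.2, 0.9]]),
           "dog": np.array([[1.0, 0.0], [0.9, 0.1]])}
print(classify(embed, prototypes(embed, support),
               np.array([[0.1, 0.8], [0.95, 0.05]])))   # ['cat', 'dog']
```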
B: Well, okay, it's like k-NN, except it's not. I'm not talking about doing anything like k-nearest neighbors. I'm talking about: you have this theta, this basis; the two are kind of inseparable. You have this thing that is really good at few-shot learning, you train it with those examples, and then you use that. So it's a little different from k-NN.

F: Well, I didn't say it's k-NN, but I kind of missed something: why do you need to store examples and keep retraining on them?

B: Well, because... okay, this is the whole trick: we're willing to just forget stuff all the time. We're not trying to keep the model around; we're not trying...

B: This is the cool trick, to my mind. If you'd asked me before yesterday how I think all this happens, I would have said that we must store useful examples, and then, as you learn throughout the day, you kind of learn a new model, and then, when you sleep at night, somehow all your old examples and the new things you learned get merged into a model that incorporates both of them.

B: But I have moved away from that, with this idea of doing fast few-shot learning: you're just always recreating the models. And when I say model here, I'm talking about the whole ability to generalize, the whole circuit that takes an input and classifies it.

B: Right, because we know we can do few-shot learning really quickly. And if we can do that, what else can you do? If you have a neural circuit that can do that, what else can it do? Suddenly it suggests a different way of framing continual learning, where you do keep examples around, but then you just retrain on them when needed, not all the time. You don't go to sleep and retrain on everything you did that day.

B: You do it throughout your day, when you need to. I know this isn't going to be everything, this is going to be flawed, but, like, you get into your car and your brain quickly relearns how to drive.
A: Marcus, when you have that theta and you train to this stable, semi-stable operating point: if you go up one more level of abstraction, you could presumably have multiple of these theta-n's lying around that you could be switching between, right? (Sure, yeah.)

A: So the question I have is: if that is the case, this is a kind of, if you wish, configuration space, where there might be a choice. Because as I was looking at this trajectory, first I was thinking, well, could it ever branch or something like that? Then I was thinking there might be something that could activate different forms of these stable points, from which you can branch off.

A: If you wish, it's, you know, your skills: you have this theta point that's only active in a particular context, and then from there you can do your little sub-thetas off of it. But I'm just wondering what that mechanism would look like, if you have that kind of richness where there are these multiple operating points and then you're trying to find the one that matches best to the task at hand, or something like that.

A: I think I know what that would look like in a neural network from our point of view. I'm not sure what it maps to as far as extant, computable neural networks, but it seems like it could be, given what your insight is, a very rich space.

B: Yeah. So the general idea I wanted to get across was this new trick of being able to make these quick updates, and whether that causes us to reframe how we think of solving other things, like continual learning. Because in my mind it solves a lot of this trade-off here: the trade-off between memory and generalization kind of goes away. When you frame it this way, you don't really have a trade-off between memory and generalization.
F: All right. Can I just go back a little bit to the idea of learning without forgetting? It might be useful for the discussion.

F: So the idea, which I think is very similar to what you're discussing (you might correct me if I'm wrong): you keep a few prototypical examples, and then, when you learn a new network, at the same time as you're trying to learn from the new examples you're trying to keep your output stable with respect to the examples you had in the past, so you can train in a distillation kind of way.

F: So you run the network on the old examples, and you don't want to disturb the representation you had for these old examples, so you keep them stored. You can keep running them, and you can always make sure you keep that representation stable while, at the same time, you're learning new classes. So you have this combined objective.
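A rough sketch of that combined objective, with the details simplified (a single shared output head, and an assumed temperature and weighting): cross-entropy on the new examples plus a distillation term that keeps the network's outputs on the stored old examples close to what the frozen old network produced. In the full learning-without-forgetting setup the old and new classes have separate heads; this collapses that detail to keep the idea visible.

```python
import torch
import torch.nn.functional as F

def lwf_loss(model, old_model, new_x, new_y, stored_x, temp=2.0, alpha=1.0):
    """Cross-entropy on new data plus distillation toward the frozen old outputs."""
    ce = F.cross_entropy(model(new_x), new_y)
    with torch.no_grad():
        old_targets = F.softmax(old_model(stored_x) / temp, dim=-1)
    new_logits = F.log_softmax(model(stored_x) / temp, dim=-1)
    distill = F.kl_div(new_logits, old_targets, reduction="batchmean")
    return ce + alpha * distill
```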
B: See, the thing is, what you're describing is what I thought until yesterday, and suddenly my big realization is that you don't really need to bother with keeping the old examples up to date, or rather, you don't need to bother with keeping the neural network from losing the old examples.

B: You still keep the examples themselves around, and then later you few-shot learn your model again, on demand. But the core basis you're using may have changed a bunch over the past couple of weeks of doing other things. You've been driving around, you went on vacation, you went hiking, etcetera, and you come back with this totally new basis. You still have your old examples.

B: You can still retrain, over the course of a couple of seconds, to do the things you could do before. So I'm sort of changing the timing of it all, and I'm changing what you keep: you're not trying to keep a circuit that can do everything; you're keeping a circuit that can quickly learn to do anything, plus a set of examples that can shift it in those directions.

A: One of the things I think of when you show that operating point, just to relate it back to personal experience: you learn bad habits when you first learn something and you don't necessarily have the best instructor, and then you struggle at a certain level.

A: You can't get any better, because of the assumptions built into how you learned the thing in the first place, and it takes an active effort to school yourself: you get a different set of instructions, you get a different teacher or whatever, and they'll show you this other way, and then there's a struggle point where you're flipping back and forth between the two ways of doing it. I think of it in terms of motor skills or something like that.

A: But I think it's also analogous to how, say, the style of how I program has changed over the years. At some point I realized, okay, this is holding me back, and then I'll try to learn a new paradigm for something and gradually school myself away from the old one. But with this notion of stable operating points, you invest a lot in one, which is why you hate to have to learn a different way of thinking about something.

A: But somehow you have to get the motivation to say: okay, by example, someone else is able to do this a lot better than me, so what is it that they're doing? And then kind of get yourself schooled over to that. I've got to believe there's something analogous to what you're showing there with these operating points.

A: That is an important part of trading off between the ability to do something fast and going through the effort of learning how to do it better, so that you can go on.
B: I agree, but I also think that if we figured out how to do what you're saying, where you have these multiple operating points, it may still be the case that this picture captures everything, because theta can be lots of things, and these leaps from theta to theta-one-star might consist of exactly the choice you're talking about; it might consist of turning off two-thirds of the possibilities. So this picture still might be a complete picture, but I could translate what you're saying into saying that this big theta here is actually complicated; it has multiple points contained in it. Yeah.

G: Yeah, I think the new thing in what you're suggesting, like you're saying, I'm sure people have thought about similar things, but you're also drawing the connection directly to meta-learning, and the idea that you would actually throw away these local, quick changes that you made. And then there's some other sort of theta-star version of the network, the slowly changing network that you're constantly fine-tuning, in some sense, I guess. But the quick stuff that you learn, you learn, but it's temporary; you're kind of throwing it away. I don't know if people have really done something like that. It makes a lot of sense to me.

G: Yeah, I think the thing is, if you think about it from the MAML perspective, those small changes that are thrown away: you still need to retain some memory of them, so that when you make a change it still makes your network better at those tasks.

G: But you don't necessarily want those exact changes, because a change could have been bad for something; if it was bad for the task, you might want to move in the other direction. But you still want some memory of it around, in some sort of synaptic traces or something. There really is a nice analogy to short-term and long-term plasticity here, somehow.

A: Yeah, funny, I was just writing down "some trace lingering" when you said that, yeah.
H: Can I try to offer a different personal example, I guess?

H: This sort of fits in with what you're thinking, but the closest thing I can think of that gives an example of that quick learning you're describing is trying to remember a name, where you actually can't remember the name, but you just kind of keep on throwing names at it until all of a sudden you're like, oh, that's it. I don't know, is that unreasonable?

H: I think the interesting part about that, too, is that it makes me question what memorization is, because even though you can't recall the name off the top of your head, everything's there in your mind; you're able to sort of play with the blocks enough to recognize, okay, this is the right name I was thinking of. It's as if you have the target somewhere in your mind, but you can't actually bring it, you know, into...

A: It's almost like you have multiple hash paths to activate that particular specific memory; you know, there are associations, right? I mean, some people are great at straight memorization; I'm lousy at it, I have to do it associatively. But I understand what you're saying: you're being blocked.

A: There's some name standing in front of the name you really want to access, and so you try to go around it and try strategies just to somehow activate the thing that pops up the name. Because once you hear it, then there's all this recognition, right: okay, yeah, yeah, that was it, when someone tells you what it is. So I agree, it's an interesting thought experiment.
B: And one follow-up: Jeff at one point asked, how do you store the examples? What does that even mean? Well, one angle on this would be some of the tricks that we know about. Maybe this is a way to incorporate explicit memory networks with these kind of explicit generalization networks, treating them as working together.

B: So, yeah, using things like the temporal memory and all the ways we've talked about creating models of objects; it could be some hybrid of the two.

F: One question there, Marcus, if I understood your model correctly. The next question to ask, and I'm curious to hear your perspective, is: how do you decide which examples you want to keep? And I'm not assuming you're going to keep them in the input space; I'm assuming you're going to keep them in some internal space, or whatever it may be. But how do you decide which examples you want to keep, which are the prototypical examples?

B: My answer is: I totally agree that that's one of the next logical questions. I think choosing a random subset will work, and then you can definitely do better than random, but that's the best I've got on the exact way you go about choosing which examples you want. I mean, this is kind of like some classic machine learning stuff, like learning prototypes versus... anyway, the point is, keeping an instructive set of examples is a problem that has been studied a lot, and I haven't gone past that point.
G: You could argue the opposite, too, because I think prototypes might be helpful, but you might also want to store the ones that are right at the boundaries, the ones that really discriminate between a cat and a dog, for example, or an apple and an orange. Because if you just train on the prototypical examples, you can move the decision boundaries quite a bit and make lots of mistakes and still classify the prototypes correctly.

A: So I'm just wondering, to take what Subutai just said: the theta-1, theta-2, theta-3 define a kind of subspace at that point. So if you have the extremal examples, that might be the most fruitful thing: to say, okay, I need to play around within this subspace to learn the next example of whatever that is. So the extremal examples give you the best discriminative capability, or, excuse me, the least linearly dependent subspace in which to couch whatever that is.

A: I mean, obviously I'm making an analogy from a continuous space to something that's relatively discrete, but I still like this kind of point from which these other things spawn off. And once you get there, then there's a secondary thing of saying: okay, so I learned the cat and the dog, and someone throws a raccoon at me. As to which ones would be retained, it might be the ones that share the least characteristics on one axis but still have a commonality, because of the theta point they're starting at. I mean, that would be the most efficient. I don't know if the brain is that efficient, but if you're trying to compress that space, that would seem like your best option.

G: Yeah, I think the one thing, again drawing the analogy to MAML, is that you also have to have a notion of how good each change was and incorporate that. So it might be that, you know, this was a good change, this is a good change and this works, and this is a bad change, right? In that case, the direction you might want to move is somewhere like this, away from the red and more towards the green.

A: Well, I mean, if the point is to try to recognize something, you know, the points of feature tangency: when you ask, does this fall into this particular classification, and you try to look, the simplest thing is to say, okay, how many shared features does it have? Or, at the other end of it, is there some kind of gestalt, global thing about it?
H: I was thinking this is more of a supplementary sort of mechanism for continual learning, I guess. In my mind I'm wondering... thankfully, it's not the case that every single time I come to the stand-up meeting I have to, like, remember, oh, you know, that's Marcus; that would be kind of mentally painful. But I could see this being something more supplementary, like: you know, I haven't played soccer in four years, and now...

H: ...I'm going to, you know, de-rust a bit, the sort of mental skills, through something like this, potentially.

B: Let's see, I almost wish we had a word like "supplementary" that makes it seem a little bit more important than that. To me, this seems like a key mechanism of how the brain is doing what it's doing. It's not everything, and in some ways, when I talked about driving, I was already pushing the limits of, like, okay...

F: These apply to generalization to new samples, because you're talking about storing a few examples of classes you've already seen, and then, when you need to do some inference on that same task, the same class you've seen in the past, you can just replay those very fast and adapt again. But that only applies to generalization to new samples, which is kind of what we do in supervised learning, whereas in the continual learning setting we're doing generalization to new classes, and then this doesn't apply, correct?
B: Well, it does and it doesn't. The previous examples you've stored away don't apply, correct, but this theta that you've learned over time does. So, yes and no: this general approach does attack that problem, but you're right that the examples you've saved away don't play any role in it.

C: Yeah, I have a few things, sure. First of all, I apologize, I don't understand this deeply yet; obviously you guys have a deeper understanding than I do, and Marcus has sort of had a light bulb go off in his head on this. That hasn't happened with me.

C: I just don't understand it well enough yet. But listening to the conversation, it reminded me of the way that I think brains do these things, and this may be in addition to the methods I know about. So I'm not saying this is wrong, I'm just saying there are other methods I know of.
C: Basically, it's that if you take a traditional neural network, sparsify its activations, and, instead of each unit or neuron having one set of synapses, give it multiple sets of synapses, which you can think of as dendrites, then you can do continual learning without catastrophic forgetting, because every time you learn something new you can learn it very quickly, and you do it not on an existing set but on a new set, a new dendrite branch.

C: If you will. Then all you're doing is: if I think about a representation that represents something, some of the units are active, and only a very small subset of those active units would have learning occurring on them, and they would not modify the one dendrite the cell already had; we'd just add a new one. So the cell now would respond to two different things, but the set of activations really wouldn't be impacted at all.

C: So that's the idea: by combining sparsity of activations and multiple sets of synapses, or dendrites, on each neuron, you should in theory be able to do continual learning on a traditional neural network. I don't know if I phrased that right, but I think that's what you were talking about on Mondays.
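A loose sketch of the mechanism being described, as an illustration only (the matching rule, threshold, and learning rate are made-up choices, and this is not Numenta's actual model): each unit keeps several independent sets of synapses, a novel pattern is learned on a fresh segment, and patterns already covered by a segment only nudge that one segment.

```python
import numpy as np

class DendriticUnit:
    """One unit with multiple independent sets of synapses ('segments')."""

    def __init__(self, match_threshold=0.6):
        self.segments = []                 # each segment is a weight vector
        self.match_threshold = match_threshold

    def best_match(self, x):
        if not self.segments:
            return None, 0.0
        scores = [float(seg @ x) / (np.linalg.norm(seg) * np.linalg.norm(x) + 1e-9)
                  for seg in self.segments]
        i = int(np.argmax(scores))
        return i, scores[i]

    def learn(self, x):
        """Reinforce the matching segment, or grow a new one for novel input."""
        i, score = self.best_match(x)
        if i is None or score < self.match_threshold:
            self.segments.append(x.copy())                      # new dendrite: no interference
        else:
            self.segments[i] += 0.1 * (x - self.segments[i])    # small local tweak only

    def active(self, x):
        _, score = self.best_match(x)
        return score >= self.match_threshold

rng = np.random.default_rng(0)
unit = DendriticUnit()
dog = (rng.random(50) < 0.1).astype(float)
car = (rng.random(50) < 0.1).astype(float)
unit.learn(dog); unit.learn(car)          # car typically lands on its own segment
print(unit.active(dog), unit.active(car), len(unit.segments))   # typically: True True 2
```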
G: Yeah, that's part of it, yeah.

C: Now, in terms of generalization, that doesn't give you any generalization, and I just want to remind you that there are a couple of forms of generalization that we've deduced or know the neocortex does. One we haven't modeled at all, and we haven't put in any of our papers, but we've talked about it, and I just want to remind you what it is. In our cortical model, the HTM model, an object is represented by an arrangement of other objects in some framework, so it's like a reference frame that's populated with other reference frames, because each one represents an object. That was the general idea, anyway. And when you're presented with a new object, you don't know what it is. What you do, and you can observe yourself doing this, is you attend to different features of the new object, and as you attend to different features you're essentially looking at a subset of the components of that object, and that subset could be shared with a previously learned object. So if I'm looking at an object that has ten features and I look at five of those features, I could say, oh, those five features are arranged similarly to how I've seen them in a car, so this might be like a car. And another subset of five features might be similar to what I've seen in a desk, and there I'd say, oh, maybe this is similar to a desk. But it's an active attentional process, where we're moving point to point, attending to different features, and that's what we do when we don't understand something.

C: We look around to find some arrangement of the subcomponents that is similar to the arrangement of those components in another object, and that, I think, is a really powerful form of generalization. It's what we do when we don't know what something is: we look around and attend to the different features, and we say, oh, it's going to be like a cat, because it has ears like a cat and paws like a cat, even though its tail is not like a cat, something like that.

C: So I think that's what's going on in the brain; that's the most powerful form of generalization going on in the brain. We've also talked about other forms of generalization, in terms of scale invariance and invariance in time and tempo, but I think those are minor components. So, ultimately, I think we want to get to a truly generalizable system that says: I can look at something new and figure out what this thing is, figure out how it works, and guess how I should interact with it.
C: But does that give you generalization? That gives you generalization as well as continual learning, is that right?

F: So, Jeff, I have a question on what you just said. Say you see something entirely new, and a subset of the features looks like a dog and a subset of the features looks like a cat, but it's not really a dog and it's not really a cat. How do you see that forming up? Would we make a new reference frame for this new thing that we don't really know what it is?

C: Yeah, you would. Well, I think that's the question of how quickly and how permanently you memorize the new thing. As we've talked about, and as I wrote about in the book, I think when you're going around the world moment to moment, every day, every minute, you're doing something, you're attending to different objects continuously, multiple times a second, and, I think Marcus was the first one to clue me in on this, you're constantly building a model of everything you see, even if it's a temporary model.

C: The example I use in the book is: you look at the dining room table, you just glance around the table, and you've built a model of where the potatoes are, where the green beans are, where your water glass is, and so on, and you can act on that model immediately, because in some sense you're constantly learning everything all the time. But most of that learning will fade, so a little bit later in the day I won't remember where my water glass was.

C: So I think you can continually learn, one-shot learning, all the time this way, but you can forget things quickly, because for many things in the world you don't need to memorize everything all the time. But if I kept going back to my dining room table and the potatoes were always in the same spot, every single night, that would reinforce it and eventually I would just learn that's where the potatoes are; otherwise I would sort of forget it until the next time.

C: So there must be some sort of trace that continues on. But I'm trying to remember what your original question actually was, Lucas. I think you can learn continuously, one-shot, all the time, and just forget things that aren't repeated, in the sense of that permanence type of thing. Did I answer the question? I forgot what your question was.

C: You'll learn that thing immediately. It's like walking into your dining room and seeing a new arrangement of dishes: you'll learn it immediately. Yes, I think you would learn it immediately; it just wouldn't be permanent.
C: In terms of our research agenda here, I know we can do this continual learning thing with the dendrite stuff. Maybe the stuff you talked about here today, Marcus, could fold right into it too; I don't understand it well enough. The generalization component I just talked about is going to require reference frames and displacements and attention and such, a bigger thing to bite off, and we probably won't be able to do that right now. Maybe in a bit.

C: Yes, but on different dendrites. Okay, so unit A, neuron A, and neuron B can together be representing dog. They could also together represent car, but the car representation and the dog representation would use different dendrite segments. They wouldn't be on the same dendrites; the whole point is to separate out the space onto different dendrites.

C: But the number of units and the sparsity of activation would always be the same. For any two units: let's say a neuron is active one percent of the time, so we have one percent activation sparsity.
C: Then two particular neurons would be co-active about once every ten thousand patterns, and so, if I learned thirty thousand things, on average those two units would be shared across three of them. Obviously, if the numbers go up to four or five percent, then it gets...

C: You know, then it'll be 40 or, you know, 400, some multiple of those numbers. But yes, you would; it's just that two units would not be co-activated very often for different objects. And even then, let's say in a brain I might have 20 or 40 of these units active at once: even if two or three or four are confusingly co-active, because they're co-active in some other pattern, the entire set of 20 or 40 neurons is still very unique.
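The back-of-the-envelope arithmetic behind those numbers, written out with the illustrative figures used above:

```python
# Illustrative arithmetic for the sparsity argument above.
activation = 0.01                   # each neuron active about 1% of the time
pair_coactive = activation ** 2     # two given neurons co-active: 1 pattern in 10,000
learned_patterns = 30_000
print(pair_coactive * learned_patterns)   # ~3 learned things share that pair of neurons

# Even so, a full representation of 20-40 active cells is effectively unique:
print(activation ** 20)             # 1e-40: chance a specific 20-cell set recurs by accident
```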
C: It seems to me, well, I know it'd be good for continual learning, because essentially it's more like one-shot learning all the time. It's not like you're modifying all the synapses continuously; you're just continuously adding new dendrites when necessary. But I don't see how that gives you generalization, unless I'm missing something here. It's more along the lines of what Marcus started with: the temporal memory is an example of a very fast learning mechanism, a memory mechanism, but it doesn't generalize at all. And I'd speak carefully here: if I said you take a traditional neural network and do it this way, it's not that every single training example is going to get a new dendrite; that's not what we're talking about here. It would be more like... I actually don't know the rules; maybe somebody could work them out.

C: But, you know, most of the time we would be modifying the synapses on an existing segment: oh, this is another cat, this is another cat, there's another cat, whatever. But here's a completely new thing, it's not close to something we've seen before, so let's form a new dendrite segment to learn it. That kind of thing.
G: This is kind of switching gears a little bit. Okay, so you reviewed the Neil Burgess paper and the Andre Buchansky paper on Monday, and both of them replied on Twitter, because we tagged them. So I thought it's probably worth you looking at; they referenced a couple of papers, but I'll just share it on the screen.

G: Briefly, for you, and I'll send you the link. We had mentioned our research meeting here on Twitter, and Andre actually did several replies with a bunch of details, which might be interesting. So it's like: "Nice discussion of our recent review. A couple of quick takes: with distal visual cues, head direction needs no ego-allo transform; that's a feature of most head direction models, and the point is explicitly made in this other paper."

C: While I was reading the review paper, I did do a little research on how people thought head direction cells came about, and I don't know if I ran across their particular paper about that, but the ones I did look at, I found them insufficient. So I need to look at their paper and see. I don't know how you avoid having to make an ego-allo transform, I mean.

G: Yeah, yeah, and then they had a bunch of more detailed comments. I'll send you the link on Slack.

C: I hope they weren't... they were not unhappy with my review?

G: He also kind of retweeted this, and he said, "Nice discussion, great that you put your journal clubs online." So I think he appreciated that. And then he pointed out a couple of papers on integration of reference frames in the neocortex. I think we've discussed some version of these before; one of them, I think it's this one, basically says there are lots of different reference frames; it's not like there's one reference frame for the neocortex. But I think we're taking that to a whole different level, with every cortical column having its own reference frames.

C: Well, I appreciate this, and, you know, I'm always nervous reviewing people's papers; you might screw it up somehow. So I'm glad they didn't think so, yeah.

C: I would like, if they're listening to this by any chance, I would like to do this, but, as I mentioned in our stand-up meeting today, I am not going to do anything at all until I get this next round of the book done.

G: Yeah, no, no, I think, I'm sure, speaking for them, they would be fine with that. I think they're happy we're really looking into these in a lot of detail.

C: Yeah, I appreciate that, and I appreciate them engaging with us, so that's great. I'm just frustrated, actually, because I can't really do a bunch of things I want to do right now; I'm up against a deadline and I'm a slave to the deadline. So maybe someone can remind me next Wednesday, say this again on Wednesday next week, but...