From YouTube: AI / Neuroscience Chat - Catastrophic Forgetting
Description
Broadcast live on Twitch -- Watch live at https://www.twitch.tv/rhyolight_
So the topic today is sort of in the AI / machine learning realm. This is in the context of machine learning, not neuroscience, I would say. The topic is catastrophic forgetting, or, as Wikipedia tells me:
it can also be called catastrophic interference, which I hadn't heard before. It's the tendency of an artificial neural network, and it's always defined as applying to artificial neural networks.
I would dispute that. It's not all artificial neural networks that are prone to catastrophic forgetting. They're talking about deep learning networks: anything based on the, what would we call it, the point neuron, doing primarily spatial pattern recognition and classification, is susceptible to this idea of catastrophic forgetting. So let's look at
a diagram, so you can really sort of understand how a lot of these different ideas are structured. Convolutional neural networks: here's a deep convolutional neural network. Oops, wrong one. No, this guy. You have to imagine it having lots and lots of layers; you can sort of take any node in here, between the input and the output, and expand on it.
Mark says yes, so I'm waiting for a blurb of text. Yes, comma. Anyway, what these end up with, once the information is passed all the way to the right and the network has learned, is a representation of some input space, because the model is in the weights between all of these nodes.
A
You
know
that
has
been
learned
up
over
time
in
in
all
these
connections,
so
that
when
you
give
it
a
new
input
that
will
classify
it
and
a
certain
certain
bits
will
activate
as
an
output.
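The point that the model lives entirely in the weights can be sketched in a few lines. This is a toy illustration only; the layer sizes, the ReLU activation, and the random weights are assumptions made for the example, not anything from the diagram being discussed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" network: the model is nothing but these weight matrices.
W1 = rng.normal(size=(4, 8))   # input layer -> hidden layer
W2 = rng.normal(size=(8, 3))   # hidden layer -> output layer

def classify(x):
    """Pass an input left to right through the weights; the most
    active output node is the predicted class."""
    hidden = np.maximum(0, x @ W1)      # ReLU activation
    output = hidden @ W2
    return int(np.argmax(output))

x = rng.normal(size=4)                  # a new input
print(classify(x))                      # index of the winning output node
```

Nothing else is stored anywhere: give the same weights to someone else and they have the whole model.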
You know, so the problem with catastrophic forgetting is, let's look at the Wikipedia definition and then expand upon that: it's the tendency of an ANN like this to completely and abruptly forget previously learned information upon learning new information. So we have to talk about learning.
Learning here means the application of backpropagation of error, and you do that in batches, right? You'll basically average a bunch of stuff, a bunch of input, together, process it all at the same time, and then run backpropagation of error across the whole structure.
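A minimal sketch of that cycle, in plain NumPy rather than any particular deep learning library: average the error over a whole batch, then apply one gradient update across the weights. The single linear layer and squared-error loss are simplifying assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 1))             # all the network's knowledge lives here

def train_on_batch(X, y, lr=0.01):
    """Process a whole batch at once, then apply one backprop-style
    gradient update across the weights."""
    global W
    pred = X @ W                        # forward pass for the entire batch
    err = pred - y
    grad = X.T @ err / len(X)           # average gradient over the batch
    W -= lr * grad                      # one update tuned to this batch

# A "batch": many unrelated inputs averaged into a single update.
X = rng.normal(size=(32, 3))
y = X @ np.array([[1.0], [-2.0], [0.5]])
for _ in range(500):
    train_on_batch(X, y, lr=0.1)
print(W.ravel())                        # close to the weights that generated y
```

The key property for this discussion: each update is tuned to the batch it just saw, not to anything the network learned earlier.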
So they're basically laying out the problem here: that neural networks aren't generally capable of learning tasks in a sequential fashion. That's a good thing to point out right off the bat. "The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence."
I would absolutely agree with that, and I have to point out here: this is the big difference between HTM and deep learning. HTM always learns in a sequential fashion. That's what it does; that's what evolution made it do. We're trying to reverse-engineer how the sequential part of this, the sequence memory, is done in the brain. I think we have a good understanding of that; that's what HTM theory is all about.
So right off the bat you can see that the whole problem of catastrophic forgetting doesn't really quite apply to the biological idea of intelligence, as we define intelligence.
There's a technique you can apply to your deep learning networks called elastic weight consolidation. I don't understand the math here, but it's a way that you can try to counter the catastrophic forgetting that happens naturally in deep learning networks, so that you won't catastrophically forget something that was important in one of the previous batches. So there are already methods in place for deep learning networks to prevent this from happening, and this is probably the major paper you should read if you want to learn about that.
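Roughly, elastic weight consolidation adds a quadratic penalty that makes it expensive to move the weights the previous task relied on, weighted by a per-weight importance estimate (the Fisher information). The numbers below are made up for illustration, and this sketches the penalty term only, not the full training procedure from the paper:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """Elastic Weight Consolidation penalty (a rough sketch).

    theta:     current weights, being trained on the new task
    theta_old: weights frozen after the old task
    fisher:    per-weight importance estimates (Fisher information);
               large values mean "the old task cares about this weight"
    lam:       how strongly to protect the old task
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0, 0.5])
fisher    = np.array([10.0, 0.1, 5.0])   # first weight matters most to task A
theta     = np.array([1.1, 0.0, 0.5])    # new-task training moved the weights

# Moving the unimportant weight (index 1) is cheap; moving index 0 is not.
print(ewc_penalty(theta, theta_old, fisher))
```

During training on the new task, this penalty gets added to the ordinary loss, so gradient descent is free to reuse unimportant weights but pays a cost for overwriting important ones.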
It's not... I mean, yeah, a biological system can be affected by forgetting just like anything else, but I wouldn't call it catastrophic, right? I mean, people forget. There is a bit of a terminology issue. Biological things forget all the time, but it's not a catastrophic thing. It's not like you forgot everything you learned the day previous.
Not every new day. But I think what Mark pointed out here is that, even though we don't call it out, the properties of sparse distributed representations are very resilient to what I think causes this catastrophic forgetting. Or at least: when you learn something new and it's like something you learned in the past, you don't lose that link, that association, right? (Hey, Falco.) So it's almost like, when you think about sparse distributed representations and the overlap between sparse distributed representations,
you can do this type of matching. With representations the way the brain represents information, you can do this type of matching, so that if you learn something and it's like something you learned before, you don't lose that. You build on what you have learned.
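What that overlap matching looks like in practice: an SDR is a long, mostly zero binary vector, and similarity is just the number of active bits two representations share. The vector length and sparsity below are illustrative assumptions, not values from any particular HTM implementation:

```python
import numpy as np

def sdr_overlap(a, b):
    """Overlap between two sparse binary representations: the number
    of bits active in both. High overlap ~ semantically similar."""
    return int(np.sum(a & b))

n, on_bits = 2048, 40                      # ~2% sparsity, HTM-ish numbers
rng = np.random.default_rng(42)

def random_sdr():
    sdr = np.zeros(n, dtype=np.int64)
    sdr[rng.choice(n, size=on_bits, replace=False)] = 1
    return sdr

cat     = random_sdr()
similar = cat.copy()
flip = np.flatnonzero(cat)[:5]             # perturb 5 of the 40 active bits
similar[flip] = 0
similar[rng.choice(np.flatnonzero(cat == 0), size=5, replace=False)] = 1

unrelated = random_sdr()
print(sdr_overlap(cat, similar))           # 35 shared bits: clearly a match
print(sdr_overlap(cat, unrelated))         # near zero: random SDRs barely collide
```

Because unrelated SDRs almost never collide, new learning that lands on new bits doesn't wipe out the bits carrying old associations, which is the resilience being described above.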
So the idea of catastrophic forgetting, in the way that it's described in detail in deep learning, doesn't apply, I think. As you accumulate knowledge over time, accumulate information about reality over time, you're not just going to catastrophically forget it.
It's hard to compare these diagrams to an HTM structure, because we have to talk about three dimensions. Well, not really; we don't need to talk about three dimensions. But it makes sense to think about a layer of cells that isn't just sort of one-dimensional like these are. This being what we've called one layer, and this being one unit of the layer, we would build a structure for the spatial pooler to operate within that's so many cells tall.
And that doesn't happen, I don't think, in these networks. You could say convolution attempts to do that, in that it will take an image and break it up into parts, have dedicated layers with units that are processing features in each one of those parts, and then filter them up through a bunch of convolving layers that try to capture those groups of features.
But it's not the same thing as what we were talking about with spatial pooling, although I think the idea of minicolumns is maybe an originator. Well, I think convolutional neural networks came from the Hubel and Wiesel stuff, most likely.
Someone points out that deep learning doing its learning in batches, versus HTM doing continuous learning, has an influence. Yes, yeah, absolutely. The batch thing is super important, because, and I think I talked about this a little before you joined, you have to apply the backpropagation of error algorithm sort of all at once, across the whole network. So you have to sort of stop the system and do this big, expensive calculation.
There's no time there; it's not that one of those images has any relation in time to any of the others. But you put them all in the same batch, you process them all at once, and when you apply the backpropagation of error algorithm you update all your weights, and they're sort of tuned to things in that data, trying to minimize error, usually for classification. Now, that batch might have different characteristics than the next batch.
The next batch comes, and there are subtle differences in it, but you basically do the same thing: you process it all at once and then apply the backpropagation of error algorithm, trying to mitigate the loss for these loss functions. And that again tunes all of the weights in the network to perform best for the epoch of data that you just processed, which could remove the patterns from the last epoch and sort of overwrite them, because the newer learning is stronger. I think that's a good definition of catastrophic forgetting.
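That overwriting can be shown with a deliberately tiny sketch: one shared set of weights, and two synthetic "tasks" that want conflicting values for them, trained one after the other. This is a toy in plain NumPy, not a real deep network, but the effect is the one being described:

```python
import numpy as np

rng = np.random.default_rng(7)
W = np.zeros((2, 1))                    # weights shared by both tasks

def mse(X, y):
    return float(np.mean((X @ W - y) ** 2))

def train(X, y, steps=300, lr=0.1):
    global W
    for _ in range(steps):
        W -= lr * X.T @ (X @ W - y) / len(X)

# Task A and task B want *different* weights for the same inputs.
X = rng.normal(size=(64, 2))
y_a = X @ np.array([[2.0], [0.0]])      # task A: only feature 0 matters
y_b = X @ np.array([[0.0], [-3.0]])     # task B: only feature 1 matters

train(X, y_a)
err_a_before = mse(X, y_a)              # near zero: task A is learned

train(X, y_b)                           # keep training, now only on task B
err_a_after = mse(X, y_a)               # task A error blows back up

print(err_a_before, err_a_after)
```

Nothing in the update rule knows that the old weights were worth keeping; the newer gradients simply win.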
Could you call catastrophic forgetting sort of orthogonal to overfitting? Because you could say that every time you apply backpropagation of error, you are overfitting to whatever data signature you're getting, the central tendencies of that data. You could say you're potentially overfitting for that and underfitting for whatever you've previously learned at that point. I don't know if underfitting is the right term, but I think
the new set is fitting well; you're creating the new curves to fit that data, right? Every time you apply the backpropagation of error algorithm across a machine learning network, you're trying to get the best performance on your loss function, so you're trying to classify the best, basically, because it's almost always some type of classification task.
And for each one of those loss functions... I'm trying to think in Bayesian terms here. I'm not very good at this, not very good at the math behind it, so forgive me if I stick to the more high-level topics. That's a good way to put it, though, Mark: think of it as learning a skeleton of the data.
That's the whole idea of catastrophic forgetting; I'm still just sort of defining it. And the reason why it's not a big deal for biological systems is because we're continuously learning. It's not like every day when we wake up, or every night when we go to sleep,
a backpropagation of error algorithm applies through your brain. There's an interesting process, I think, that happens when you're asleep, for sure, but I don't think it's anything like that.
while you're learning, right? It's in a sequential fashion, in the same way that, as I pointed out at the beginning of this paper, they said "the ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence." That's the type of sequential learning
A
We
do
every
day
how
we
learn
tasks
in
a
sequential
fashion,
so
it's
like
it's
not
necessarily
applicable
to
the
type
of
intelligence
that
we
are
trying
to
model
with
biologically
inspired
intelligence
mark
says
it's
equally
important
to
define
what
it
is
that
deep
learning
does
learning
a
new.
A
new
shape
is
at
cross-purposes
of
what
you
learned
in
the
old
data.
A
It
seems
if
I
were
doing
a
lot
of
machine
learning.
Stuff
I
would
have
a
tendency
to
assume
that
whatever
data
I
trained
on
I
was
going
to
get
that
basic
type
of
data
in
there
in
the
real
world
in
the
production,
world
and
I
know
from
experience
after
like
living
in
the
real
world
of
data
and
big
data
streaming
data
that
data
changes
over
time.
It
changes
over
time
and
there's
nothing.
You
can
do
about
it.
It's
and
it's
not
the
data
types
or
the
stream
definitions.
A
Necessarily
it's
the
characteristics
of
the
data
and
everything
changes.
Everything
changes
over
time,
so
you're
not
going
to
have
a
general
intelligence
algorithm
that
can't
handle
changing
over
time.
The
world
reality
changing
over
time
you
have
to
with
it.
The
intelligent
system
has
to
change
with
the
world
as
it
evolves,
because
it
will
continue
to
learning,
has
to
be
completely
orthogonal
or
it
will
interfere,
and
it's
almost
never
so
that's
yeah,
that's
a
good
way
to
put
it
yeah
and-
and
you
should
be
able
to
we
do
this-
all
the
time,
we're
doing
a
turn.
That's a very Matthew way to put it. Yeah, exactly. And the H of HTM will add context, yeah. Or, as well as lateral connections, the Thousand Brains idea: that also adds context. You don't have to have the H, I don't think. I think you can do a lot without the H, let me put it that way. You could do a lot without hierarchy, with the lateral connections, I think.
The hierarchy would be the output of a cortical column, the feed-forward output, right? And the same thing for a column getting feed-forward input: it would be getting feed-forward input not from sensor data but from somewhere else in the cortex. That's the hierarchy part. The lateral stuff all happens within one layer, wherever that cortical hub happens to be, even if it's... I don't want to say.
If you want to voice chat with me and anyone else who joins in: we're talking about catastrophic forgetting, and I've been talking about it for about half an hour. Does anyone have any direct experience with it at all, in the deep learning systems that they have created? Just curious, because like I said, it's not something we worry too much about in the biological area and HTM. It doesn't affect HTM systems, because of the naturally sequential nature of their processing of input.
Okay, well, nice chat, everybody. I'm going to close the show; I'm not just going to drag it out for an hour if the topic is over, and the topic is over. So I appreciate you guys hanging out with me for a while, talking about artificial intelligence, specifically deep learning and catastrophic forgetting, and whether catastrophic forgetting applies to biologically inspired intelligent systems. I don't think it does. So take care and have a wonderful Monday. I might be streaming tomorrow; I'm writing a blog post about my experience with Twitch for an event.