From YouTube: Let's Watch an AI Debate LIVE! Gary Marcus vs Yoshua Bengio 2019 University of Montreal
Description
Join Matt Taylor and his online community as they watch this historic debate between Gary Marcus and Yoshua Bengio, live on Dec 23rd. They voted on the best debater, discussed the content, and generally had a good time watching the debate. See the original Twitch video at https://www.twitch.tv/videos/525773366.
Original event details: https://www.eventbrite.ca/e/debate-yoshua-bengio-gary-marcus-live-streaming-tickets-81620778947#
Professor Marcus has published extensively in neuroscience, genetics, linguistics, evolutionary psychology, and artificial intelligence, and is perhaps the youngest professor emeritus at NYU. He is the founder and CEO of Robust.AI and the author of five books, including The Algebraic Mind and his newest book, Rebooting AI: Building Artificial Intelligence We Can Trust.
This diagram shows the architecture of a two-layer neural network. It is made of relatively simple processing elements that are very loosely modeled on neurons. They have connections coming in, and each connection has a weight on it, and that weight can be changed through learning.
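As a minimal sketch of the kind of network being described (an editor's toy example in NumPy, not a slide from the talk; the sizes, target function, and learning rate are illustrative assumptions):

```python
import numpy as np

# A two-layer network as described above: inputs on the bottom, outputs on
# the top, and a weight on every connection, adjusted through learning
# (here, plain gradient descent on squared error).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights

def forward(x):
    h = np.tanh(x @ W1)                   # hidden-layer activations
    return h, h @ W2                      # network output

x = rng.normal(size=(16, 4))
y = x.sum(axis=1, keepdims=True)          # an arbitrary target function
for _ in range(500):                      # repeated small weight updates
    h, pred = forward(x)
    err = pred - y
    gW2 = h.T @ err / len(x)
    gW1 = x.T @ ((err @ W2.T) * (1 - h**2)) / len(x)
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2
```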
This AI debate is a Christmas gift from Montreal.AI to the international AI community. The hashtag for tonight's event is #AIDebate. Montreal.AI is grateful to Mila and to the collaborative Montreal AI ecosystem. That being said, we will start with the first segment from Gary Marcus. You have 20 minutes for your opening statement.
And I know that if you're watching on Twitch there are ads; you can get around those if you subscribe, and also, if you follow, that'd be awesome too. It doesn't cost anything to follow. And if you're watching on YouTube, please like the video and subscribe to our channel; we put out a lot of content on YouTube, including Numenta research meetings.
The first part is about how I see AI, deep learning, and current machine learning, and how I got here. It's a bit of a personal history of cognitive science and how it feeds into AI, and you might think of it as: what's a nice cognitive scientist like me doing in a place like Mila? So here's an overview (I won't go into all of it) of some of the things that I've done that I think are relevant to AI. An important point is that I'm not a machine learning person by training.
I study humans and how they generalize and learn, and I'll tell you a little bit about that work, going back to 1992 and a little bit all the way up to the present. But first I will go back even a little bit before, to a pair of famous books that people have called the PDP Bibles. Not everybody will even know what PDP is, but it's a kind of ancestor to today's deep learning.
In those books was a paper about children's over-regularization errors: kids say things like "breaked" and "goed" some of the time. I have two kids; I can testify that this is true. It was long thought to be an iconic example of symbolic rules, such that you could read any textbook up to 1985 and it would say children learn rules, and that, for example, is why they make these over-regularization errors. What Rumelhart and McClelland showed was that a neural network might produce those errors without
any rules in it, and the so-called great past tense debate was born from this. It was a huge war across the cognitive sciences; by the time I got to graduate school, it was all that a lot of people wanted to talk about. On the one hand, up until that point, until that paper, most of linguistics and cognitive science was couched in terms of rules. So the idea was that you learn rules, like "a sentence is made of a noun phrase and a verb phrase."
On the other hand was the view that we don't need rules at all, forget about it: even an error like "breaked" might in principle (they didn't prove it, but they showed it might in principle) be the product of a neural network where you have the input on the bottom and the output on the top, and tuning some connections over time might give you generalizations that look like what the kids were doing.
I did what I think was the first big-data analysis of language acquisition, or one of the first ones, writing shell scripts on Unix SPARCstations, and we looked at eleven and a half thousand child utterances. The argument that Pinker and I made was that neural nets weren't making the right predictions about generalization over time, about particular verbs, and so forth. If you care, there's a whole book that we wrote about it, and what we argued for was a compromise.
Even a little bit before that, I started playing a lot with the network models. A lot had been written about them, but I wanted to understand how they worked, so I started implementing them and trying them out, and I discovered something about them that I thought was really interesting: people talked about them as if they learned the rule in the environment, but they didn't really always learn the rule, at least not in the sense that a human being might.
For example, you train a network to reproduce its input, and you do this on a bunch of cases; the neural network learns something, but it also makes some mistakes. So if you give it an odd number, which is what I have there at the bottom, after training it on only even numbers, it doesn't come up with the answer that a human being would. I described this in terms of something called a training space.
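A rough sketch of this kind of experiment (an editor's reconstruction under assumed sizes and encodings, not Marcus's original setup): a plain multilayer perceptron trained to copy even binary numbers typically fails to copy the last bit of odd ones, since that bit is always 0 inside the training space.

```python
import numpy as np

# Train an MLP on the identity function using only even numbers (last bit
# always 0), then test on odd numbers. The network has no pressure to learn
# "copy" for the last output unit, so it fails outside the training space.
rng = np.random.default_rng(0)

def bits(n, width=8):
    return np.array([(n >> i) & 1 for i in range(width)], dtype=float)

train = np.stack([bits(n) for n in range(0, 256, 2)])   # even numbers only
test  = np.stack([bits(n) for n in range(1, 256, 2)])   # odd numbers

W1 = rng.normal(scale=0.3, size=(8, 16))
W2 = rng.normal(scale=0.3, size=(16, 8))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(2000):                                   # learn the identity map
    h = np.tanh(train @ W1)
    out = sigmoid(h @ W2)
    d = (out - train) * out * (1 - out)
    W2 -= 0.5 * h.T @ d / len(train)
    W1 -= 0.5 * train.T @ ((d @ W2.T) * (1 - h**2)) / len(train)

pred = sigmoid(np.tanh(test @ W1) @ W2) > 0.5
print("last-bit accuracy on odd inputs:", (pred[:, 0] == 1).mean())  # near 0
```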
In my view, this is the thing that I am most proud of having worked on; I'll save some details for later. It led me to some work on infants, and what I tried to argue is that even infants could make these kinds of generalizations that were stymieing the neural networks of that day. So it was a direct, deliberate test of outside-the-training-space generalization by human infants. The infants would hear sentences of made-up syllables, things like "la ti ti" and "ga na na"; I read these to my son yesterday.
Compared with the early neural networks, the conclusion was that infants could generalize outside the training space even where many neural networks could not, and I argued these should be characterized as learning algebraic rules. It's been replicated a bunch of times, and it led to my first book, which was called The Algebraic Mind. The idea was that humans can do this kind of abstraction, and I argued that there were three key ingredients missing from multilayer perceptrons.
The book was, in part, a critique of attempts to use multilayer perceptrons as models of the human mind; I wasn't really talking about AI, I was talking about cognition. Such models, I argued, simply can't capture the flexibility and power of everyday reasoning. The key components of the thing I was defending, which I would call symbol manipulation (I didn't invent it, but I tried to explicate it and argue for it), are variables, instances, bindings, and operations over variables. So, thinking of algebra, you have a variable like x.
You have an instance of it, like two. You bind it, so you say: right now x equals two, or my noun phrase equals "the boy." And then you have operations over variables, so you can add them together, or you can put them together (concatenation, if you know computer programming), you can compare them, and so forth.
An operation over a variable automatically generalizes to all instances of some class, let's say integers, once you have that code, and pretty much all of the world's software takes advantage of this fact. My argument from the baby data was that human cognition appeared to do so as well, innately.
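A tiny illustration of that point (an editor's example, not one from the talk): an operation defined over a variable applies to every instance of its class at once, whereas a memorized table of training cases does not.

```python
# The function is defined once, over a variable x, and immediately applies
# to every integer, including values never "seen" before.
def successor(x: int) -> int:
    return x + 1                              # an operation over the variable x

assert successor(2) == 3                      # a binding: x = 2
assert successor(10**12) == 10**12 + 1        # generalizes far beyond any training set

# Contrast a lookup table "trained" on examples: no rule, no generalization.
table = {n: n + 1 for n in range(0, 100, 2)}  # even numbers only
print(table.get(7))                           # None: odd inputs are simply unknown
```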
The subtitle of that first book, which you can't see here, was "Integrating Connectionism and Cognitive Science": I wasn't trying to knock down neural networks and say forget about it. I was saying, let's take the insight of those things, that they're good at learning, but let's put it together with the insights of cognitive science, a lot of which have been about using these symbols and so forth. And so I said: even if I'm right that symbol manipulation plays an important role in mental life, that doesn't mean we shouldn't have other things in there too, like multilayer perceptrons, which were the predecessors of today's deep learning.
There's a literature you can take a look at called neuro-symbolic cognitive reasoning, and I'm going to try to suggest that it also anticipated some of Yoshua's current arguments. Then I stopped working on these issues: I started looking at innateness, I learned to play guitar (that's a story for another day), and I didn't talk about these issues at all until 2012, when
I wrote that such systems were still far from logical inference and still a long way from integrating abstract knowledge, and I once again argued for hybrid models, with deep learning as just one element in a very complicated set of machinery. Then in 2018, deep learning got more and more popular, but I thought people were missing some important points about it, and so I wrote a piece (I was actually here in Montreal when I wrote it) called "Deep Learning: A Critical Appraisal." It outlined 10 problems for deep learning; I think it was on the suggested readings for this event.
I felt like I was often misrepresented there as saying we should throw away deep learning, which is not what I was saying, and I was careful in the paper to say, in the conclusion: despite all the problems I have sketched, I don't think we need to abandon deep learning, which is the best technique we have for training neural networks right now; but rather, we need to reconceptualize it, not as a universal solvent, but simply as one tool among many.
So the central conclusions of my academic work included the value of hybrid models and the importance of extrapolation, of compositionality, and of acquiring and representing relationships, causality, and so forth. Part 2 is about Yoshua: some thoughts on his views, how I think they've changed a bit over time, and a little bit about how I feel about that.
The first thing I want to say is that I really admire Yoshua. For example, I wrote a piece recently skewering the field for hype, and I said: but, you know, a really good talk is one by Yoshua Bengio, a model of being honest about limitations. I also love the work that he's doing, for example, on climate change and machine learning. I really think he should be a role model in his intellectual honesty.
That said, I have felt that he put too much faith in black-box deep learning systems and relied too heavily on larger data sets to yield answers. He'll talk about System 1 and System 2 later (I guess I will as well); it felt like it was all on the System 1 side and not so much on the System 2 side. I went back and talked to some friends about that, and a lot of people remember a talk
he gave in 2015 to a bunch of linguists, who didn't like Yoshua's answers to questions like how we would deal with negation, or with quantification words like "every." They felt like what Yoshua mostly did was to say: well, we just need more data and the network will figure it out. If he were still in that position, I think we'd have a longer argument; I don't think he is. Recently, however, Yoshua has been taking a sharp turn towards many of the positions that I've long advocated for, validating
much of what I've said. Where we still disagree, I'd say, is on the right way to build hybrid models, on innateness, on the significance of the fact that the brain is a neural network, and on what we mean by compositionality; and that's it, I think we actually agree about most of the rest. The first one is the most delicate, but I think occasionally Yoshua has misrepresented me as saying "deep learning doesn't work"; he said that to IEEE Spectrum. I hope I've persuaded you that that's not actually my position: I think deep learning is very useful.
I just don't think it solves all problems. The second thing is that his recent work has really nailed what I think is the most important point, which is the trouble deep nets have in extrapolating beyond the data, and why that means, for example, that we might need hybrid models. Frankly, I would like for him to cite me a little bit; I think not mentioning me devalues my contributions a little bit, and it further misrepresents my background in the field. What kind of hybrids should we seek? I think
we should be inspired by Daniel Kahneman's book about System 1 and System 2. I imagine many people in the crowd have read it; you should, if you haven't. It talks about one system that's intuitive, fast, and unconscious, and another that's slow, logical, sequential, and conscious. I actually think that's a lot like what I've been arguing for all along. We can have some interesting discussion about the differences; there are questions: are they even different? Are they incompatible?
Think of Marr's levels: you could have an abstract algorithm or notion, like "I'm going to do a sorting algorithm"; you could pick a particular one, like bubble sort; and then you could make it out of neurons, you could make it out of silicon, you could make it out of Tinkertoys. I think we need to remember this as we have these conversations: we want to understand the relation between how we're building something and what algorithm is being represented. I don't think that question has actually been settled.
Nobody has made a strong case that the system doesn't implement symbols. And Yoshua has been talking a lot lately about attention. I think that what he's doing with attention reminds me, actually, of a microprocessor, in the way that it pulls things out of a register and moves them into a register and so forth; in some ways it seems as if it behaves at least a lot like a mechanism for storing and retrieving the values of variables from registers, which is really what I have cared about.
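As a small sketch of that analogy (an editor's code, not either speaker's implementation), soft dot-product attention reads a value out of a set of key-value slots much as a processor reads from a register file:

```python
import numpy as np

# Soft attention as a differentiable "register read": a query is matched
# against keys, and the result is a weighted blend of the stored values.
# A sharply peaked match approximates fetching the value bound to one slot.
def attend(query, keys, values):
    scores = keys @ query / np.sqrt(len(query))   # match query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over slots
    return weights @ values                       # blended read-out

rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))        # 4 slots, 8-dim keys
values = np.eye(4)                    # a distinct value per slot
print(attend(keys[2], keys, values))  # query = key 2: mass lands on slot 2
```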
And I don't fully know what Yoshua's view is about nativism. As a cognitive development person, I see a lot of evidence that a lot of things are built into the human brain. I think we are born to learn, and we should think about it as nature and nurture, rather than nature versus nurture, and I think we should think about innate frameworks for things like understanding time and space and causality, as Kant argued for in the Critique of Pure Reason, and as Spelke
has argued for in her cognitive development work. The argument that I've made, in the paper on the left, is that richer innate priors might help artificial intelligence a lot. Machine learning has historically, typically, avoided nativism of this sort, and as far as I can tell, Yoshua is not a real fan of nativism, though I'm not totally sure.
Here's some empirical data showing that nativism in neural networks works. It comes from a great paper by Yann LeCun in 1989, where he compared four different models, and the ones that had more innateness, in the form of a convolutional prior (for those who know what that is), were the ones that did better. And this is, just very quickly, a picture of a baby ibex climbing down a mountain: I don't think anybody could reasonably say that there's nothing innate about the baby ibex.
It does not learn that sensory-motor loop from scratch. In the cartoon version of the debate, Yoshua wins by saying "your brain is a neural network," and everybody goes: wow, I guess Yoshua was right after all. And Yoshua did, at least half in jest, make a similar argument to me on Facebook when he said your brain is a neural net all the way. Of course, deep neural networks aren't really much like brains; I've been arguing that for a while. There are many cortical areas, many neuron types, many different proteins in different synapses, and so forth, and so on.
I actually heard Yoshua make essentially the same argument at NeurIPS last week, so I think we probably pretty much agree about that; he made a beautiful argument about degrees of freedom in particular, which I loved. But the critical question is really: what kind of neural network is the brain? Going back to Marr's distinction, you could build anything you want, any computation you want, out of Tinkertoys or out of neurons. We really want to know whether the brain is a symbolic thing at the algorithmic level or not. And then we ask: well,
why exclude symbols from AI? We can't prove that they're inadequate; they have proven utility; most of the world's computer code is written in symbolic form; and, most importantly, lots of the world's distilled knowledge comes in the form of symbols. Everything in Wikipedia is symbolic, and we'd like to be able to use that in our machine learning systems. Point five is compositionality: Yoshua has been talking a lot about compositionality, and I think he will tonight, but I think he means something different by it than I do. So I'll
let him give his description later, but I think his sense is partly about putting together different pieces of networks and so forth. I'm really interested in the linguist's sense, which is how you put different parts of sentences together into larger wholes. Here's a good example. Last week, regarding my friend Jeff Clune (I've been encouraging him to come to UBC, and encouraging UBC to hire him for a job), my friend Alan Mackworth said, "Good news, Jeff Clune accepts," and I wrote back and said, "Awesome. He told me it was imminent, but swore me to secrecy."
Understanding that exchange takes real composition, yet we can barely get a system to represent the difference between eating rocks and eating apples, and the famous quote, that you can't cram the meaning of an entire effing sentence into a single vector, I think still stands. Compositionality is not just about language: it's also about learning different concepts and putting them together in different ways. Here are my kids inventing a new game; ten minutes later, they've combined things that they know. Children can learn something in a few trials, and we haven't figured out how to do that yet.
To synthesize: we agree that multilayer perceptrons on their own won't be the answer. We both think everybody going forward should be working on the same things: compositionality, reasoning, causality, hybrid models, extrapolation beyond the training space. And we agree that we should be looking for systems that represent more degrees of neural freedom, respecting the complexity of the brain. At the same time, I hope to have convinced you that symbol manipulation
is worth pursuing. Bengio, LeCun, and Hinton kept plugging away despite resistance; I hope people doing symbols will keep plugging away too. Here's my prediction, on my last slide: when Yoshua applies his formidable model-building talents (which I envy) to models that acknowledge and incorporate explicit operations over variables, magic will start to happen. Thank you very much.
My views are about how deep learning might be extended to deal with System 2 computational capabilities, rather than about taking the old techniques and combining them with neural nets. I want to talk briefly about attention mechanisms, and why these may provide some of the key ingredients that Gary has been talking about, the ones that make symbolic processing able to do very interesting things, but how we can do it within a neural net framework.
When "deep learning" is used as a strawman, it tends to be used to mean MLPs from 1989, just as Gary used the term a few minutes ago. If you open the last NeurIPS proceedings, you'll see that it's much more than that. Deep learning is really not about a particular architecture, or even a particular training procedure; it's not about backprop; it's not about convnets or RNNs or MLPs. It's something that's moving; it's more of a philosophy.
It now includes transfer learning, learning to learn, and so on, and I will argue that the tools to move forward include things like reasoning, search, inference, and causality. And, to connect to neuroscience, because Gary mentioned it: there's actually a very rich set of work happening in the last few years connecting modern deep learning research with neuroscience. We had a paper just published in Nature Neuroscience called "A deep learning framework for neuroscience," but I won't have time to talk about it today.
Out-of-distribution generalization means something different from the normal formalization, where you have data from one distribution and we worry about generalizing to examples from the same distribution. When we talk about extrapolation, Gary, it's not clear whether we're talking about generalizing to new configurations coming from the same distribution or not, so you have to think about the notion of distribution to be able to make a difference for agents in the world.
One of the things I've done since the 2000s is to try to help figure out why even the neural nets from the 80s, with distributed representations, have a powerful form of compositionality. I'm not going to go into the details of that, but this dates back a number of years; and similarly, why composing layers brings another form of compositionality. So, basically, my argument is that we have these two forms already in neural nets.
What matters is generalizing to configurations that are unlikely under the training distribution: it's not just that a pattern is novel, it's that it may be unlikely under the kind of distribution you were trained on, and yet our brain is able to come up with these interpretations, these novel combinations, and so on. At NeurIPS I gave this example of driving in a new city, where you have to be a little bit creative in combining the skills you know in novel ways in order to solve a difficult navigation problem.
So attention selects an element from a set of elements in the lower layer, and it sends the selected element upward in a soft way, at least for the soft-attention kind that we typically do in deep learning. So the receiver gets a vector, but it doesn't know where that vector comes from, and in order to really do a good job, it's important for the receiver to get information not only about the value that is being sent but also about where it comes from; the "where" is sort of a name.
Now, it's not like a symbolic name: we use vectors, what we call keys in transformers, for example. And you can think of these as the neural-net form of reference, because that information can be passed along, and can be used again to match some element to some other element, to perform further attention operations.
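A minimal sketch of this idea (an editor's illustration, with invented shapes): attention can return not just the selected value but also the key of the slot it came from, so downstream modules can refer back to it.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Attention that forwards both the value and the key ("where it came from"),
# so the receiver gets a neural-net form of reference, not an anonymous vector.
def attend_with_name(query, keys, values):
    w = softmax(keys @ query / np.sqrt(len(query)))
    return w @ values, w @ keys        # (selected value, soft "name"/key)

rng = np.random.default_rng(1)
keys, values = rng.normal(size=(5, 4)), rng.normal(size=(5, 6))
value, name = attend_with_name(keys[3], keys, values)
# "name" can itself be used as a query later, chaining attention operations.
```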
There are these dependencies, which you can think of like a sentence, such as "if I drop the ball, it will fall on the ground," which relate only a few variables together. Now, of course, each concept, like "ball," can be involved in many such sentences, and so there are many dependencies that can be attached to a particular concept; but each of these dependencies is itself sort of sparse, involving few variables, so we can represent that in machine learning as a sparse graphical model, a sparse factor graph.
What would be interesting, and it's something we desire, is for these to be the kinds of high-level variables, factors, that we communicate with language. So there's a strong connection between these notions and language, the reason being that the things we do consciously we are able to report through language, whereas the things we don't do consciously, that go on, you know, below the level of consciousness, we can't report; and presumably there's a good reason for this: it's just too complex to be put in a few simple words.
But what's interesting is that if we can put these kinds of priors on top of the highest level of representations of our neural nets, then it will increase the chances of finding the same sorts of representations that people use in language, so I call them semantic factors. Another prior that I've been talking about has to do with causality and changes in distribution, because, remember, I started this discussion with how we deal with changes in distribution.
Then we have to add something else, right? This is something fundamentally important in order to cope with changes in distribution; otherwise, the new distribution could be anything. So we have to make some sort of assumptions, and I presume that evolution put these kinds of assumptions into human brains, and probably animal brains as well, to make us better equipped to deal with those changes of distribution.
And so the prior here, really inspired a lot by the work of people like Schölkopf and Peters and others in causality, is that those changes are the result of an intervention on one or a few high-level variables, which we can call causes. So there's this prior that many of the high-level variables that I'm talking about are causal variables: in other words, they can be causes, or they could be effects of something, or they are related to how interventions cause changes.
Another thing that we have explored is related to modularization and systematic generalization: the idea that we're going to dynamically recombine different pieces of knowledge together in order to address the particular current input. We have a recent paper called "Recurrent Independent Mechanisms," which is a first stab at that, and I'm not going to go through the whole thing, but one of the main ideas is that we have a recurrent net that's broken down into smaller recurrent nets, which you can think of as different modules, which we call independent mechanisms.
They have separate parameters; they're not fully connected to each other, so the number of free parameters is much less than in a regular recurrent net. Instead, they communicate through a channel that uses attention, such that they can basically only send these named vectors, these key-value pairs, in a way that makes it more plug-and-play: the same module can take as input the output coming from any module, so long as they speak the right language, so that they fill the right slots.
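A heavily simplified sketch of that flavor of architecture (an editor's toy, not the paper's implementation; module count, dimensions, and update rule are assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Modules with separate parameters and state that exchange information only
# through attention over key-value messages, not a fully connected matrix.
class Module:
    def __init__(self, dim, rng):
        self.Wk = rng.normal(scale=0.3, size=(dim, dim))  # key projection
        self.Wv = rng.normal(scale=0.3, size=(dim, dim))  # value projection
        self.Wq = rng.normal(scale=0.3, size=(dim, dim))  # query projection
        self.state = rng.normal(size=dim)

    def message(self):
        return self.state @ self.Wk, self.state @ self.Wv

    def update(self, keys, values):
        w = softmax(keys @ (self.state @ self.Wq))        # attend to messages
        self.state = np.tanh(self.state + w @ values)     # local recurrent step

rng = np.random.default_rng(0)
mods = [Module(8, rng) for _ in range(4)]
for _ in range(3):                        # a few communication rounds
    keys, values = map(np.stack, zip(*[m.message() for m in mods]))
    for m in mods:
        m.update(keys, values)
```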
Rules apply to generic instances: it's not like there's a rule for my cat and my cat's food; there's a general rule that applies to cats and cat food in general, right? And we do these kinds of things a lot in machine learning, of course: in graphical models, these ideas date back even to the convolutional nets, and to dynamic Bayes nets, which share parameters. So something like this needs to be there as well in the representation of the dependencies between the high-level factors,
the causal variables, in the same spirit. And I didn't have time to talk about it, because it's really a whole other talk, but very closely related to this subject is agency. We are agents; we intervene in our environment. This is closely connected to the causality aspect, and the high-level variables, if you look at the ones we manipulate with language, often have to do with agents and objects.
Different pieces of knowledge correspond to different time scales. There are things about the world that change quickly, and there are things that are very stable, right? So there's, like, general knowledge that you're going to keep for the rest of your life, and there are aspects of the world that change quickly: we learn new faces, we learn new tricks.
This is consciousness-related, and potentially different from the symbolic program: we would like to build in some of the functional advantages of classical AI, of symbol manipulation, in neural nets, but in an implicit way. So we need efficient and coordinated large-scale learning; we need semantic grounding in System 1 and the perception-action loop; we need distributed representations for generalization, which has been, you know, a big success for deep learning; we need efficient search in that space, with System 1; and we need to handle uncertainty, but we want to operate with these other things as well.
There is also flexibility in how symbols should be represented: we can get many of the attributes of symbols without the kind of explicit representation of them which has been the hallmark of classical AI. We can get categories, for example, by having multimodal representations of distributions; we can use things like the Gumbel softmax, which encourages separation into different modes; we can get indirection with variables, as I mentioned already; we can get recursion by recurrent processing; and we can get a form of context independence.
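A small sketch of the Gumbel-softmax trick just mentioned (an editor's example; the logits and temperatures are illustrative): it draws approximately one-hot, discrete-looking samples from a categorical distribution while staying differentiable.

```python
import numpy as np

# Gumbel softmax: add Gumbel noise to logits, then apply a temperature-scaled
# softmax. Low temperatures push the sample toward a one-hot vector, giving
# discrete, symbol-like behavior from continuous, differentiable machinery.
def gumbel_softmax(logits, temperature, rng):
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + gumbel) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 0.5, 0.1])
print(gumbel_softmax(logits, temperature=5.0, rng=rng))  # soft, blended
print(gumbel_softmax(logits, temperature=0.1, rng=rng))  # nearly one-hot
```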
If you're trying to build an intelligent system: Google search is in some ways an intelligent system, and in some ways not, but I think you have two avenues here. You can either say it's so different from an intelligent system that it's not interesting, or you can say it's interesting, and it does show a proof of concept.
That's what I'm talking about: how the brain works, and how I would like to build AI in the future.
I get off the hybrid train when it's about taking the good old algorithms, like production systems and ontologies and rules and logic, which have a lot of value and which I think can serve as an inspiration, and trying to take them and basically glue them to neural nets. People have been trying to do these kinds of things for a long time.
In the 90s there was a wave of this, and I've tried to outline, in my last couple of slides (I guess I misunderstood; I thought I had two more minutes left), the reasons why it couldn't work, and not just because of how the brain works, but for machine learning reasons, for practical computational reasons. So one of them is search. What I mean by search is what we do when we have the knowledge and need to put pieces of it together.
We typically combine one thing, and sometimes two, and if it really doesn't work, we try three or four. Go masters go up to fifty, okay, but their brains are unusual because they've been trained, and, you know, people who are really good at algebra can do that sort of thing too; but normal behavior involves this very intuitive sense, like we know where to search, and that's based on System 1, on something that we don't have conscious access to, that knows where to search. So that's one reason why we can't use the old algorithms.
Another reason is that it wasn't a sufficiently rich kind of representation to get good generalization. You want to represent everyday concepts, like words in natural language, by these sort of sub-symbolic representations that involve many attributes, and this allows us to generalize across similar things. And I've read some of the things you wrote, and you can say, well, these attributes are like symbols themselves; sure, you could do that, but the important point is that now you have to manipulate these rich representations, which could actually be fairly high-dimensional.
Yeah, and of course we need to keep the things that have worked well in machine learning, such as handling uncertainty, which some people are doing, like Josh Tenenbaum with probabilistic programming and so on. So I think there are some efforts going in those directions, but we need to keep these ingredients together.
In Cyc there are microtheories to target reasoning in particular domains, and that's, I think, an idea that's worth exploring. But I absolutely agree that if you have unbounded inference, you're in trouble. I think that AlphaGo is an example where you bound the search, partly through a non-symbolic system, and then you use a symbolic system there as well, and so it's kind of a hybrid. In what way is it symbolic?
As I said at the end of my presentation, we can get discreteness, not necessarily in its hardest form, in its purest form, as you have in symbols. You can get discreteness by having, you know, lateral inhibition that creates a competition, such that the dynamics converge to one mode or another mode. This is what you observe in the brain, by the way, when you take a decision: there's a sort of competition between different potential outcomes.
I think you're strawmanning symbols, because lots of people have put probabilities and uncertainty into symbols; and you think (and I think it's an interesting discussion point) that I'm strawmanning deep learning. You said I'm attacking the models of the 1980s, and there's some truth in that, and then there's a question of what the scope should be. So I think, both for symbols and for neural networks, there's a kind of question about what their proper scope is, and whether we're actually pushing to the same place from opposite sides.
So I would argue that the kind of deep learning that was straight out of the 80s continued until about 2016, in my view (but we could argue about that): you know, just have a big multilayer perceptron, pile a lot of data in, and hope for the best, which I don't think you believe anymore.
But maybe you did at one point. That's one kind of deep learning; that's the, I don't know, prototype or canonical version of deep learning, and you want to open deep learning up to a whole lot of other things. I think at some level that's fine; at some level I think it's changing the game (you might reply to that in a second). I think that, with respect to symbols, you might feel I'm doing the same. So I want to say: sure.
So it's been around for a while, and another thing you have to keep in mind is that I've been working on recurrent nets for a long time.
I hope it will bring something out that's interesting. I made a slide that I did not have time to show, which has a picture of a great new paper by Yoshua that we had on the reading list, and that is well worth reading. It is about causality, and it's a very mathematical paper; I put what I think is some of the core math of it at the bottom. I admit that I didn't read the paper as carefully as I wish that I had, but Yoshua
is trying to make some clever observations about how distributions change over time relative to interventions that are made, which is, of course, the classic thing that we try to do when we run experiments, and he's got, I think, some very clever ways of going after that within neural networks. And, god bless him, I think it's great work; it's not the work that I would do, but I think it's terrific. I'm just going to draw a contrast.
That's the kind of thing Ernie Davis and I have been reasoning about, and the formalism that Ernie came up with, which I think is responsive to your question, is something that broke things into time, space, manipulation, things about rigid objects, and the histories of objects. So he did a very careful analysis of the knowledge that one needs in order to do this basic thing. And it's not a trivial thing, because we use container metaphors for a large fraction of the things that we talk about; I don't want to say it's 50%, but it's significant.
To be able to make inferences about these things, you need prior knowledge, and there's a question about whether that knowledge is innate or acquired experientially. But the argument is that you won't be able to make these inferences unless you have this knowledge about sets and objects and containing regions, and have these kinds of axioms.
I totally welcome the kind of stuff that Yoshua is doing, even if I personally don't have the skills to do it, and I think the empirical question is kind of: could you, from the bottom up, derive all this? Although I feel like maybe I strawmanned Yoshua: I thought that he was more anti-nativist than maybe he really is, because he acknowledged evolution. So I'll say one more sentence and then turn it over.
People who think along similar lines to mine don't think that learning has to be from a blank slate. In fact, we have theorems from the 90s, the no-free-lunch theorems, that clearly say you can't have learning if you don't have some priors, okay? But what we're saying is: we'd like to be able to get away with as little prior knowledge as possible. Now, how is "little" measured? Well, you can think of measuring it in bits.
So, if you think about how big a program that would encode those priors is, and, you know, you were to zip that program, that would be how big the prior is. The kinds of priors I have been talking about in my presentation are priors that, in a way, are not going to require many bits, and so it's very economical.
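A toy illustration of that way of measuring a prior (an editor's sketch; the two "priors" below are made-up strings standing in for real encodings): serialize the code that states the assumption, compress it, and count bits.

```python
import zlib

# "Measuring a prior in bits": compress the text that encodes an assumption
# and count the compressed size. A short, generic prior costs few bits; a
# detailed hard-coded behavior costs many.
generic_prior = b"assume each dependency involves only a few variables"
specific_prior = (b"if predator_shape and looming and shadow_from_above and "
                  b"ambient_light < dusk_threshold then freeze else flee; " * 20)

for name, prior in [("generic", generic_prior), ("specific", specific_prior)]:
    bits = len(zlib.compress(prior)) * 8
    print(f"{name} prior: about {bits} bits")
```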
Part of what evolution gives us is completely hard-coded behavior, but those are not the behaviors that are most adaptive; there are other behaviors that allow a species to adapt, as well as human beings have been able to do. It's more interesting for me to think about the part of what evolution has discovered that is more general; those are the most generic priors. And of course we also have priors that are very, very specific.
I guess I've got two things to say there. One is actually from Yann's work, and since you mentioned it: we actually argued about this very thing the other day. In this particular empirical case, having more of a prior was actually better, right? So in this particular case, having a convolutional prior made
the difference. The convolutions were very clever; they've been very valuable to the world. Maybe, you know, I've got 25 boxes up there, with three lines each, and we just need, you know, 24 more discoveries of that magnitude. Is the genome big enough to encode all of those? 95% of our genes are involved in brain development; I think there's room in there to encode that many, maybe 10 times more.
Well, I like this question very much, because I found this question in the early 90s with my brother Samy, and it was essentially the subject of his thesis proposal and then of the thesis, and it was one of the first papers on meta-learning: we were trying to learn a learning rule, a synaptic learning rule. We didn't have enough computational power to do this, and even now, I think, in order to realize the kind of ambitious program that Jeff is talking about, we would need a lot more computational power.
Even if you want to learn most of it in the abstract, it really helps a lot if you put in a bit of the right structure, and in order to do that, you need to do experimentation of the kind we do normally in machine learning, where you design a learning algorithm completely, and that helps to figure out what would be the right building blocks and the right inputs and outputs that are needed for learning.
I pretty much agree with Yoshua's answer; I'll answer it in a slightly different way. In principle, we know that evolution is a mechanism that's powerful enough to evolve minds, because it evolved our minds, and having the machine do the work, sort of standing in for evolution, would be great. In practical matters, it does matter what you're trying to evolve, and I think what has happened empirically in the evolution-of-neural-networks
literature is that people start with too little in the way of priors, and so they end up recapitulating some of our journey to bacteria, but not so much of the journey from, say, chimpanzees to human beings. In principle, we know it can work; in reality, having a tightly constrained problem, and probably a bit of priors to help us there, might help it work even better than it does, and I think it's totally worth exploring.
I think it's important in general to ask the question of how our work as researchers will be used, or could be used, because, you know, you don't need to go very far into the future: today we already see the misuse of AI in many ways, and I'm very concerned about how we are creating tools that can be destructive and endanger democracy and endanger human rights.
It may change the systems that we can build, but I don't think it changes, fundamentally, the fact that we are building gradually more and more powerful systems. There's the question that some philosophers are asking about, you know, whether we should eventually give personhood to intelligent, conscious machines; I don't think we are anywhere close to understanding these questions enough to be able to answer that sort of thing.
So I have several questions, but I'll limit it to two. One is that Professor Marcus said that Professor Bengio's approach to deep learning, and his belief in it, relies too heavily on larger data sets to yield answers. Why is that necessarily bad? There are large data sets, and there are ways of constructing them. And they said you want me to ask both questions, so let's go.
People gather data in all kinds of ways, and they do try, for example, to gather data about a particular kind of accident when it happens, and so forth. It's very focused on the data, and not so much focused on certain kinds of innovations in algorithm space that I would like to see. So I have no objection to gathering more and more data; I think that getting clean data is really, really valuable, and people often underestimate that.
So I'm interested in the small-data regime, to the extent that we also have a lot of data before we get to that point. Humans learn a new task after they've seen a lot about the world, right? There's no chance that you will be able to learn in a meaningful way without a lot of that.
I think that's what we both meant by it: there are problems where people learn things with small amounts of data. Yoshua would say that's because they have a lot of experience elsewhere, and that's often the case. In any case, the small-data regime is: how do you learn something if you don't have 10 million data points, if you're my kids and you learn a new game in five trials?
How do you do that? Clearly, some of it is that you leverage prior experience. The only thing I'm going to add there is that the reason I did that baby experiment back in 1999 was to show that there were some things that little kids could learn without much direct experience. So I made up the language, so that they had no prior experience with the particular language that they heard.
Thank you for your presentations. Dr. Marcus, you talked about compositionality, and the need to take compositionality into account from a linguistic point of view. We have debates and arguments about compositionality, but suppose we accept compositionality: we had some progress in neural nets, the recursive neural nets for compositionality; however, those efforts have been abandoned. They have abandoned the efforts on the recursive nets; we don't do research anymore on recursiveness.
On that, I don't think it's resistance so much as an obsession with beating the benchmark, which could be good or bad, all right? It's because these very large, fairly simple architectures have been working so well. I mean, a good example now is the success of transformers: transformers are working incredibly well, but they're actually using these key-value pairs I was talking about; they're operating on sets.
So, you know, the recursive nets were one attempt, but there have been others that have been more successful, and maybe recursive nets will come back; we don't know, the history of science is very complicated, as we've seen with deep learning. And actually, I don't read the sociology of the current deep learning field like you do: in fact, there's a lot of interest in exploring how we can put some architectural structure into neural nets that facilitates the manipulation of language and reasoning.
The model produced that passage, at least that's what my picture shows: "I've never seen such a multicolored, beautiful forest of sapphire eyes on the same corner of the street in a bar before." It's, like, fabulous that it has created this surrealist prose; on the other hand, when I force it into the nonfiction genre, it seems a bit ridiculous. So, to see whether this is going to work, let me give an example.
So I gave it things about conventional knowledge, definitions, transformations, atypical consequences, and so forth, and I have data from these models on the right, and they're, you know, typically doing like 30% or 10%, or something like that. So there are sharp limits, and I think those limits come because we don't have kind of a parse tree on the output, yeah.
For intelligence, is the best path maybe to go quantum?
The brain is made of molecules operating in a quantum way, but, you know, if we abstract one level up, it's all computation, and that is not quantum by nature. So, of course, we don't know; you know, I don't have a crystal ball. At this point, I think the majority of the community, both in neuroscience and in computer science, is betting on traditional computing, in the sense that it's not quantum. But another thing I want to say is that right now
it's at least as important that our society invests even more in the question of how we are going to deploy these things: what the responsibility is of everyone in the chain, from the researcher, to the engineer, to the people doing auditing, to governments drafting regulations, to make sure that we
move in a direction that's best for humanity, that's best for citizens. I'm very concerned that we're building tools that are too powerful for our collective wisdom, and I'm fine with, like, slowing down the deployment of AI. I think governments are not yet ready to do the proper regulation, and we need to spend more time on it.
For one, I want to mention that here in Montreal we've been really working hard on this question, and we came up last year, after two years of work involving not just scholars but also citizens, with a thing we call the Montreal Declaration for the Responsible Development of AI. I invite you to check it out online, and we're pushing these ideas to the Canadian government.
I'm involved in building things, so I'll just add one thing, because I think we have to go to the online questions, but I want to amplify the point about the Wild West; it's a good way to think about this. Right now, a driverless-car manufacturer can basically put anything on the road. We can sue them after the fact if they cause great harm, but there are essentially no regulations about what you can do with a driverless car.
On the definition of symbols: I don't think we should waste time arguing about that. I think that, from the perspective of symbol manipulation, the real question is: do we have operations over variables? You can define a symbol in such a way that it encompasses everything or nothing, and I don't think that's where the debate should be.
Are deep learning and symbolic AI compatible, and can they provide the best of both worlds? Is there any evidence? I think the best evidence that we have for that is that we have some people building actual hybrid models in the real world to do useful things. None of them achieves human-level intelligence, but then, you know, no deep learning system does that either.
We both think there's territory that people need to explore. I think the biggest take-home message, as I said, is that we actually agree a lot about what that geography is that needs to be explored. We have some differences about where to go in that exploration; neither of us thinks that we've reached the destination, by any means.
I want to take the question about language understanding: do you think language understanding is a form of intelligence? We clearly need better language understanding for AI, and there are really interesting connections between language understanding and reasoning, but they're really different. So, I listened to a presentation at the last NeurIPS by Ev Fedorenko, who is a cognitive neuroscientist, and what she found with her colleagues is that there is a language area in the brain, and it does process everything that's connected to language, but
understanding refers to general knowledge of how the world works. This is an area which is very active in machine learning: people, irrespective of whether they do language or not, are looking at how learning systems which interact with their environment can build better models of the world, and if we don't do that, we'll never have good language understanding. So this connects with what Gary talked about, with the limitations of current systems.
Language and reasoning are clearly separate things, but they're not fully separate. So there's wonderful work, for example, from Mike Tanenhaus and John Trueswell showing experimentally that people reason about the world at the very moment they're processing language. So if I give you an ambiguous sentence, you will look at what the things out there in the world are that can help disambiguate the sentence; you will reason, like:
is there a cup on the table, or a cup on the towel? And I'm going to put all of this together into an understanding of a sentence, so it's hard to draw a sharp line; as you know, interesting work notwithstanding, there's certainly overlap. On the other hand, a very clear example of how important all the physical-reasoning stuff is would be any primate that's not a human: think about all the physical reasoning that a chimpanzee can do without any language at all.
We could argue about the ape language studies, but I don't think they're very compelling. So you have species that can, you know, navigate their way through trees and have very complicated social interactions, exchange, and all of these things without any language, and I think we would both be thrilled if, before we leave this mortal coil, we were able to build AI systems that could do what chimpanzees do now. I have a personal interest in language, having studied it for most of my life.
In reasoning, we combine multiple pieces of knowledge that we can search through: we can search through, you know, which piece of knowledge can be combined with which piece of knowledge in order to find an answer to a question, and those searches are heavily guided by our intuition, so we know where to search.
I would say, by and large, that the results of extant neural-network reasoning are not as impressive even as that example from Cyc. But I would also say (I was going to give you the point, and then we can come back) that if you take the broader notion of deep learning that Yoshua would like to defend, and you start putting in mechanisms for attention and indirection and so forth, which come at least a little bit close to the things that I want, then all bets are actually open; we don't know yet what the boundaries are.
Once you include mechanisms like indirection, we know some of the things we can do there, but there's a lot of stuff in classical reasoning that I don't think has really been addressed yet. There are other people who are more expert in that, but I would say even just dealing with quantified sentences: how do you deal with "everybody loved somebody," and the ambiguity in that? We haven't really seen that yet.
There's a question here about what you think of transferring structured rules, in the form of first-order logic, onto network parameters, as opposed to encoding the information in latent variables. This is actually the kind of thing that people were trying to do in the 90s, trying to create a direct analogy.
The second part is how to make sure that those initially injected assumptions will still hold after the training, how to overcome catastrophic forgetting. So, for the first question, I think we, or at least I, gave a partial answer already. The second one, about forgetting, is very important, and it is connected to some of the things I was talking about when I mentioned
a knowledge base with, you know, subject-object-verb triples, things like standard relational databases, with the model looking for the words that it has seen, or their equivalent synonym representations, in the knowledge base, and then using an attention mechanism to pick the pieces of knowledge in the knowledge base which can help it predict what the next word should be. This was done with Sungjin Ahn, and it's been published.
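A toy flavor of that setup (an editor's sketch; the fact vectors and sizes are invented stand-ins, not the published model): the current context emits a query, stored facts act as key-value pairs, and the retrieved value is blended into the next-word predictor's input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Attending over a knowledge base during next-word prediction: facts are
# stored as (key, value) vector pairs, and the context selects among them.
rng = np.random.default_rng(0)
fact_keys = rng.normal(size=(100, 16))    # one key per stored fact
fact_values = rng.normal(size=(100, 16))  # the fact's content vector

def retrieve(context_vec):
    w = softmax(fact_keys @ context_vec / 4.0)
    return w @ fact_values                # knowledge summary for this context

context = rng.normal(size=16)             # stand-in for an RNN/transformer state
augmented = np.concatenate([context, retrieve(context)])
# "augmented" would then feed the output softmax over the vocabulary.
```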
What you can then imagine is neural nets that can do their normal neural-net thing, but, as they're computing, they're allowed to go online and check for information that they don't already know, that is not already integrated inside their "brain," and use that information in order to answer questions or predict something.
I was first looking at that, and I think it's only recently, with attention mechanisms in the form that involves indirection, that you can start thinking about quantification. Quantification, the way I interpret it in a neural-net sense, is essentially that you have these little modules, which in your world you would call rules, and that's fine, except they are not symbolic rules; they're more like modules that are allowed to do inference on some variables given other variables, but the inputs of those rules don't have to be always the same.
Information can go back and forth a couple of times, and so it would be reasonable to assume that, although there is coordination at a global level, a lot of the learning involves local descent. And there's been a lot of interesting work in deep learning here; I don't think we've solved this problem right now, but people are trying,
in essence, and if you look at reinforcement learning systems, they use that kind of trick as well: you predict the reward that you will get, and you use the predicted reward as an intermediate, local reward. So I think there are some interesting questions about decentralizing this kind of learning; there are also more pragmatic explorations.
Having components that are tuned to particular things is at the heart of how the brain works. It's not fully modular, but I think the most amazing thing about the imaging literature, taking pictures, scans, of people's brains, is the way in which the brain (to connect to a phrase of yours) dynamically reconfigures itself in the course of anything that we do. So you can tell someone who's coming into a scanner experiment, like the ones that Ev is going to do: okay, what you're going to do now is, you're going to take glasses and you're going to put them on.
Thanks, everybody, for watching, folks. Now, I mean, it's pretty obvious everybody basically prefers Yoshua. I'm going to turn my mic up now; I'm turning my mic back up to regular volume. Yeah, I thought this was fun; like I said, I didn't know how this was going to go, but there was a good amount of people watching us, in the 20s and 30s, and you know that's good for one of my Twitch streams. I'll get a YouTube video of this online as well, and thanks, everybody, for chatting and for participating, and also for voting.
They all watched live, and all these chatting people over here all watched live and voted and everything, obviously. Anyway, I'm going to shut up now; take care, and thanks for watching. Thanks, Gary Marcus and Yoshua Bengio, and thanks to the University of Montreal for hosting the debate and for live-streaming it, even though the AV wasn't perfect; it's cool that they did. Let's check the Twitter updates real quick. I would really love to have Melanie Mitchell participate; a future debate with her would be awesome. Were you in the audience? Because I would... no?
Someone mentions the power of decentralizing learning in the service of privacy, and in order to stimulate... how? I don't think that was the same debate; at least I wasn't watching that one. Just some people saying thanks, and some comments. Thanks, everybody, for watching with me, and hey, maybe we'll do this again; let me know somehow, contact me somehow. Now I am going to stop the stream. Oh, no! We're going to go raid. Okay, so,
you know, if you're watching my channel on Twitch right now, we're going to switch to another channel, and all the viewers in this channel are going to dump into another channel, and this guy is doing some type of game programming, it looks like. Alright, so I'm going to hit this button; it's probably going to pop up something and say that you're going to raid, and it's going to happen in like three seconds or something, and then we're going to do it right now. Boom. See you guys, thanks for watching.