From YouTube: Let's Watch an AI Debate LIVE! Gary Marcus vs Yoshua Bengio 2019 University of Montreal
Description
Join Matt Taylor and his online community as they watch this historic debate between Gary Marcus and Yoshua Bengio, live on Dec 23rd. They voted on the best debater, discussed the content, and generally had a good time watching the debate. See the original Twitch video at https://www.twitch.tv/videos/525773366.
Original event details: https://www.eventbrite.ca/e/debate-yoshua-bengio-gary-marcus-live-streaming-tickets-81620778947#
Professor Marcus has published extensively in neuroscience, genetics, linguistics, evolutionary psychology, and artificial intelligence, and is perhaps the youngest professor emeritus at NYU. He is the founder and CEO of Robust.AI and the author of five books, including The Algebraic Mind and his newest book, Rebooting AI: Building Artificial Intelligence We Can Trust.
This diagram shows the architecture of a two-layer neural network. It is made of relatively simple processing elements that are very loosely modeled on neurons. They have connections coming in, and each connection has a weight on it, and that weight can be changed through learning.
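As a minimal sketch of the kind of network being described (an editor's toy example in NumPy, not a slide from the talk; the sizes, target function, and learning rate are illustrative assumptions):

```python
import numpy as np

# A two-layer network as described above: inputs on the bottom, outputs on
# the top, and a weight on every connection, adjusted through learning
# (here, plain gradient descent on squared error).
rng = np.random.default_rng(0)
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden weights
W2 = rng.normal(scale=0.5, size=(8, 1))   # hidden -> output weights

def forward(x):
    h = np.tanh(x @ W1)                   # hidden-layer activations
    return h, h @ W2                      # network output

x = rng.normal(size=(16, 4))
y = x.sum(axis=1, keepdims=True)          # an arbitrary target function
for _ in range(500):                      # repeated small weight updates
    h, pred = forward(x)
    err = pred - y
    gW2 = h.T @ err / len(x)
    gW1 = x.T @ ((err @ W2.T) * (1 - h**2)) / len(x)
    W1 -= 0.1 * gW1
    W2 -= 0.1 * gW2
```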
This AI debate is a Christmas gift from Montreal.AI to the international AI community. The hashtag for tonight's event is #AIDebate. Montreal.AI is grateful to Mila and to the collaborative Montreal AI ecosystem. That being said, we will start with the first segment from Gary Marcus. You have 20 minutes for your opening statement.
And I know that if you're watching on Twitch there are ads; you can get around those if you subscribe, and also, if you follow, that'd be awesome too. It doesn't cost anything to follow. And if you're watching on YouTube, please like the video and subscribe to our channel; we put out a lot of content on YouTube, including Numenta research meetings.
The first part is about how I see AI, deep learning, and current machine learning, and how I got here. It's a bit of a personal history of cognitive science and how it feeds into AI, and you might think of it as: what's a nice cognitive scientist like me doing in a place like Mila? So here's an overview (I won't go into all of it) of some of the things that I've done that I think are relevant to AI. An important point is that I'm not a machine learning person by training.
I study humans and how they generalize and learn, and I'll tell you a little bit about that work, going back to 1992 and a little bit all the way up to the present. But first I will go back even a little bit before, to a pair of famous books that people have called the PDP Bibles. Not everybody will even know what PDP is, but it's a kind of ancestor to today's deep learning.
In those books was a paper about children's over-regularization errors: kids say things like "breaked" and "goed" some of the time. I have two kids; I can testify that this is true. It was long thought to be an iconic example of symbolic rules, such that you could read any textbook up to 1985 and it would say children learn rules, and that, for example, is why they make these over-regularization errors. What Rumelhart and McClelland showed was that a neural network might produce those errors without
any rules in it, and the so-called great past tense debate was born from this. It was a huge war across the cognitive sciences; by the time I got to graduate school, it was all that a lot of people wanted to talk about. On the one hand, up until that point, until that paper, most of linguistics and cognitive science was couched in terms of rules. So the idea was that you learn rules, like "a sentence is made of a noun phrase and a verb phrase."
On the other hand was the view that we don't need rules at all, forget about it: even an error like "breaked" might in principle (they didn't prove it, but they showed it might in principle) be the product of a neural network where you have the input on the bottom and the output on the top, and tuning some connections over time might give you generalizations that look like what the kids were doing.
I did what I think was the first big-data analysis of language acquisition, or one of the first ones, writing shell scripts on Unix SPARCstations, and we looked at eleven and a half thousand child utterances. The argument that Pinker and I made was that neural nets weren't making the right predictions about generalization over time, about particular verbs, and so forth. If you care, there's a whole book that we wrote about it, and what we argued for was a compromise.
Even a little bit before that, I started playing a lot with the network models. A lot had been written about them, but I wanted to understand how they worked, so I started implementing them and trying them out, and I discovered something about them that I thought was really interesting: people talked about them as if they learned the rule in the environment, but they didn't really always learn the rule, at least not in the sense that a human being might.
For example, you train a network to reproduce its input, and you do this on a bunch of cases; the neural network learns something, but it also makes some mistakes. So if you give it an odd number, which is what I have there at the bottom, after training it on only even numbers, it doesn't come up with the answer that a human being would. I described this in terms of something called a training space.
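A rough sketch of this kind of experiment (an editor's reconstruction under assumed sizes and encodings, not Marcus's original setup): a plain multilayer perceptron trained to copy even binary numbers typically fails to copy the last bit of odd ones, since that bit is always 0 inside the training space.

```python
import numpy as np

# Train an MLP on the identity function using only even numbers (last bit
# always 0), then test on odd numbers. The network has no pressure to learn
# "copy" for the last output unit, so it fails outside the training space.
rng = np.random.default_rng(0)

def bits(n, width=8):
    return np.array([(n >> i) & 1 for i in range(width)], dtype=float)

train = np.stack([bits(n) for n in range(0, 256, 2)])   # even numbers only
test  = np.stack([bits(n) for n in range(1, 256, 2)])   # odd numbers

W1 = rng.normal(scale=0.3, size=(8, 16))
W2 = rng.normal(scale=0.3, size=(16, 8))
sigmoid = lambda z: 1 / (1 + np.exp(-z))

for _ in range(2000):                                   # learn the identity map
    h = np.tanh(train @ W1)
    out = sigmoid(h @ W2)
    d = (out - train) * out * (1 - out)
    W2 -= 0.5 * h.T @ d / len(train)
    W1 -= 0.5 * train.T @ ((d @ W2.T) * (1 - h**2)) / len(train)

pred = sigmoid(np.tanh(test @ W1) @ W2) > 0.5
print("last-bit accuracy on odd inputs:", (pred[:, 0] == 1).mean())  # near 0
```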
In my view, this is the thing that I am most proud of having worked on; I'll save some details for later. It led me to some work on infants, and what I tried to argue is that even infants could make these kinds of generalizations that were stymieing the neural networks of that day. So it was a direct, deliberate test of outside-the-training-space generalization by human infants. The infants would hear sentences of made-up syllables, things like "la ti ti" and "ga na na"; I read these to my son yesterday.
Compared with the early neural networks, the conclusion was that infants could generalize outside the training space even where many neural networks could not, and I argued these should be characterized as learning algebraic rules. It's been replicated a bunch of times, and it led to my first book, which was called The Algebraic Mind. The idea was that humans can do this kind of abstraction, and I argued that there were three key ingredients missing from multilayer perceptrons.
The book was, in part, a critique of attempts to use multilayer perceptrons as models of the human mind; I wasn't really talking about AI, I was talking about cognition. Such models, I argued, simply can't capture the flexibility and power of everyday reasoning. The key components of the thing I was defending, which I would call symbol manipulation (I didn't invent it, but I tried to explicate it and argue for it), are variables, instances, bindings, and operations over variables. So, thinking of algebra, you have a variable like x.
You have an instance of it, like two. You bind it, so you say: right now x equals two, or my noun phrase equals "the boy." And then you have operations over variables, so you can add them together, or you can put them together (concatenation, if you know computer programming), you can compare them, and so forth.
An operation over a variable automatically generalizes to all instances of some class, let's say integers, once you have that code, and pretty much all of the world's software takes advantage of this fact. My argument from the baby data was that human cognition appeared to do so as well, innately.
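A tiny illustration of that point (an editor's example, not one from the talk): an operation defined over a variable applies to every instance of its class at once, whereas a memorized table of training cases does not.

```python
# The function is defined once, over a variable x, and immediately applies
# to every integer, including values never "seen" before.
def successor(x: int) -> int:
    return x + 1                              # an operation over the variable x

assert successor(2) == 3                      # a binding: x = 2
assert successor(10**12) == 10**12 + 1        # generalizes far beyond any training set

# Contrast a lookup table "trained" on examples: no rule, no generalization.
table = {n: n + 1 for n in range(0, 100, 2)}  # even numbers only
print(table.get(7))                           # None: odd inputs are simply unknown
```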
The subtitle of that first book, which you can't see here, was "Integrating Connectionism and Cognitive Science": I wasn't trying to knock down neural networks and say forget about it. I was saying, let's take the insight of those things, that they're good at learning, but let's put it together with the insights of cognitive science, a lot of which have been about using these symbols and so forth. And so I said: even if I'm right that symbol manipulation plays an important role in mental life, that doesn't mean we shouldn't have other things in there too, like multilayer perceptrons, which were the predecessors of today's deep learning.
There's a literature you can take a look at called neuro-symbolic cognitive reasoning, and I'm going to try to suggest that it also anticipated some of Yoshua's current arguments. Then I stopped working on these issues: I started looking at innateness, I learned to play guitar (that's a story for another day), and I didn't talk about these issues at all until 2012, when
I wrote that such systems were still far from logical inference and still a long way from integrating abstract knowledge, and I once again argued for hybrid models, with deep learning as just one element in a very complicated set of machinery. Then in 2018, deep learning got more and more popular, but I thought people were missing some important points about it, and so I wrote a piece (I was actually here in Montreal when I wrote it) called "Deep Learning: A Critical Appraisal." It outlined 10 problems for deep learning; I think it was on the suggested readings for this event.
I felt like I was often misrepresented there as saying we should throw away deep learning, which is not what I was saying, and I was careful in the paper to say, in the conclusion: despite all the problems I have sketched, I don't think we need to abandon deep learning, which is the best technique we have for training neural networks right now; but rather, we need to reconceptualize it, not as a universal solvent, but simply as one tool among many.
So the central conclusions of my academic work included the value of hybrid models and the importance of extrapolation, of compositionality, and of acquiring and representing relationships, causality, and so forth. Part 2 is about Yoshua: some thoughts on his views, how I think they've changed a bit over time, and a little bit about how I feel about that.
The first thing I want to say is that I really admire Yoshua. For example, I wrote a piece recently skewering the field for hype, and I said: but, you know, a really good talk is one by Yoshua Bengio, a model of being honest about limitations. I also love the work that he's doing, for example, on climate change and machine learning. I really think he should be a role model in his intellectual honesty.
That said, I have felt that he put too much faith in black-box deep learning systems and relied too heavily on larger data sets to yield answers. He'll talk about System 1 and System 2 later (I guess I will as well); it felt like it was all on the System 1 side and not so much on the System 2 side. I went back and talked to some friends about that, and a lot of people remember a talk
he gave in 2015 to a bunch of linguists, who didn't like Yoshua's answers to questions like how we would deal with negation, or with quantification words like "every." They felt like what Yoshua mostly did was to say: well, we just need more data and the network will figure it out. If he were still in that position, I think we'd have a longer argument; I don't think he is. Recently, however, Yoshua has been taking a sharp turn towards many of the positions that I've long advocated for, validating
much of what I've said. Where we still disagree, I'd say, is on the right way to build hybrid models, on innateness, on the significance of the fact that the brain is a neural network, and on what we mean by compositionality; and that's it, I think we actually agree about most of the rest. The first one is the most delicate, but I think occasionally Yoshua has misrepresented me as saying "deep learning doesn't work"; he said that to IEEE Spectrum. I hope I've persuaded you that that's not actually my position: I think deep learning is very useful.
I just don't think it solves all problems. The second thing is that his recent work has really nailed what I think is the most important point, which is the trouble deep nets have in extrapolating beyond the data, and why that means, for example, that we might need hybrid models. Frankly, I would like for him to cite me a little bit; I think not mentioning me devalues my contributions a little bit, and it further misrepresents my background in the field. What kind of hybrids should we seek? I think
we should be inspired by Daniel Kahneman's book about System 1 and System 2. I imagine many people in the crowd have read it; you should, if you haven't. It talks about one system that's intuitive, fast, and unconscious, and another that's slow, logical, sequential, and conscious. I actually think that's a lot like what I've been arguing for all along. We can have some interesting discussion about the differences; there are questions: are they even different? Are they incompatible?
Think of Marr's levels: you could have an abstract algorithm or notion, like "I'm going to do a sorting algorithm"; you could pick a particular one, like bubble sort; and then you could make it out of neurons, you could make it out of silicon, you could make it out of Tinkertoys. I think we need to remember this as we have these conversations: we want to understand the relation between how we're building something and what algorithm is being represented. I don't think that question has actually been settled.
Nobody has made a strong case that the system doesn't implement symbols. And Yoshua has been talking a lot lately about attention. I think that what he's doing with attention reminds me, actually, of a microprocessor, in the way that it pulls things out of a register and moves them into a register and so forth; in some ways it seems as if it behaves at least a lot like a mechanism for storing and retrieving the values of variables from registers, which is really what I have cared about.
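As a small sketch of that analogy (an editor's code, not either speaker's implementation), soft dot-product attention reads a value out of a set of key-value slots much as a processor reads from a register file:

```python
import numpy as np

# Soft attention as a differentiable "register read": a query is matched
# against keys, and the result is a weighted blend of the stored values.
# A sharply peaked match approximates fetching the value bound to one slot.
def attend(query, keys, values):
    scores = keys @ query / np.sqrt(len(query))   # match query to each key
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                      # softmax over slots
    return weights @ values                       # blended read-out

rng = np.random.default_rng(0)
keys = rng.normal(size=(4, 8))        # 4 slots, 8-dim keys
values = np.eye(4)                    # a distinct value per slot
print(attend(keys[2], keys, values))  # query = key 2: mass lands on slot 2
```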
And I don't fully know what Yoshua's view is about nativism. As a cognitive development person, I see a lot of evidence that a lot of things are built into the human brain. I think we are born to learn, and we should think about it as nature and nurture, rather than nature versus nurture, and I think we should think about innate frameworks for things like understanding time and space and causality, as Kant argued for in the Critique of Pure Reason, and as Spelke
has argued for in her cognitive development work. The argument that I've made, in the paper on the left, is that richer innate priors might help artificial intelligence a lot. Machine learning has historically, typically, avoided nativism of this sort, and as far as I can tell, Yoshua is not a real fan of nativism, though I'm not totally sure.
Here's some empirical data showing that nativism in neural networks works. It comes from a great paper by Yann LeCun in 1989, where he compared four different models, and the ones that had more innateness, in the form of a convolutional prior (for those who know what that is), were the ones that did better. And this is, just very quickly, a picture of a baby ibex climbing down a mountain: I don't think anybody could reasonably say that there's nothing innate about the baby ibex.
It does not learn that sensory-motor loop from scratch. In the cartoon version of the debate, Yoshua wins by saying "your brain is a neural network," and everybody goes: wow, I guess Yoshua was right after all. And Yoshua did, at least half in jest, make a similar argument to me on Facebook when he said your brain is a neural net all the way. Of course, deep neural networks aren't really much like brains; I've been arguing that for a while. There are many cortical areas, many neuron types, many different proteins in different synapses, and so forth, and so on.
I actually heard Yoshua make essentially the same argument at NeurIPS last week, so I think we probably pretty much agree about that; he made a beautiful argument about degrees of freedom in particular, which I loved. But the critical question is really: what kind of neural network is the brain? Going back to Marr's distinction, you could build anything you want, any computation you want, out of Tinkertoys or out of neurons. We really want to know whether the brain is a symbolic thing at the algorithmic level or not. And then we ask: well,
why exclude symbols from AI? We can't prove that they're inadequate; they have proven utility; most of the world's computer code is written in symbolic form; and, most importantly, lots of the world's distilled knowledge comes in the form of symbols. Everything in Wikipedia is symbolic, and we'd like to be able to use that in our machine learning systems. Point five is compositionality: Yoshua has been talking a lot about compositionality, and I think he will tonight, but I think he means something different by it than I do. So I'll
let him give his description later, but I think his sense is partly about putting together different pieces of networks and so forth. I'm really interested in the linguist's sense, which is how you put different parts of sentences together into larger wholes. Here's a good example. Last week, regarding my friend Jeff Clune (I've been encouraging him to come to UBC, and encouraging UBC to hire him for a job), my friend Alan Mackworth said, "Good news, Jeff Clune accepts," and I wrote back and said, "Awesome. He told me it was imminent, but swore me to secrecy."
Understanding that exchange takes real composition, yet we can barely get a system to represent the difference between eating rocks and eating apples, and the famous quote, that you can't cram the meaning of an entire effing sentence into a single vector, I think still stands. Compositionality is not just about language: it's also about learning different concepts and putting them together in different ways. Here are my kids inventing a new game; ten minutes later, they've combined things that they know. Children can learn something in a few trials, and we haven't figured out how to do that yet.
To synthesize: we agree that multilayer perceptrons on their own won't be the answer. We both think everybody going forward should be working on the same things: compositionality, reasoning, causality, hybrid models, extrapolation beyond the training space. And we agree that we should be looking for systems that represent more degrees of neural freedom, respecting the complexity of the brain. At the same time, I hope to have convinced you that symbol manipulation
is worth pursuing. Bengio, LeCun, and Hinton kept plugging away despite resistance; I hope people doing symbols will keep plugging away too. Here's my prediction, on my last slide: when Yoshua applies his formidable model-building talents (which I envy) to models that acknowledge and incorporate explicit operations over variables, magic will start to happen. Thank you very much.
My views are about how deep learning might be extended to deal with System 2 computational capabilities, rather than about taking the old techniques and combining them with neural nets. I want to talk briefly about attention mechanisms, and why these may provide some of the key ingredients that Gary has been talking about, the ones that make symbolic processing able to do very interesting things, but how we can do it within a neural net framework.
When "deep learning" is used as a strawman, it tends to be used to mean MLPs from 1989, just as Gary used the term a few minutes ago. If you open the last NeurIPS proceedings, you'll see that it's much more than that. Deep learning is really not about a particular architecture, or even a particular training procedure; it's not about backprop; it's not about convnets or RNNs or MLPs. It's something that's moving; it's more of a philosophy.
It now includes transfer learning, learning to learn, and so on, and I will argue that the tools to move forward include things like reasoning, search, inference, and causality. And, to connect to neuroscience, because Gary mentioned it: there's actually a very rich set of work happening in the last few years connecting modern deep learning research with neuroscience. We had a paper just published in Nature Neuroscience called "A deep learning framework for neuroscience," but I won't have time to talk about it today.
Out-of-distribution generalization means something different from the normal formalization, where you have data from one distribution and we worry about generalizing to examples from the same distribution. When we talk about extrapolation, Gary, it's not clear whether we're talking about generalizing to new configurations coming from the same distribution or not, so you have to think about the notion of distribution to be able to make a difference for agents in the world.
One of the things I've done since the 2000s is to try to help figure out why even the neural nets from the 80s, with distributed representations, have a powerful form of compositionality. I'm not going to go into the details of that, but this dates back a number of years; and similarly, why composing layers brings another form of compositionality. So, basically, my argument is that we have these two forms already in neural nets.
What matters is generalizing to configurations that are unlikely under the training distribution: it's not just that a pattern is novel, it's that it may be unlikely under the kind of distribution you were trained on, and yet our brain is able to come up with these interpretations, these novel combinations, and so on. At NeurIPS I gave this example of driving in a new city, where you have to be a little bit creative in combining the skills you know in novel ways in order to solve a difficult navigation problem.
So attention selects an element from a set of elements in the lower layer, and it sends the selected element upward in a soft way, at least for the soft-attention kind that we typically do in deep learning. So the receiver gets a vector, but it doesn't know where that vector comes from, and in order to really do a good job, it's important for the receiver to get information not only about the value that is being sent but also about where it comes from; the "where" is sort of a name.
Now, it's not like a symbolic name: we use vectors, what we call keys in transformers, for example. And you can think of these as the neural-net form of reference, because that information can be passed along, and can be used again to match some element to some other element, to perform further attention operations.
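A minimal sketch of this idea (an editor's illustration, with invented shapes): attention can return not just the selected value but also the key of the slot it came from, so downstream modules can refer back to it.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Attention that forwards both the value and the key ("where it came from"),
# so the receiver gets a neural-net form of reference, not an anonymous vector.
def attend_with_name(query, keys, values):
    w = softmax(keys @ query / np.sqrt(len(query)))
    return w @ values, w @ keys        # (selected value, soft "name"/key)

rng = np.random.default_rng(1)
keys, values = rng.normal(size=(5, 4)), rng.normal(size=(5, 6))
value, name = attend_with_name(keys[3], keys, values)
# "name" can itself be used as a query later, chaining attention operations.
```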
There are these dependencies, which you can think of like a sentence, such as "if I drop the ball, it will fall on the ground," which relate only a few variables together. Now, of course, each concept, like "ball," can be involved in many such sentences, and so there are many dependencies that can be attached to a particular concept; but each of these dependencies is itself sort of sparse, involving few variables, so we can represent that in machine learning as a sparse graphical model, a sparse factor graph.
What would be interesting, and it's something we desire, is for these to be the kinds of high-level variables, factors, that we communicate with language. So there's a strong connection between these notions and language, the reason being that the things we do consciously we are able to report through language, whereas the things we don't do consciously, that go on, you know, below the level of consciousness, we can't report; and presumably there's a good reason for this: it's just too complex to be put in a few simple words.
But what's interesting is that if we can put these kinds of priors on top of the highest level of representations of our neural nets, then it will increase the chances of finding the same sorts of representations that people use in language, so I call them semantic factors. Another prior that I've been talking about has to do with causality and changes in distribution, because, remember, I started this discussion with how we deal with changes in distribution.
Then we have to add something else, right? This is something fundamentally important in order to cope with changes in distribution; otherwise, the new distribution could be anything. So we have to make some sort of assumptions, and I presume that evolution put these kinds of assumptions into human brains, and probably animal brains as well, to make us better equipped to deal with those changes of distribution.
And so the prior here, really inspired a lot by the work of people like Schölkopf and Peters and others in causality, is that those changes are the result of an intervention on one or a few high-level variables, which we can call causes. So there's this prior that many of the high-level variables that I'm talking about are causal variables: in other words, they can be causes, or they could be effects of something, or they are related to how interventions cause changes.
Another thing that we have explored is related to modularization and systematic generalization: the idea that we're going to dynamically recombine different pieces of knowledge together in order to address the particular current input. We have a recent paper called "Recurrent Independent Mechanisms," which is a first stab at that, and I'm not going to go through the whole thing, but one of the main ideas is that we have a recurrent net that's broken down into smaller recurrent nets, which you can think of as different modules, which we call independent mechanisms.
They have separate parameters; they're not fully connected to each other, so the number of free parameters is much less than in a regular recurrent net. Instead, they communicate through a channel that uses attention, such that they can basically only send these named vectors, these key-value pairs, in a way that makes it more plug-and-play: the same module can take as input the output coming from any module, so long as they speak the right language, so that they fill the right slots.
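A heavily simplified sketch of that flavor of architecture (an editor's toy, not the paper's implementation; module count, dimensions, and update rule are assumptions):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Modules with separate parameters and state that exchange information only
# through attention over key-value messages, not a fully connected matrix.
class Module:
    def __init__(self, dim, rng):
        self.Wk = rng.normal(scale=0.3, size=(dim, dim))  # key projection
        self.Wv = rng.normal(scale=0.3, size=(dim, dim))  # value projection
        self.Wq = rng.normal(scale=0.3, size=(dim, dim))  # query projection
        self.state = rng.normal(size=dim)

    def message(self):
        return self.state @ self.Wk, self.state @ self.Wv

    def update(self, keys, values):
        w = softmax(keys @ (self.state @ self.Wq))        # attend to messages
        self.state = np.tanh(self.state + w @ values)     # local recurrent step

rng = np.random.default_rng(0)
mods = [Module(8, rng) for _ in range(4)]
for _ in range(3):                        # a few communication rounds
    keys, values = map(np.stack, zip(*[m.message() for m in mods]))
    for m in mods:
        m.update(keys, values)
```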
Rules apply to generic instances: it's not like there's a rule for my cat and my cat's food; there's a general rule that applies to cats and cat food in general, right? And we do these kinds of things a lot in machine learning, of course: in graphical models, these ideas date back even to the convolutional nets, and to dynamic Bayes nets, which share parameters. So something like this needs to be there as well in the representation of the dependencies between the high-level factors,
the causal variables, in the same spirit. And I didn't have time to talk about it, because it's really a whole other talk, but very closely related to this subject is agency. We are agents; we intervene in our environment. This is closely connected to the causality aspect, and the high-level variables, if you look at the ones we manipulate with language, often have to do with agents and objects.
Different pieces of knowledge correspond to different time scales. There are things about the world that change quickly, and there are things that are very stable, right? So there's, like, general knowledge that you're going to keep for the rest of your life, and there are aspects of the world that change quickly: we learn new faces, we learn new tricks.
This is consciousness-related, and potentially different from the symbolic program: we would like to build in some of the functional advantages of classical AI, of symbol manipulation, in neural nets, but in an implicit way. So we need efficient and coordinated large-scale learning; we need semantic grounding in System 1 and the perception-action loop; we need distributed representations for generalization, which has been, you know, a big success for deep learning; we need efficient search in that space, with System 1; and we need to handle uncertainty, but we want to operate with these other things as well.
There is also flexibility in how symbols should be represented: we can get many of the attributes of symbols without the kind of explicit representation of them which has been the hallmark of classical AI. We can get categories, for example, by having multimodal representations of distributions; we can use things like the Gumbel softmax, which encourages separation into different modes; we can get indirection with variables, as I mentioned already; we can get recursion by recurrent processing; and we can get a form of context independence.
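A small sketch of the Gumbel-softmax trick just mentioned (an editor's example; the logits and temperatures are illustrative): it draws approximately one-hot, discrete-looking samples from a categorical distribution while staying differentiable.

```python
import numpy as np

# Gumbel softmax: add Gumbel noise to logits, then apply a temperature-scaled
# softmax. Low temperatures push the sample toward a one-hot vector, giving
# discrete, symbol-like behavior from continuous, differentiable machinery.
def gumbel_softmax(logits, temperature, rng):
    gumbel = -np.log(-np.log(rng.uniform(size=logits.shape)))
    z = (logits + gumbel) / temperature
    e = np.exp(z - z.max())
    return e / e.sum()

rng = np.random.default_rng(0)
logits = np.array([1.0, 2.0, 0.5, 0.1])
print(gumbel_softmax(logits, temperature=5.0, rng=rng))  # soft, blended
print(gumbel_softmax(logits, temperature=0.1, rng=rng))  # nearly one-hot
```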
If you're trying to build an intelligent system: Google search is in some ways an intelligent system, and in some ways not, but I think you have two avenues here. You can either say it's so different from an intelligent system that it's not interesting, or you can say it's interesting, and it does show a proof of concept.
That's what I'm talking about: how the brain works, and how I would like to build AI in the future.
I get off the hybrid train when it's about taking the good old algorithms, like production systems and ontologies and rules and logic, which have a lot of value and which I think can serve as an inspiration, and trying to take them and basically glue them to neural nets. People have been trying to do these kinds of things for a long time.
In the 90s there was a wave of this, and I've tried to outline, in my last couple of slides (I guess I misunderstood; I thought I had two more minutes left), the reasons why it couldn't work, and not just because of how the brain works, but for machine learning reasons, for practical computational reasons. So one of them is search. What I mean by search is what we do when we have the knowledge and need to put pieces of it together.
We typically combine one thing, and sometimes two, and if it really doesn't work, we try three or four. Go masters go up to fifty, okay, but their brains are unusual because they've been trained, and, you know, people who are really good at algebra can do that sort of thing too; but normal behavior involves this very intuitive sense, like we know where to search, and that's based on System 1, on something that we don't have conscious access to, that knows where to search. So that's one reason why we can't use the old algorithms.
Another reason is that it wasn't a sufficiently rich kind of representation to get good generalization. You want to represent everyday concepts, like words in natural language, by these sort of sub-symbolic representations that involve many attributes, and this allows us to generalize across similar things. And I've read some of the things you wrote, and you can say, well, these attributes are like symbols themselves; sure, you could do that, but the important point is that now you have to manipulate these rich representations, which could actually be fairly high-dimensional.
Yeah, and of course we need to keep the things that have worked well in machine learning, such as handling uncertainty, which some people are doing, like Josh Tenenbaum with probabilistic programming and so on. So I think there are some efforts going in those directions, but we need to keep these ingredients together.
In Cyc there are microtheories to target reasoning in particular domains, and that's, I think, an idea that's worth exploring. But I absolutely agree that if you have unbounded inference, you're in trouble. I think that AlphaGo is an example where you bound the search, partly through a non-symbolic system, and then you use a symbolic system there as well, and so it's kind of a hybrid. In what way is it symbolic?
As I said at the end of my presentation, we can get discreteness, not necessarily in its hardest form, in its purest form, as you have in symbols. You can get discreteness by having, you know, lateral inhibition that creates a competition, such that the dynamics converge to one mode or another mode. This is what you observe in the brain, by the way, when you take a decision: there's a sort of competition between different potential outcomes.
I think you're strawmanning symbols, because lots of people have put probabilities and uncertainty into symbols; and you think (and I think it's an interesting discussion point) that I'm strawmanning deep learning. You said I'm attacking the models of the 1980s, and there's some truth in that, and then there's a question of what the scope should be. So I think, both for symbols and for neural networks, there's a kind of question about what their proper scope is, and whether we're actually pushing to the same place from opposite sides.
So I would argue that the kind of deep learning that was straight out of the 80s continued until about 2016, in my view (but we could argue about that): you know, just have a big multilayer perceptron, pile a lot of data in, and hope for the best, which I don't think you believe anymore.
But maybe you did at one point. That's one kind of deep learning; that's the, I don't know, prototype or canonical version of deep learning, and you want to open deep learning up to a whole lot of other things. I think at some level that's fine; at some level I think it's changing the game (you might reply to that in a second). I think that, with respect to symbols, you might feel I'm doing the same. So I want to say: sure.
So it's been around for a while, and another thing you have to keep in mind is that I've been working on recurrent nets for a long time.
I hope it will bring something out that's interesting. I made a slide that I did not have time to show, which has a picture of a great new paper by Yoshua that we had on the reading list, and that is well worth reading. It is about causality, and it's a very mathematical paper; I put what I think is some of the core math of it at the bottom. I admit that I didn't read the paper as carefully as I wish that I had, but Yoshua
is trying to make some clever observations about how distributions change over time relative to interventions that are made, which is, of course, the classic thing that we try to do when we run experiments, and he's got, I think, some very clever ways of going after that within neural networks. And, god bless him, I think it's great work; it's not the work that I would do, but I think it's terrific. I'm just going to draw a contrast.
That's the kind of thing Ernie Davis and I have been reasoning about, and the formalism that Ernie came up with, which I think is responsive to your question, is something that broke things into time, space, manipulation, things about rigid objects, and the histories of objects. So he did a very careful analysis of the knowledge that one needs in order to do this basic thing. And it's not a trivial thing, because we use container metaphors for a large fraction of the things that we talk about; I don't want to say it's 50%, but it's significant.
To be able to make inferences about these things, you need prior knowledge, and there's a question about whether that knowledge is innate or acquired experientially. But the argument is that you won't be able to make these inferences unless you have this knowledge about sets and objects and containing regions, and have these kinds of axioms.
I totally welcome the kind of stuff that Yoshua is doing, even if I personally don't have the skills to do it, and I think the empirical question is kind of: could you, from the bottom up, derive all this? Although I feel like maybe I strawmanned Yoshua: I thought that he was more anti-nativist than maybe he really is, because he acknowledged evolution. So I'll say one more sentence and then turn it over.
People who think along similar lines to mine don't think that learning has to be from a blank slate. In fact, we have theorems from the 90s, the no-free-lunch theorems, that clearly say you can't have learning if you don't have some priors, okay? But what we're saying is: we'd like to be able to get away with as little prior knowledge as possible. Now, how is "little" measured? Well, you can think of measuring it in bits.
So, if you think about how big a program that would encode those priors is, and, you know, you were to zip that program, that would be how big the prior is. The kinds of priors I have been talking about in my presentation are priors that, in a way, are not going to require many bits, and so it's very economical.
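A toy illustration of that way of measuring a prior (an editor's sketch; the two "priors" below are made-up strings standing in for real encodings): serialize the code that states the assumption, compress it, and count bits.

```python
import zlib

# "Measuring a prior in bits": compress the text that encodes an assumption
# and count the compressed size. A short, generic prior costs few bits; a
# detailed hard-coded behavior costs many.
generic_prior = b"assume each dependency involves only a few variables"
specific_prior = (b"if predator_shape and looming and shadow_from_above and "
                  b"ambient_light < dusk_threshold then freeze else flee; " * 20)

for name, prior in [("generic", generic_prior), ("specific", specific_prior)]:
    bits = len(zlib.compress(prior)) * 8
    print(f"{name} prior: about {bits} bits")
```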
Part of what evolution gives us is completely hard-coded behavior, but those are not the behaviors that are most adaptive; there are other behaviors that allow a species to adapt, as well as human beings have been able to do. It's more interesting for me to think about the part of what evolution has discovered that is more general; those are the most generic priors. And of course we also have priors that are very, very specific.
I guess I've got two things to say there. One is actually from Yann's work, and since you mentioned it: we actually argued about this very thing the other day. In this particular empirical case, having more of a prior was actually better, right? So in this particular case, having a convolutional prior made
the difference. The convolutions were very clever; they've been very valuable to the world. Maybe, you know, I've got 25 boxes up there, with three lines each, and we just need, you know, 24 more discoveries of that magnitude. Is the genome big enough to encode all of those? 95% of our genes are involved in brain development; I think there's room in there to encode that many, maybe 10 times more.
Well, I like this question very much, because I found this question in the early 90s with my brother Samy, and it was essentially the subject of his thesis proposal and then of the thesis, and it was one of the first papers on meta-learning: we were trying to learn a learning rule, a synaptic learning rule. We didn't have enough computational power to do this, and even now, I think, in order to realize the kind of ambitious program that Jeff is talking about, we would need a lot more computational power.
Even if you want to learn most of it in the abstract, it really helps a lot if you put in a bit of the right structure, and in order to do that, you need to do experimentation of the kind we do normally in machine learning, where you design a learning algorithm completely, and that helps to figure out what would be the right building blocks and the right inputs and outputs that are needed for learning.
I pretty much agree with Yoshua's answer; I'll answer it in a slightly different way. In principle, we know that evolution is a mechanism that's powerful enough to evolve minds, because it evolved our minds, and having the machine do the work, sort of standing in for evolution, would be great. In practical matters, it does matter what you're trying to evolve, and I think what has happened empirically in the evolution-of-neural-networks
literature is that people start with too little in the way of priors, and so they end up recapitulating some of our journey to bacteria, but not so much of the journey from, say, chimpanzees to human beings. In principle, we know it can work; in reality, having a tightly constrained problem, and probably a bit of priors to help us there, might help it work even better than it does, and I think it's totally worth exploring.
I think it's important in general to ask the question of how our work as researchers will be used, or could be used, because, you know, you don't need to go very far into the future: today we already see the misuse of AI in many ways, and I'm very concerned about how we are creating tools that can be destructive and endanger democracy and endanger human rights.
It may change the systems that we can build, but I don't think it changes, fundamentally, the fact that we are building gradually more and more powerful systems. There's the question that some philosophers are asking about, you know, whether we should eventually give personhood to intelligent, conscious machines; I don't think we are anywhere close to understanding these questions enough to be able to answer that sort of thing.
So I have several questions, but I'll limit it to two. One is that Professor Marcus said that Professor Bengio's approach to deep learning, and his belief in it, relies too heavily on larger data sets to yield answers. Why is that necessarily bad? There are large data sets, and there are ways of constructing them. And they said you want me to ask both questions, so let's go.
People gather data in all kinds of ways, and they do try, for example, to gather data about a particular kind of accident when it happens, and so forth. It's very focused on the data, and not so much focused on certain kinds of innovations in algorithm space that I would like to see. So I have no objection to gathering more and more data; I think that getting clean data is really, really valuable, and people often underestimate that.
So I'm interested in the small-data regime, to the extent that we also have a lot of data before we get to that point. Humans learn a new task after they've seen a lot about the world, right? There's no chance that you will be able to learn in a meaningful way without a lot of that.
I think that's what we both meant by it: there are problems where people learn things with small amounts of data. Yoshua would say that's because they have a lot of experience elsewhere, and that's often the case. In any case, the small-data regime is: how do you learn something if you don't have 10 million data points, if you're my kids and you learn a new game in five trials?
How do you do that? Clearly, some of it is that you leverage prior experience. The only thing I'm going to add there is that the reason I did that baby experiment back in 1999 was to show that there were some things that little kids could learn without much direct experience. So I made up the language, so that they had no prior experience with the particular language that they heard.
Thank you for your presentations. Dr. Marcus, you talked about compositionality, and the need to take compositionality into account from a linguistic point of view. We have debates and arguments about compositionality, but suppose we accept compositionality: we had some progress in neural nets, the recursive neural nets for compositionality; however, those efforts have been abandoned. They have abandoned the efforts on the recursive nets; we don't do research anymore on recursiveness.
On that, I don't think it's resistance so much as an obsession with beating the benchmark, which could be good or bad, all right? It's because these very large, fairly simple architectures have been working so well. I mean, a good example now is the success of transformers: transformers are working incredibly well, but they're actually using these key-value pairs I was talking about; they're operating on sets.
So, you know, the recursive nets were one attempt, but there have been others that have been more successful, and maybe recursive nets will come back; we don't know, the history of science is very complicated, as we've seen with deep learning. And actually, I don't read the sociology of the current deep learning field like you do: in fact, there's a lot of interest in exploring how we can put some architectural structure into neural nets that facilitates the manipulation of language and reasoning.
The model produced that passage, at least that's what my picture shows: "I've never seen such a multicolored, beautiful forest of sapphire eyes on the same corner of the street in a bar before." It's, like, fabulous that it has created this surrealist prose; on the other hand, when I force it into the nonfiction genre, it seems a bit ridiculous. So, to see whether this is going to work, let me give an example.
So I gave it things about conventional knowledge, definitions, transformations, atypical consequences, and so forth, and I have data from these models on the right, and they're, you know, typically doing like 30% or 10%, or something like that. So there are sharp limits, and I think those limits come because we don't have kind of a parse tree on the output, yeah.
For intelligence, is the best path maybe to go quantum?
The brain is made of molecules operating in a quantum way, but, you know, if we abstract one level up, it's all computation, and that is not quantum by nature. So, of course, we don't know; you know, I don't have a crystal ball. At this point, I think the majority of the community, both in neuroscience and in computer science, is betting on traditional computing, in the sense that it's not quantum. But another thing I want to say is that right now
it's at least as important that our society invests even more in the question of how we are going to deploy these things: what the responsibility is of everyone in the chain, from the researcher, to the engineer, to the people doing auditing, to governments drafting regulations, to make sure that we
move in a direction that's best for humanity, that's best for citizens. I'm very concerned that we're building tools that are too powerful for our collective wisdom, and I'm fine with, like, slowing down the deployment of AI. I think governments are not yet ready to do the proper regulation, and we need to spend more time on it.
For one, I want to mention that here in Montreal we've been really working hard on this question, and we came up last year, after two years of work involving not just scholars but also citizens, with a thing we call the Montreal Declaration for the Responsible Development of AI. I invite you to check it out online, and we're pushing these ideas to the Canadian government.
I'm involved in building things, so I'll just add one thing, because I think we have to go to the online questions, but I want to amplify the point about the Wild West; it's a good way to think about this. Right now, a driverless-car manufacturer can basically put anything on the road. We can sue them after the fact if they cause great harm, but there are essentially no regulations about what you can do with a driverless car.
On the definition of symbols: I don't think we should waste time arguing about that. I think that, from the perspective of symbol manipulation, the real question is: do we have operations over variables? You can define a symbol in such a way that it encompasses everything or nothing, and I don't think that's where the debate should be.
Are deep learning and symbolic AI compatible, and can they provide the best of both worlds? Is there any evidence? I think the best evidence that we have for that is that we have some people building actual hybrid models in the real world to do useful things. None of them achieves human-level intelligence, but then, you know, no deep learning system does that either.
We both think there's territory that people need to explore. I think the biggest take-home message, as I said, is that we actually agree a lot about what that geography is that needs to be explored. We have some differences about where to go in that exploration; neither of us thinks that we've reached the destination, by any means.
I want to take the question about language understanding: do you think language understanding is a form of intelligence? We clearly need better language understanding for AI, and there are really interesting connections between language understanding and reasoning, but they're really different. So, I listened to a presentation at the last NeurIPS by Ev Fedorenko, who is a cognitive neuroscientist, and what she found with her colleagues is that there is a language area in the brain, and it does process everything that's connected to language, but
understanding refers to general knowledge of how the world works. This is an area which is very active in machine learning: people, irrespective of whether they do language or not, are looking at how learning systems which interact with their environment can build better models of the world, and if we don't do that, we'll never have good language understanding. So this connects with what Gary talked about, with the limitations of current systems.
Language and reasoning are clearly separate things, but they're not fully separate. So there's wonderful work, for example, from Mike Tanenhaus and John Trueswell showing experimentally that people reason about the world at the very moment they're processing language. So if I give you an ambiguous sentence, you will look at what the things out there in the world are that can help disambiguate the sentence; you will reason, like:
is there a cup on the table, or a cup on the towel? And I'm going to put all of this together into an understanding of a sentence, so it's hard to draw a sharp line; as you know, interesting work notwithstanding, there's certainly overlap. On the other hand, a very clear example of how important all the physical-reasoning stuff is would be any primate that's not a human: think about all the physical reasoning that a chimpanzee can do without any language at all.
We could argue about the ape language studies, but I don't think they're very compelling. So you have species that can, you know, navigate their way through trees and have very complicated social interactions, exchange, and all of these things without any language, and I think we would both be thrilled if, before we leave this mortal coil, we were able to build AI systems that could do what chimpanzees do now. I have a personal interest in language, having studied it for most of my life.
In reasoning, we combine multiple pieces of knowledge that we can search through: we can search through, you know, which piece of knowledge can be combined with which piece of knowledge in order to find an answer to a question, and those searches are heavily guided by our intuition, so we know where to search.
I would say, by and large, that the results of extant neural-network reasoning are not as impressive even as that example from Cyc. But I would also say (I was going to give you the point, and then we can come back) that if you take the broader notion of deep learning that Yoshua would like to defend, and you start putting in mechanisms for attention and indirection and so forth, which come at least a little bit close to the things that I want, then all bets are actually open; we don't know yet what the boundaries are.
Once you include mechanisms like indirection, we know some of the things we can do there, but there's a lot of stuff in classical reasoning that I don't think has really been addressed yet. There are other people who are more expert in that, but I would say even just dealing with quantified sentences: how do you deal with "everybody loved somebody," and the ambiguity in that? We haven't really seen that yet.
There's a question here about what you think of transferring structured rules, in the form of first-order logic, onto network parameters, as opposed to encoding the information in latent variables. This is actually the kind of thing that people were trying to do in the 90s, trying to create a direct analogy.
The second part is how to make sure that those initially injected assumptions will still hold after the training, how to overcome catastrophic forgetting. So, for the first question, I think we, or at least I, gave a partial answer already. The second one, about forgetting, is very important, and it is connected to some of the things I was talking about when I mentioned
a knowledge base with, you know, subject-object-verb triples, things like standard relational databases, with the model looking for the words that it has seen, or their equivalent synonym representations, in the knowledge base, and then using an attention mechanism to pick the pieces of knowledge in the knowledge base which can help it predict what the next word should be. This was done with Sungjin Ahn, and it's been published.
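A toy flavor of that setup (an editor's sketch; the fact vectors and sizes are invented stand-ins, not the published model): the current context emits a query, stored facts act as key-value pairs, and the retrieved value is blended into the next-word predictor's input.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

# Attending over a knowledge base during next-word prediction: facts are
# stored as (key, value) vector pairs, and the context selects among them.
rng = np.random.default_rng(0)
fact_keys = rng.normal(size=(100, 16))    # one key per stored fact
fact_values = rng.normal(size=(100, 16))  # the fact's content vector

def retrieve(context_vec):
    w = softmax(fact_keys @ context_vec / 4.0)
    return w @ fact_values                # knowledge summary for this context

context = rng.normal(size=16)             # stand-in for an RNN/transformer state
augmented = np.concatenate([context, retrieve(context)])
# "augmented" would then feed the output softmax over the vocabulary.
```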
What you can then imagine is neural nets that can do their normal neural-net thing, but, as they're computing, they're allowed to go online and check for information that they don't already know, that is not already integrated inside their "brain," and use that information in order to answer questions or predict something.
I was first looking at that, and I think it's only recently, with attention mechanisms in the form that involves indirection, that you can start thinking about quantification. Quantification, the way I interpret it in a neural-net sense, is essentially that you have these little modules, which in your world you would call rules, and that's fine, except they are not symbolic rules; they're more like modules that are allowed to do inference on some variables given other variables, but the inputs of those rules don't have to be always the same.
Information can go back and forth a couple of times, and so it would be reasonable to assume that, although there is coordination at a global level, a lot of the learning involves local descent. And there's been a lot of interesting work in deep learning here; I don't think we've solved this problem right now, but people are trying,
in essence, and if you look at reinforcement learning systems, they use that kind of trick as well: you predict the reward that you will get, and you use the predicted reward as an intermediate, local reward. So I think there are some interesting questions about decentralizing this kind of learning; there are also more pragmatic explorations.
Having components that are tuned to particular things is at the heart of how the brain works. It's not fully modular, but I think the most amazing thing about the imaging literature, taking pictures, scans, of people's brains, is the way in which the brain (to connect to a phrase of yours) dynamically reconfigures itself in the course of anything that we do. So you can tell someone who's coming into a scanner experiment, like the ones that Ev is going to do: okay, what you're going to do now is, you're going to take glasses and you're going to put them on.
Thanks, everybody, for watching, folks. Now, I mean, it's pretty obvious everybody basically prefers Yoshua. I'm going to turn my mic up now; I'm turning my mic back up to regular volume. Yeah, I thought this was fun; like I said, I didn't know how this was going to go, but there was a good amount of people watching us, in the 20s and 30s, and you know that's good for one of my Twitch streams. I'll get a YouTube video of this online as well, and thanks, everybody, for chatting and for participating, and also for voting.
They all watched live, and all these chatting people over here all watched live and voted and everything, obviously. Anyway, I'm going to shut up now; take care, and thanks for watching. Thanks, Gary Marcus and Yoshua Bengio, and thanks to the University of Montreal for hosting the debate and for live-streaming it, even though the AV wasn't perfect; it's cool that they did. Let's check the Twitter updates real quick. I would really love to have Melanie Mitchell participate; a future debate with her would be awesome. Were you in the audience? Because I would... no?
Someone mentions the power of decentralizing learning in the service of privacy, and in order to stimulate... how? I don't think that was the same debate; at least I wasn't watching that one. Just some people saying thanks, and some comments. Thanks, everybody, for watching with me, and hey, maybe we'll do this again; let me know somehow, contact me somehow. Now I am going to stop the stream. Oh, no! We're going to go raid. Okay, so,
you know, if you're watching my channel on Twitch right now, we're going to switch to another channel, and all the viewers in this channel are going to dump into another channel, and this guy is doing some type of game programming, it looks like. Alright, so I'm going to hit this button; it's probably going to pop up something and say that you're going to raid, and it's going to happen in like three seconds or something, and then we're going to do it right now. Boom. See you guys, thanks for watching.