Description
Marcus Lewis discusses how continual learning presents a dilemma between memory and generalization. He also presents an idea that quick few-shot learning (e.g. MAML) may offer a different, biologically plausible way of resolving this dilemma.
B: We are recording now. So today, just basically this morning, I spun up a quick topic regarding continual learning: neural networks that just learn all the time, without having to go over all the old training examples they saw in the past, and without losing that past knowledge, and how it overlaps with the models we've historically built here at Numenta. Just thinking about the problem space.

B: That's the thing I've wanted to present for some time. But on top of that, sometimes I'll read a paper or see a new model that just expands my imagination about how the brain might work: what is the bag of tricks the brain might be using that is actually biologically plausible? And recently we've been discussing something along those lines.

B: The thing Subutai brought up is meta-learning, and this method called MAML, M-A-M-L. I'll discuss that when I get to it, but I think it brings an interesting perspective to continual learning, in that it doesn't try to solve continual learning directly. It helps with something else, but it can help us reframe how you might think of continual learning. And just in general I think it's a cool trick... you're laughing at the cat.

B: In ML in general, it's just a cool trick that is useful to have, so I thought it was worth presenting. So I think this is going to serve two purposes. Hello, kitty; I'll put the cat away in a second. The two purposes are: one, I can frame some of our algorithms, like the temporal memory and the spatial pooler, in the context of neural networks, and then discuss this problem of continual learning and how it relates to one-shot or few-shot learning.

B: I'm not sure what to do here, so I'll just run with it. Once I'm showing my iPad it'll be less distracting. So here is what I drew this morning.

B: I'll walk through this in detail in a few seconds. One thing, one kind of old idea that I've heard stated about neural networks, and this predates deep learning.

B: The idea, which predates deep learning, is that there are kind of two fundamental ways to use a neural network. If you're just sitting with an abstract neural network and you want to hook it up to do something, the two sort of families are: you can use it to do memory, or you can use it to do generalization, which I kind of depict as that top picture up there, which is just one specific kind of generalization.

B: Everything I'm saying here is a cartoon; it's not this clear-cut. But I think it's a useful framing that you can sort of think of neural networks as doing something memory-like or something generalization-like, where you define generalization as that picture in the upper right corner: you have classes at different points in the input space and you're trying to find the boundaries of those classes.

B: In my mental model of a lot of the things we've worked on here, we've spent much more time on memory mechanisms with neural networks, and some on generalization, but more on memory.

B: So a classical example, going back to the 80s, of using neural networks for memory is auto-associative memories, or Hopfield networks. Here I've just depicted that you might memorize a representation by activating the units and forming connections between all of those active units, so that now a subset of that representation can cause the entire representation to activate. Jeff has many times drawn the association that a temporal memory is a little bit like an auto-associative memory that associates with the previous item in a sequence. So the temporal memory, I mean, we put it right there in the name, is a memory of sequences. We can use it for things other than sequences, but that's what it is at its core.
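A minimal sketch of the auto-associative recall described here, assuming sparse binary patterns, a Hebbian outer-product storage rule, and a k-winners recall step (all illustrative choices, not Numenta's or Hopfield's exact formulation): connections are strengthened among co-active units, and a partial cue then completes to the full stored pattern.

```python
import numpy as np

rng = np.random.default_rng(0)
n_units, n_patterns = 200, 5

# Sparse binary patterns to memorize (sizes are illustrative).
patterns = (rng.random((n_patterns, n_units)) < 0.1).astype(float)

# Hebbian storage: strengthen connections between co-active units.
W = np.zeros((n_units, n_units))
for p in patterns:
    W += np.outer(p, p)
np.fill_diagonal(W, 0.0)

def recall(cue, steps=5):
    """Iteratively complete a partial cue into a stored pattern."""
    k = int(patterns[0].sum())          # keep roughly one pattern's worth of units
    x = cue.copy()
    for _ in range(steps):
        drive = W @ x                   # recurrent support from connected units
        winners = np.argsort(drive)[-k:]
        x = np.zeros(n_units)
        x[winners] = 1.0
    return x

# Cue with about half of pattern 0's active units; recall completes it.
full = patterns[0]
cue = full * (rng.random(n_units) < 0.5)
completed = recall(cue)
print(int(completed @ full), "of", int(full.sum()), "active units recovered")
```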
B: It's learning associations between one representation and another, and I'd say the same about some of our other stuff, like the sensorimotor work. Here I've drawn a few grid cell modules; some people here will know why I use a rhombus to depict grid cell modules. It's forming these associations between a temporal memory and a location. And I'd go further to say that even when we talked about displacements...

B: ...when we talk about using displacements for composition, the whole point of computing the displacement is so that we can learn it, so we can memorize it. So even when we're talking about learning compositional objects, it's still kind of a memory for objects.

B: So I think a lot of what we've done falls more into that column on the left than the column on the right, whereas deep learning, or really any network trained with backprop, with back-propagation, is more in the column on the right. But I will say that our spatial pooler is more on the right; it's kind of our source of this type of generalization.

B: Now, a quick aside on what I've already said: the word generalization can mean many things. We do have other kinds. We have, for example, learning an object at one orientation and then inferring it at another; we kind of have that working, and that's a form of generalization. Here, when I'm saying generalization, I'm talking about that picture on the top right. So now, the topic of continual learning, marked with the little stars.

B: With a memory, with our mechanisms, and just generally the way memories work, you can set it up so that each new piece of information, each training example, causes a local change that doesn't collide with existing knowledge. If you introduce dendrites and sparsity, you can give a neural network the ability to store a large amount of information without it interfering with other information. But in the right column here, each new piece of information is used to choose a better basis.

B: Those arrows start to mean something different: each arrow starts to represent a different type of feature, and this can be disruptive to old knowledge. And that is sort of by design. We almost want it to be; I mean, that's a weird way to say it, we don't want it to be, but we're getting something good from the fact that it does disrupt old knowledge. The instructive example I wrote on the bottom here is that often, when you train a network to do recognition tasks, it's a really good idea to pre-train it on ImageNet, because you want it to learn a bunch of reusable information that is generalizable.

B: You want it to choose a really good basis, like this one. If you treated this as a pure memory problem, where you're going through and just never disrupting your existing knowledge, pre-training on ImageNet would have less value. I wouldn't say no value, but this whole disrupting-the-basis thing provides a benefit: it gives you generalization.
F: Yeah, just a quick comment. I think there's a useful distinction to be made there between generalizing to new samples within a class, which I think fits the picture you drew with the decision boundaries on top, and generalizing to new classes that you haven't seen before. Those are two different problems. For one of them you have to improve your basis, to use the same name you used there.

F: So when new samples come, it's easier for you to locate them within that reference frame; you can think of it like that. And the other one is: you have a completely new class and you have to derive a new basis from scratch. Then how do you do that? How well do you derive this new basis for this new reference frame?

B: I would say no. First of all, you're using the word reference frame, but here I'd be more apt to say a coordinate frame; it's not like a spatial reference frame, it's more like the coordinate system. But I would say no: you could take this network that has been trained on ImageNet, freeze most of the network, introduce a whole new class, and just train the classifier using the same basis.

F: I'm not saying you can't; I'm just saying those are two different problems. When people talk about generalization in machine learning, I think the term is sometimes overloaded. One thing is just generalizing to new samples: that's what you expect your network to do, so when you show it a validation set or test set, it's going to generalize to samples it has never seen before. A completely different thing is generalizing to a new class, and that's the pre-training example you gave. Those two are different problems, and the generalization term there is...
C: Can I just make a few comments about this before we go on? (Yeah, yeah.) So, I like this very much. I mean, these are ideas we've had, but you have a pretty nice terminology for it, and it makes it more concise. Just a couple of thoughts; I'm not really adding anything other than some color to what you presented.

C: I think even the name spatial pooler suggests what you're saying: it's pooling, which is a generalization step. It's essentially saying, I'm putting multiple patterns into some bucket, and so on. And we've talked a lot about how you go back and forth between these two representations: you want to have very specific memories that you can learn quickly, and yet you want to have some more generalization, and I think there are two ways these differ.

C: The spatial pooler learns slowly, from the statistics of many inputs, whereas the temporal memory works best when it's very fast; you can learn in one step. So that's one way these things differ. And the other big way they differ is that in our spatial pooler we come up with this basis set, but on its own it's not sufficient to recognize anything of importance.

C: If we put some pattern into our spatial pooler, it's never sufficient to recognize a complete object or an image or a sequence or something like that, and so we use time to do that. As a basis set, the spatial pooler is not sufficient to recognize various images and so on, at least the way we've implemented it. But then we say, okay...

C: Well, we can only sort of classify or generalize a subset of the overall pattern, and then we move through the pattern, either through time or through physical space, sensorimotor. That's another way we get around the fact that our spatial pooler is very impoverished as a basis set for recognizing lots of different images. So I just put a little color on that.
B: And one thing I realized after thinking about this for a while: I used to have this clear line in my head, that you can either do memory or you can do generalization. But the reality is, take these axes, this x1, x2, x3. Maybe those are Gabor filters, or something resembling that. In a sense, as you learn that x1, x2, x3, you're doing something a lot like memory: you're learning that these things co-occur within a class.

C: Yeah, but you're not learning specific things, right? With those x, y and z, or x1, x2, x3, you're not learning something specific; you're basically learning a basis set by which you will then later memorize something. And in our world, our basis, the spatial pooler, is very poor. It's a very limited basis. I don't know, but I imagine traditional neural networks...

B: So, the way I frame this, it's kind of like: on the left, continual learning is the most natural thing in the world. You just do it. You just learn new stuff and put it on new dendrites, use sparsity. The way we frame the problem, it's no longer difficult, but it doesn't have so much generalization built into it.

B: Right, right, except for what we get kind of for free from the spatial pooler. And the column on the right sort of was designed for generalization, but then it's kind of horribly bad at continual learning. Continual learning, we almost didn't even see it as a problem before, and now it suddenly is a problem. So how do you get the best of both worlds?

B: This is the presentation I had composed in my head before yesterday. Now that I understand some of this approach to meta-learning and one-shot learning, few-shot learning, which I'll talk about in a second and which Subutai has also talked about, it has sort of turned the solution space around on me and made me approach this a little bit differently. So, rather than solving this directly...
B: Let's put the problem aside for a few minutes and talk about solving a different problem: few-shot learning, that is, learning a class from just a few examples. I say a class because we're talking about classification a lot, but it could be tasks, it could be some kind of regression, the ability to infer numbers; you can talk about lots of things, but here I'll just talk about classes, because that's what we're used to talking about right now.

B: So, given just a few examples, can you quickly learn to recognize a class? Let's talk about solutions to that problem, a specific solution, and see if it gives us a new perspective on continual learning. This brings me to something that Subutai has shown. Sorry, this is doing weird rotation. Something Subutai has shown us is this paper about the model MAML, model-agnostic meta-learning, and the picture on the right; I can just describe it in words.

B: What they do is, rather than training a network to perform a set of tasks, where those tasks are, say, recognizing a coffee cup or discriminating it from other objects, they allow the network, you could say almost at inference time, to learn much more often; the system is allowed to learn once it's deployed in the field.

B: So what I've shown here is that the network sits at some position, this theta, where you see it pointing at a dot, and it learns that slowly. Then, from task to task, as this network is doing different things, it very quickly learns to jump to theta one star, jump to theta two star, the ones with the asterisk. And the interesting thing is what happens after that task is complete.

B: I'll just say that again, because I wrote it down here. Imagine a neural network that sits at a position theta. When I say position, I'm talking about all the weights in the network, whatever types of parameters it has, all of them.

C: This is not an activation state; this is, like, the network connectivity.

B: Yeah. Okay, but I mean, this is how they've framed the problem, that you...

B: Yes, so it's like the network is trained over a long time, over many different tasks, to find a useful theta. And this might sound abstract or strange, but think of it, for example, as theta being a system that understands the visual statistics of the world. That could be what theta is.
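A hedged sketch of the MAML-style loop being described: the outer loop slowly moves theta (all the weights), the inner loop takes a quick gradient step from theta to a task-specific theta-i-star, and only theta is kept. The toy sine-regression tasks, network size, and step sizes here are illustrative assumptions, not the exact setup from the paper.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Tiny regressor; theta = all of its weights.
net = torch.nn.Sequential(
    torch.nn.Linear(1, 40), torch.nn.ReLU(),
    torch.nn.Linear(40, 40), torch.nn.ReLU(),
    torch.nn.Linear(40, 1),
)
meta_opt = torch.optim.Adam(net.parameters(), lr=1e-3)
inner_lr = 0.01

def sample_task():
    """A 'task' here is a random sine wave (amplitude and phase)."""
    amp, phase = torch.rand(1) * 4 + 0.1, torch.rand(1) * 3.14
    def batch(n=10):
        x = torch.rand(n, 1) * 10 - 5
        return x, amp * torch.sin(x + phase)
    return batch

def adapted_forward(x, params):
    """Run the network with an explicit parameter list (theta or theta_i_star)."""
    h = x
    for i in range(0, len(params) - 2, 2):
        h = F.relu(F.linear(h, params[i], params[i + 1]))
    return F.linear(h, params[-2], params[-1])

for _ in range(1000):                 # outer loop: slowly move theta
    meta_opt.zero_grad()
    for _ in range(4):                # a few tasks per meta-update
        task = sample_task()
        params = list(net.parameters())
        # Inner loop: one quick, throwaway update theta -> theta_i_star.
        x, y = task()
        loss = F.mse_loss(adapted_forward(x, params), y)
        grads = torch.autograd.grad(loss, params, create_graph=True)
        fast = [p - inner_lr * g for p, g in zip(params, grads)]
        # Outer objective: how well theta_i_star does on new data from the task.
        xq, yq = task()
        F.mse_loss(adapted_forward(xq, fast), yq).backward()
    meta_opt.step()                   # theta_i_star is discarded; only theta persists
```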
B: Yes, so that part's analogous to the spatial pooler. But the part where this takes a leap and does something new is that the system then, from this point, learns something temporary. I mean, one thing you'll bring up occasionally is silent synapses: you'll bring up how the cortex has the capability to quickly learn a thing that might just be temporary and is going to go away eventually. If you bring that into the picture, where you have the system sitting at this position theta, then, given a few training examples, that circuit can suddenly become a classifier for coffee cups. It can suddenly become, I don't know, a navigator to your bedroom.

B: This is the new trick. The new trick is being able to learn something quick on the fly, and the network has specifically chosen this set of weights, chosen a connectivity, where just a little bit of quick learning is enough, even if that quick learning is just a classifier up top quickly learning a new pattern.

B: That is the new trick. So the new way of framing it: you set it to a useful theta, then make these quick updates, use those updates for a little while, then throw them out when you don't need them anymore. Can you...

B: Yeah, yeah, the silent synapses are just some mechanism where the system quickly modifies itself, whether it's turning off certain dendrite segments, turning off certain synapses, enabling others; just something quick that is intentionally used on the fly, something you could rely on in the course of a couple of seconds.
C: I mean, you know, we model plastic synapses with the permanence, and that is sufficient from a modeling point of view. But from a biology point of view it's not fast enough, because synapses can grow in maybe an hour, but they can't do it in a second, and that's where the idea that the biology has silent synapses could get around that problem. From a modeling point of view you don't need that; that's more of a biological constraint. A silent synapse is just a synapse with zero permanence.

G: Yeah, yeah, that could be. Let me draw another parallel, Marcus. The way you're phrasing it, and Kevin's question, kind of reminded me: when Florian was here, he would talk a lot about short-term plasticity.

G: Like really quick, you know, within seconds or milliseconds kind of plasticity, and then there's also long-term plasticity, which happens over a longer term. So perhaps these updates here are kind of like the short-term plasticity stuff, and then this is a much slower kind of long-term plasticity.

G: So we make lots of quick updates that don't last very long, but those are kind of averaged over time, and there's some sort of memory of those changes, and then the long-term plasticity is sort of making those kind of bigger jumps, like this. Anyway, I thought that was interesting; it just occurred to me.

C: When Florian was here, when he talked about that very short-term plasticity, I think he was talking about changes at the synapse, is that right? Like metabotropic changes? Yeah, yeah.

C: That fits into the whole idea of silent synapses, I think. You've got the synapse, and a chemical change can occur at the synapse very rapidly that could make the synapse change from being nothing to something, or from something to more, or something like that. I'm just...

G: Yes, a silent synapse could be an extreme version of this, where the synapse really isn't doing anything until it learns this pattern, I think.

A: Yeah, I think what I was trying to do by bringing up plasticity was to relate it to that: you have different phenomena here, relating to different biological mechanisms, even if we might not know all the mechanisms involved. That's kind of what I was shooting for there.

C: But as long as we can say, yeah, biology can do these things, then we can sort of ignore exactly how biology does them. That was a big part, by the way. To me that was huge, going back decades; something bothered me for a long time about learning and plasticity and how we can learn really quickly, because, you know, historically, again going back decades...
B: Okay, so from here I'm going to talk about continual learning again. What I've shown here is a way of framing the problem of one-shot learning or few-shot learning.

B: This does not solve continual learning in its current state. Let's see: if you learned to classify one object from another yesterday, and then you threw away the specific weights for that, you haven't solved the continual learning problem. But I think we're just a quick stone's throw away from solving it. If you had a really good mechanism for this, it suggests a really fun trick for solving continual learning, using both the memory and generalization tricks. So, oops.

B: The rest of this was... yeah, we already talked about this: the network is getting better and better at few-shot learning over time. So if we have a quick few-shot learning mechanism, does it give us a new perspective on solving continual learning with generalization?

B: Well, I think the natural solution is: use your memory to store a small set of instructive examples of each class, and maybe tune those examples to be really useful, and then, whenever you need to go and recognize X or perform that task again, recall those few examples, train your few-shot learning network on those examples, and then suddenly you have a model again.
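A minimal sketch of that reframing, under heavy simplifying assumptions (a frozen random feature map standing in for the slowly learned basis, and a ridge-regression fit standing in for the quick few-shot update): a few instructive examples per class are kept around, and a throwaway classifier is re-learned from them on demand.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the slowly learned theta: a frozen random feature basis.
D, H = 20, 64
W_basis = rng.standard_normal((D, H)) / np.sqrt(D)
def basis(x):
    return np.maximum(x @ W_basis, 0.0)

# Memory side: a few stored instructive examples per class.
memory = {}

def store(name, examples):
    memory[name] = examples

def few_shot_model(names):
    """Re-learn a throwaway classifier on demand from the stored examples."""
    feats, labels = [], []
    for i, name in enumerate(names):
        f = basis(memory[name])
        feats.append(f)
        labels.append(np.full(len(f), i))
    X, y = np.vstack(feats), np.concatenate(labels)
    Y = np.eye(len(names))[y]
    # One quick ridge-regression fit stands in for the fast inner-loop update.
    W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(H), X.T @ Y)
    return lambda x: np.array(names)[np.argmax(basis(x) @ W, axis=-1)]

# Usage: store a handful of examples now, rebuild the model much later.
store("cup",  rng.normal(0.0, 1.0, (5, D)))
store("bowl", rng.normal(3.0, 1.0, (5, D)))
classify = few_shot_model(["cup", "bowl"])        # quick, disposable model
print(classify(rng.normal(3.0, 1.0, (2, D))))     # likely ['bowl' 'bowl']
```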
G: Replay mechanisms, yeah, it's along those lines.

F: So, Marcus, I don't know if you're going to talk about learning without forgetting and prototypical networks, or...

B: Yeah, I mean, this is literally the end of the stuff I'm talking about. Let's see here: I'm coming in from the angle of discussing these in terms of, could the brain be doing this? Does this sound plausible? I'm sure people have done this in the machine learning world, and you're giving an example of that.

F: Yeah, I was just going to add it for reference. From the continual learning perspective, there is a classic approach called learning without forgetting, and that's basically the idea that you store a few instructive examples, and every time you see new examples, you see if they're better than the ones you have, and...

F: That idea very much explores the meta-learning scenario as well. The prototypical networks, or the whole field of metric learning, kind of explored that idea: maybe the best way is to just store a few examples, and then, when you find something, you just look at the examples you have stored and see which one is closer, like a k-NN kind of thing.
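For reference, a tiny sketch of the nearest-prototype idea F is pointing to (the embedding here is just the identity, as an assumption; a learned embedding network would normally go in its place): each class prototype is the mean of its few stored embedded examples, and a new sample is assigned to the closest prototype.

```python
import numpy as np

def prototypes(embed, support):
    """support: dict of class -> a few examples; returns class -> mean embedding."""
    return {c: embed(x).mean(axis=0) for c, x in support.items()}

def classify(embed, protos, x):
    """Assign each row of x to the class with the nearest prototype."""
    z = embed(x)
    names = list(protos)
    dists = np.stack([np.linalg.norm(z - protos[c], axis=-1) for c in names], axis=-1)
    return [names[i] for i in np.argmin(dists, axis=-1)]

# Illustrative embedding: identity (a trained network would go here).
embed = lambda x: np.asarray(x, dtype=float)
support = {"cat": np.array([[0.0, 1.0], [0.2, 0.9]]),
           "dog": np.array([[1.0, 0.0], [0.9, 0.1]])}
print(classify(embed, prototypes(embed, support),
               np.array([[0.1, 0.8], [0.95, 0.05]])))   # ['cat', 'dog']
```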
B: Well, okay, it's like k-NN, except it's not. I'm not talking about doing anything like k-nearest neighbors. I'm talking about: you have this theta, this basis; the two are kind of inseparable. You have this thing that is really good at few-shot learning, you train it with those examples, and then you use that. So it's a little different from k-NN.

F: Well, I didn't say it's k-NN, but I kind of missed something: why do you need to store examples and keep retraining on them?

B: Well, because... okay, this is the whole trick: we're willing to just forget stuff all the time. We're not trying to keep the model around; we're not trying...

B: This is the cool trick, to my mind. If you'd asked me before yesterday how I think all this happens, I would have said that we must store useful examples, and then, as you learn throughout the day, you kind of learn a new model, and then, when you sleep at night, somehow all your old examples and the new things you learned get merged into a model that incorporates both of them.

B: But I have moved away from that, with this idea of doing fast few-shot learning: you're just always recreating the models. And when I say model here, I'm talking about the whole ability to generalize, the whole circuit that takes an input and classifies it.

B: Right, because we know we can do few-shot learning really quickly. And if we can do that, what else can you do? If you have a neural circuit that can do that, what else can it do? Suddenly it suggests a different way of framing continual learning, where you do keep examples around, but then you just retrain on them when needed, not all the time. You don't go to sleep and retrain on everything you did that day.

B: You do it throughout your day, when you need to. I know this isn't going to be everything, this is going to be flawed, but, like, you get into your car and your brain quickly relearns how to drive.
A: Marcus, when you have that theta and you train to this stable, semi-stable operating point: if you go up one more level of abstraction, you could presumably have multiple of these theta-n's lying around that you could be switching between, right? (Sure, yeah.)

A: So the question I have is: if that is the case, this is a kind of, if you wish, configuration space, where there might be a choice. Because as I was looking at this trajectory, first I was thinking, well, could it ever branch or something like that? Then I was thinking there might be something that could activate different forms of these stable points, from which you can branch off.

A: If you wish, it's, you know, your skills: you have this theta point that's only active in a particular context, and then from there you can do your little sub-thetas off of it. But I'm just wondering what that mechanism would look like, if you have that kind of richness where there are these multiple operating points and then you're trying to find the one that matches best to the task at hand, or something like that.

A: I think I know what that would look like in a neural network from our point of view. I'm not sure what it maps to as far as extant, computable neural networks, but it seems like it could be, given what your insight is, a very rich space.

B: Yeah. So the general idea I wanted to get across was this new trick of being able to make these quick updates, and whether that causes us to reframe how we think of solving other things, like continual learning. Because in my mind it solves a lot of this trade-off here: the trade-off between memory and generalization kind of goes away. When you frame it this way, you don't really have a trade-off between memory and generalization.
F: All right. Can I just go back a little bit to the idea of learning without forgetting? It might be useful for the discussion.

F: So the idea, which I think is very similar to what you're discussing (you might correct me if I'm wrong): you keep a few prototypical examples, and then, when you learn a new network, at the same time as you're trying to learn from the new examples you're trying to keep your output stable with respect to the examples you had in the past, so you can train in a distillation kind of way.

F: So you run the network on the old examples, and you don't want to disturb the representation you had for these old examples, so you keep them stored. You can keep running them, and you can always make sure you keep that representation stable while, at the same time, you're learning new classes. So you have this combined objective.
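A rough sketch of that combined objective, with the details simplified (a single shared output head, and an assumed temperature and weighting): cross-entropy on the new examples plus a distillation term that keeps the network's outputs on the stored old examples close to what the frozen old network produced. In the full learning-without-forgetting setup the old and new classes have separate heads; this collapses that detail to keep the idea visible.

```python
import torch
import torch.nn.functional as F

def lwf_loss(model, old_model, new_x, new_y, stored_x, temp=2.0, alpha=1.0):
    """Cross-entropy on new data plus distillation toward the frozen old outputs."""
    ce = F.cross_entropy(model(new_x), new_y)
    with torch.no_grad():
        old_targets = F.softmax(old_model(stored_x) / temp, dim=-1)
    new_logits = F.log_softmax(model(stored_x) / temp, dim=-1)
    distill = F.kl_div(new_logits, old_targets, reduction="batchmean")
    return ce + alpha * distill
```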
B: See, the thing is, what you're describing is what I thought until yesterday, and suddenly my big realization is that you don't really need to bother with keeping the old examples up to date, or rather, you don't need to bother with keeping the neural network from losing the old examples.

B: You still keep the examples themselves around, and then later you few-shot learn your model again, on demand. But the core basis you're using may have changed a bunch over the past couple of weeks of doing other things. You've been driving around, you went on vacation, you went hiking, etcetera, and you come back with this totally new basis. You still have your old examples.

B: You can still retrain, over the course of a couple of seconds, to do the things you could do before. So I'm sort of changing the timing of it all, and I'm changing what you keep: you're not trying to keep a circuit that can do everything; you're keeping a circuit that can quickly learn to do anything, plus a set of examples that can shift it in those directions.

A: One of the things I think of when you show that operating point, just to relate it back to personal experience: you learn bad habits when you first learn something and you don't necessarily have the best instructor, and then you struggle at a certain level.

A: You can't get any better, because of the assumptions built into how you learned the thing in the first place, and it takes an active effort to school yourself: you get a different set of instructions, you get a different teacher or whatever, and they'll show you this other way, and then there's a struggle point where you're flipping back and forth between the two ways of doing it. I think of it in terms of motor skills or something like that.

A: But I think it's also analogous to how, say, the style of how I program has changed over the years. At some point I realized, okay, this is holding me back, and then I'll try to learn a new paradigm for something and gradually school myself away from the old one. But with this notion of stable operating points, you invest a lot in one, which is why you hate to have to learn a different way of thinking about something.

A: But somehow you have to get the motivation to say: okay, by example, someone else is able to do this a lot better than me, so what is it that they're doing? And then kind of get yourself schooled over to that. I've got to believe there's something analogous to what you're showing there with these operating points.

A: That is an important part of trading off between the ability to do something fast and going through the effort of learning how to do it better, so that you can go on.
B: I agree, but I also think that if we figured out how to do what you're saying, where you have these multiple operating points, it may still be the case that this picture captures everything, because theta can be lots of things, and these leaps from theta to theta-one-star might consist of exactly the choice you're talking about; it might consist of turning off two-thirds of the possibilities. So this picture still might be a complete picture, but I could translate what you're saying into saying that this big theta here is actually complicated; it has multiple points contained in it. Yeah.

G: Yeah, I think the new thing in what you're suggesting, like you're saying, I'm sure people have thought about similar things, but you're also drawing the connection directly to meta-learning, and the idea that you would actually throw away these local, quick changes that you made. And then there's some other sort of theta-star version of the network, the slowly changing network that you're constantly fine-tuning, in some sense, I guess. But the quick stuff that you learn, you learn, but it's temporary; you're kind of throwing it away. I don't know if people have really done something like that. It makes a lot of sense to me.

G: Yeah, I think the thing is, if you think about it from the MAML perspective, those small changes that are thrown away: you still need to retain some memory of them, so that when you make a change it still makes your network better at those tasks.

G: But you don't necessarily want those exact changes, because a change could have been bad for something; if it was bad for the task, you might want to move in the other direction. But you still want some memory of it around, in some sort of synaptic traces or something. There really is a nice analogy to short-term and long-term plasticity here, somehow.

A: Yeah, funny, I was just writing down "some trace lingering" when you said that, yeah.
H: Can I try to offer a different personal example, I guess?

H: This sort of fits in with what you're thinking, but the closest thing I can think of that gives an example of that quick learning you're describing is trying to remember a name, where you actually can't remember the name, but you just kind of keep on throwing names at it until all of a sudden you're like, oh, that's it. I don't know, is that unreasonable?

H: I think the interesting part about that, too, is that it makes me question what memorization is, because even though you can't recall the name off the top of your head, everything's there in your mind; you're able to sort of play with the blocks enough to recognize, okay, this is the right name I was thinking of. It's as if you have the target somewhere in your mind, but you can't actually bring it, you know, into...

A: It's almost like you have multiple hash paths to activate that particular specific memory; you know, there are associations, right? I mean, some people are great at straight memorization; I'm lousy at it, I have to do it associatively. But I understand what you're saying: you're being blocked.

A: There's some name standing in front of the name you really want to access, and so you try to go around it and try strategies just to somehow activate the thing that pops up the name. Because once you hear it, then there's all this recognition, right: okay, yeah, yeah, that was it, when someone tells you what it is. So I agree, it's an interesting thought experiment.
B: And one follow-up: Jeff at one point asked, how do you store the examples? What does that even mean? Well, one angle on this would be some of the tricks that we know about. Maybe this is a way to incorporate explicit memory networks with these kind of explicit generalization networks, treating them as working together.

B: So, yeah, using things like the temporal memory and all the ways we've talked about creating models of objects; it could be some hybrid of the two.

F: One question there, Marcus, if I understood your model correctly. The next question to ask, and I'm curious to hear your perspective, is: how do you decide which examples you want to keep? And I'm not assuming you're going to keep them in the input space; I'm assuming you're going to keep them in some internal space, or whatever it may be. But how do you decide which examples you want to keep, which are the prototypical examples?

B: My answer is: I totally agree that that's one of the next logical questions. I think choosing a random subset will work, and then you can definitely do better than random, but that's the best I've got on the exact way you go about choosing which examples you want. I mean, this is kind of like some classic machine learning stuff, like learning prototypes versus... anyway, the point is, keeping an instructive set of examples is a problem that has been studied a lot, and I haven't gone past that point.
G: You could argue the opposite, too, because I think prototypes might be helpful, but you might also want to store the ones that are right at the boundaries, the ones that really discriminate between a cat and a dog, for example, or an apple and an orange. Because if you just train on the prototypical examples, you can move the decision boundaries quite a bit and make lots of mistakes and still classify the prototypes correctly.

A: So I'm just wondering, to take what Subutai just said: the theta-1, theta-2, theta-3 define a kind of subspace at that point. So if you have the extremal examples, that might be the most fruitful thing: to say, okay, I need to play around within this subspace to learn the next example of whatever that is. So the extremal examples give you the best discriminative capability, or, excuse me, the least linearly dependent subspace in which to couch whatever that is.

A: I mean, obviously I'm making an analogy from a continuous space to something that's relatively discrete, but I still like this kind of point from which these other things spawn off. And once you get there, then there's a secondary thing of saying: okay, so I learned the cat and the dog, and someone throws a raccoon at me. As to which ones would be retained, it might be the ones that share the least characteristics on one axis but still have a commonality, because of the theta point they're starting at. I mean, that would be the most efficient. I don't know if the brain is that efficient, but if you're trying to compress that space, that would seem like your best option.

G: Yeah, I think the one thing, again drawing the analogy to MAML, is that you also have to have a notion of how good each change was and incorporate that. So it might be that, you know, this was a good change, this is a good change and this works, and this is a bad change, right? In that case, the direction you might want to move is somewhere like this, away from the red and more towards the green.

A: Well, I mean, if the point is to try to recognize something, you know, the points of feature tangency: when you ask, does this fall into this particular classification, and you try to look, the simplest thing is to say, okay, how many shared features does it have? Or, at the other end of it, is there some kind of gestalt, global thing about it?
H: I was thinking this is more of a supplementary sort of mechanism for continual learning, I guess. In my mind I'm wondering... thankfully, it's not the case that every single time I come to the stand-up meeting I have to, like, remember, oh, you know, that's Marcus; that would be kind of mentally painful. But I could see this being something more supplementary, like: you know, I haven't played soccer in four years, and now...

H: ...I'm going to, you know, de-rust a bit, the sort of mental skills, through something like this, potentially.

B: Let's see, I almost wish we had a word like "supplementary" that makes it seem a little bit more important than that. To me, this seems like a key mechanism of how the brain is doing what it's doing. It's not everything, and in some ways, when I talked about driving, I was already pushing the limits of, like, okay...

F: These apply to generalization to new samples, because you're talking about storing a few examples of classes you've already seen, and then, when you need to do some inference on that same task, the same class you've seen in the past, you can just replay those very fast and adapt again. But that only applies to generalization to new samples, which is kind of what we do in supervised learning, whereas in the continual learning setting we're doing generalization to new classes, and then this doesn't apply, correct?
B: Well, it does and it doesn't. The previous examples you've stored away don't apply, correct, but this theta that you've learned over time does. So, yes and no: this general approach does attack that problem, but you're right that the examples you've saved away don't play any role in it.

C: Yeah, I have a few things, sure. First of all, I apologize, I don't understand this deeply yet; obviously you guys have a deeper understanding than I do, and Marcus has sort of had a light bulb go off in his head on this. That hasn't happened with me.

C: I just don't understand it well enough yet. But listening to the conversation, it reminded me of the way that I think brains do these things, and this may be in addition to the methods I know about. So I'm not saying this is wrong, I'm just saying there are other methods I know of.
C: Basically, it's that if you take a traditional neural network, sparsify its activations, and, instead of each unit or neuron having one set of synapses, give it multiple sets of synapses, which you can think of as dendrites, then you can do continual learning without catastrophic forgetting, because every time you learn something new you can learn it very quickly, and you do it not on an existing set but on a new set, a new dendrite branch.

C: If you will. Then all you're doing is: if I think about a representation that represents something, some of the units are active, and only a very small subset of those active units would have learning occurring on them, and they would not modify the one dendrite the cell already had; we'd just add a new one. So the cell now would respond to two different things, but the set of activations really wouldn't be impacted at all.

C: So that's the idea: by combining sparsity of activations and multiple sets of synapses, or dendrites, on each neuron, you should in theory be able to do continual learning on a traditional neural network. I don't know if I phrased that right, but I think that's what you were talking about on Mondays.
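A loose sketch of the mechanism being described, as an illustration only (the matching rule, threshold, and learning rate are made-up choices, and this is not Numenta's actual model): each unit keeps several independent sets of synapses, a novel pattern is learned on a fresh segment, and patterns already covered by a segment only nudge that one segment.

```python
import numpy as np

class DendriticUnit:
    """One unit with multiple independent sets of synapses ('segments')."""

    def __init__(self, match_threshold=0.6):
        self.segments = []                 # each segment is a weight vector
        self.match_threshold = match_threshold

    def best_match(self, x):
        if not self.segments:
            return None, 0.0
        scores = [float(seg @ x) / (np.linalg.norm(seg) * np.linalg.norm(x) + 1e-9)
                  for seg in self.segments]
        i = int(np.argmax(scores))
        return i, scores[i]

    def learn(self, x):
        """Reinforce the matching segment, or grow a new one for novel input."""
        i, score = self.best_match(x)
        if i is None or score < self.match_threshold:
            self.segments.append(x.copy())                      # new dendrite: no interference
        else:
            self.segments[i] += 0.1 * (x - self.segments[i])    # small local tweak only

    def active(self, x):
        _, score = self.best_match(x)
        return score >= self.match_threshold

rng = np.random.default_rng(0)
unit = DendriticUnit()
dog = (rng.random(50) < 0.1).astype(float)
car = (rng.random(50) < 0.1).astype(float)
unit.learn(dog); unit.learn(car)          # car typically lands on its own segment
print(unit.active(dog), unit.active(car), len(unit.segments))   # typically: True True 2
```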
G: Yeah, that's part of it, yeah.

C: Now, in terms of generalization, that doesn't give you any generalization, and I just want to remind you that there are a couple of forms of generalization that we've deduced or know the neocortex does. One we haven't modeled at all, and we haven't put in any of our papers, but we've talked about it, and I just want to remind you what it is. In our cortical model, the HTM model, an object is represented by an arrangement of other objects in some framework, so it's like a reference frame that's populated with other reference frames, because each one represents an object. That was the general idea, anyway. And when you're presented with a new object, you don't know what it is. What you do, and you can observe yourself doing this, is you attend to different features of the new object, and as you attend to different features you're essentially looking at a subset of the components of that object, and that subset could be shared with a previously learned object. So if I'm looking at an object that has ten features and I look at five of those features, I could say, oh, those five features are arranged similarly to how I've seen them in a car, so this might be like a car. And another subset of five features might be similar to what I've seen in a desk, and there I'd say, oh, maybe this is similar to a desk. But it's an active attentional process, where we're moving point to point, attending to different features, and that's what we do when we don't understand something.

C: We look around to find some arrangement of the subcomponents that is similar to the arrangement of those components in another object, and that, I think, is a really powerful form of generalization. It's what we do when we don't know what something is: we look around and attend to the different features, and we say, oh, it's going to be like a cat, because it has ears like a cat and paws like a cat, even though its tail is not like a cat, something like that.

C: So I think that's what's going on in the brain; that's the most powerful form of generalization going on in the brain. We've also talked about other forms of generalization, in terms of scale invariance and invariance in time and tempo, but I think those are minor components. So, ultimately, I think we want to get to a truly generalizable system that says: I can look at something new and figure out what this thing is, figure out how it works, and guess how I should interact with it.
C: But does that give you generalization? That gives you generalization as well as continual learning, is that right?

F: So, Jeff, I have a question on what you just said. Say you see something entirely new, and a subset of the features looks like a dog and a subset of the features looks like a cat, but it's not really a dog and it's not really a cat. How do you see that forming up? Would we make a new reference frame for this new thing that we don't really know what it is?

C: Yeah, you would. Well, I think that's the question of how quickly and how permanently you memorize the new thing. As we've talked about, and as I wrote about in the book, I think when you're going around the world moment to moment, every day, every minute, you're doing something, you're attending to different objects continuously, multiple times a second, and, I think Marcus was the first one to clue me in on this, you're constantly building a model of everything you see, even if it's a temporary model.

C: The example I use in the book is: you look at the dining room table, you just glance around the table, and you've built a model of where the potatoes are, where the green beans are, where your water glass is, and so on, and you can act on that model immediately, because in some sense you're constantly learning everything all the time. But most of that learning will fade, so a little bit later in the day I won't remember where my water glass was.

C: So I think you can continually learn, one-shot learning, all the time this way, but you can forget things quickly, because for many things in the world you don't need to memorize everything all the time. But if I kept going back to my dining room table and the potatoes were always in the same spot, every single night, that would reinforce it and eventually I would just learn that's where the potatoes are; otherwise I would sort of forget it until the next time.

C: So there must be some sort of trace that continues on. But I'm trying to remember what your original question actually was, Lucas. I think you can learn continuously, one-shot, all the time, and just forget things that aren't repeated, in the sense of that permanence type of thing. Did I answer the question? I forgot what your question was.

C: You'll learn that thing immediately. It's like walking into your dining room and seeing a new arrangement of dishes: you'll learn it immediately. Yes, I think you would learn it immediately; it just wouldn't be permanent.
C: In terms of our research agenda here, I know we can do this continual learning thing with the dendrite stuff. Maybe the stuff you talked about here today, Marcus, could fold right into it too; I don't understand it well enough. The generalization component I just talked about is going to require reference frames and displacements and attention and such, a bigger thing to bite off, and we probably won't be able to do that right now. Maybe in a bit.

C: Yes, but on different dendrites. Okay, so unit A, neuron A, and neuron B can together be representing dog. They could also together represent car, but the car representation and the dog representation would use different dendrite segments. They wouldn't be on the same dendrites; the whole point is to separate out the space onto different dendrites.

C: But the number of units and the sparsity of activation would always be the same. For any two units: let's say a neuron is active one percent of the time, so we have one percent activation sparsity.
C: Then two particular neurons would be co-active about once every ten thousand patterns, and so, if I learned thirty thousand things, on average those two units would be shared across three of them. Obviously, if the numbers go up to four or five percent, then it gets...

C: You know, then it'll be 40 or, you know, 400, some multiple of those numbers. But yes, you would; it's just that two units would not be co-activated very often for different objects. And even then, let's say in a brain I might have 20 or 40 of these units active at once: even if two or three or four are confusingly co-active, because they're co-active in some other pattern, the entire set of 20 or 40 neurons is still very unique.
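The back-of-the-envelope arithmetic behind those numbers, written out with the illustrative figures used above:

```python
# Illustrative arithmetic for the sparsity argument above.
activation = 0.01                   # each neuron active about 1% of the time
pair_coactive = activation ** 2     # two given neurons co-active: 1 pattern in 10,000
learned_patterns = 30_000
print(pair_coactive * learned_patterns)   # ~3 learned things share that pair of neurons

# Even so, a full representation of 20-40 active cells is effectively unique:
print(activation ** 20)             # 1e-40: chance a specific 20-cell set recurs by accident
```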
C: It seems to me, well, I know it'd be good for continual learning, because essentially it's more like one-shot learning all the time. It's not like you're modifying all the synapses continuously; you're just continuously adding new dendrites when necessary. But I don't see how that gives you generalization, unless I'm missing something here. It's more along the lines of what Marcus started with: the temporal memory is an example of a very fast learning mechanism, a memory mechanism, but it doesn't generalize at all. And I'd speak carefully here: if I said you take a traditional neural network and do it this way, it's not that every single training example is going to get a new dendrite; that's not what we're talking about here. It would be more like... I actually don't know the rules; maybe somebody could work them out.

C: But, you know, most of the time we would be modifying the synapses on an existing segment: oh, this is another cat, this is another cat, there's another cat, whatever. But here's a completely new thing, it's not close to something we've seen before, so let's form a new dendrite segment to learn it. That kind of thing.
G: This is kind of switching gears a little bit. Okay, so you reviewed the Neil Burgess paper and the Andre Buchansky paper on Monday, and both of them replied on Twitter, because we tagged them. So I thought it's probably worth you looking at; they referenced a couple of papers, but I'll just share it on the screen.

G: Briefly, for you, and I'll send you the link. We had mentioned our research meeting here on Twitter, and Andre actually did several replies with a bunch of details, which might be interesting. So it's like: "Nice discussion of our recent review. A couple of quick takes: with distal visual cues, head direction needs no ego-allo transform; that's a feature of most head direction models, and the point is explicitly made in this other paper."

C: While I was reading the review paper, I did do a little research on how people thought head direction cells came about, and I don't know if I ran across their particular paper about that, but the ones I did look at, I found them insufficient. So I need to look at their paper and see. I don't know how you avoid having to make an ego-allo transform, I mean.

G: Yeah, yeah, and then they had a bunch of more detailed comments. I'll send you the link on Slack.

C: I hope they weren't... they were not unhappy with my review?

G: He also kind of retweeted this, and he said, "Nice discussion, great that you put your journal clubs online." So I think he appreciated that. And then he pointed out a couple of papers on integration of reference frames in the neocortex. I think we've discussed some version of these before; one of them, I think it's this one, basically says there are lots of different reference frames; it's not like there's one reference frame for the neocortex. But I think we're taking that to a whole different level, with every cortical column having its own reference frames.

C: Well, I appreciate this, and, you know, I'm always nervous reviewing people's papers; you might screw it up somehow. So I'm glad they didn't think so, yeah.

C: I would like, if they're listening to this by any chance, I would like to do this, but, as I mentioned in our stand-up meeting today, I am not going to do anything at all until I get this next round of the book done.

G: Yeah, no, no, I think, I'm sure, speaking for them, they would be fine with that. I think they're happy we're really looking into these in a lot of detail.

C: Yeah, I appreciate that, and I appreciate them engaging with us, so that's great. I'm just frustrated, actually, because I can't really do a bunch of things I want to do right now; I'm up against a deadline and I'm a slave to the deadline. So maybe someone can remind me next Wednesday, say this again on Wednesday next week, but...