From YouTube: AI / Neuroscience Chat - Catastrophic Forgetting
Description
Broadcast live on Twitch -- Watch live at https://www.twitch.tv/rhyolight_
So the topic today is sort of in the AI / machine learning realm. This is in the context of machine learning, not neuroscience, I would say. The topic is catastrophic forgetting, or, as Wikipedia tells me:
it can also be called catastrophic interference, which I hadn't heard before. It's the tendency of an artificial neural network, and it's always defined as applying to artificial neural networks.
I would dispute that. It's not all artificial neural networks that are prone to catastrophic forgetting. They're talking about deep learning networks: anything based on the, what would we call it, the point neuron, doing primarily spatial pattern recognition and classification, is susceptible to this idea of catastrophic forgetting. So let's look at
a diagram, so you can really sort of understand how a lot of these different ideas are structured. Convolutional neural networks: here's a deep convolutional neural network. Oops, wrong one. No, this guy. You have to imagine it having lots and lots of layers; you can sort of take any node in here, between the input and the output, and expand on it.
Mark says yes, so I'm waiting for a blurb of text. Yes, comma. Anyway, what these end up with, once the information is passed all the way to the right and the network has learned, is a representation of some input space, because the model is in the weights between all of these nodes.
A
You
know
that
has
been
learned
up
over
time
in
in
all
these
connections,
so
that
when
you
give
it
a
new
input
that
will
classify
it
and
a
certain
certain
bits
will
activate
as
an
output.
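The point that the model lives entirely in the weights can be sketched in a few lines. This is a toy illustration only; the layer sizes, the ReLU activation, and the random weights are assumptions made for the example, not anything from the diagram being discussed:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "trained" network: the model is nothing but these weight matrices.
W1 = rng.normal(size=(4, 8))   # input layer -> hidden layer
W2 = rng.normal(size=(8, 3))   # hidden layer -> output layer

def classify(x):
    """Pass an input left to right through the weights; the most
    active output node is the predicted class."""
    hidden = np.maximum(0, x @ W1)      # ReLU activation
    output = hidden @ W2
    return int(np.argmax(output))

x = rng.normal(size=4)                  # a new input
print(classify(x))                      # index of the winning output node
```

Nothing else is stored anywhere: give the same weights to someone else and they have the whole model.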
You know, so the problem with catastrophic forgetting is, let's look at the Wikipedia definition and then expand upon that: it's the tendency of an ANN like this to completely and abruptly forget previously learned information upon learning new information. So we have to talk about learning.
Learning here means the application of backpropagation of error, and you do that in batches, right? You'll basically average a bunch of stuff, a bunch of input, together, process it all at the same time, and then run backpropagation of error across the whole structure.
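A minimal sketch of that cycle, in plain NumPy rather than any particular deep learning library: average the error over a whole batch, then apply one gradient update across the weights. The single linear layer and squared-error loss are simplifying assumptions for the example:

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(3, 1))             # all the network's knowledge lives here

def train_on_batch(X, y, lr=0.01):
    """Process a whole batch at once, then apply one backprop-style
    gradient update across the weights."""
    global W
    pred = X @ W                        # forward pass for the entire batch
    err = pred - y
    grad = X.T @ err / len(X)           # average gradient over the batch
    W -= lr * grad                      # one update tuned to this batch

# A "batch": many unrelated inputs averaged into a single update.
X = rng.normal(size=(32, 3))
y = X @ np.array([[1.0], [-2.0], [0.5]])
for _ in range(500):
    train_on_batch(X, y, lr=0.1)
print(W.ravel())                        # close to the weights that generated y
```

The key property for this discussion: each update is tuned to the batch it just saw, not to anything the network learned earlier.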
So they're basically laying out the problem here: that neural networks aren't generally capable of learning tasks in a sequential fashion. That's a good thing to point out right off the bat. "The ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence."
I would absolutely agree with that, and I have to point out here: this is the big difference between HTM and deep learning. HTM always learns in a sequential fashion. That's what it does; that's what evolution made it do. We're trying to reverse-engineer how the sequential part of this, the sequence memory, is done in the brain. I think we have a good understanding of that; that's what HTM theory is all about.
So right off the bat you can see that the whole problem of catastrophic forgetting doesn't really quite apply to the biological idea of intelligence, as we define intelligence.
There's a technique you can apply to your deep learning networks called elastic weight consolidation. I don't understand the math here, but it's a way that you can try to counter the catastrophic forgetting that happens naturally in deep learning networks, so that you won't catastrophically forget something that was important in one of the previous batches. So there are already methods in place for deep learning networks to prevent this from happening, and this is probably the major paper you should read if you want to learn about that.
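Roughly, elastic weight consolidation adds a quadratic penalty that makes it expensive to move the weights the previous task relied on, weighted by a per-weight importance estimate (the Fisher information). The numbers below are made up for illustration, and this sketches the penalty term only, not the full training procedure from the paper:

```python
import numpy as np

def ewc_penalty(theta, theta_old, fisher, lam=1.0):
    """Elastic Weight Consolidation penalty (a rough sketch).

    theta:     current weights, being trained on the new task
    theta_old: weights frozen after the old task
    fisher:    per-weight importance estimates (Fisher information);
               large values mean "the old task cares about this weight"
    lam:       how strongly to protect the old task
    """
    return 0.5 * lam * np.sum(fisher * (theta - theta_old) ** 2)

theta_old = np.array([1.0, -2.0, 0.5])
fisher    = np.array([10.0, 0.1, 5.0])   # first weight matters most to task A
theta     = np.array([1.1, 0.0, 0.5])    # new-task training moved the weights

# Moving the unimportant weight (index 1) is cheap; moving index 0 is not.
print(ewc_penalty(theta, theta_old, fisher))
```

During training on the new task, this penalty gets added to the ordinary loss, so gradient descent is free to reuse unimportant weights but pays a cost for overwriting important ones.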
It's not... I mean, yeah, a biological system can be affected by forgetting just like anything else, but I wouldn't call it catastrophic, right? I mean, people forget. There is a bit of a terminology issue. Biological things forget all the time, but it's not a catastrophic thing. It's not like you forgot everything you learned the day previous.
Not every new day. But I think what Mark pointed out here is that, even though we don't call it out, the properties of sparse distributed representations are very resilient to what I think causes this catastrophic forgetting. Or at least: when you learn something new and it's like something you learned in the past, you don't lose that link, that association, right? (Hey, Falco.) So it's almost like, when you think about sparse distributed representations and the overlap between sparse distributed representations,
you can do this type of matching. With representations the way the brain represents information, you can do this type of matching, so that if you learn something and it's like something you learned before, you don't lose that. You build on what you have learned.
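What that overlap matching looks like in practice: an SDR is a long, mostly zero binary vector, and similarity is just the number of active bits two representations share. The vector length and sparsity below are illustrative assumptions, not values from any particular HTM implementation:

```python
import numpy as np

def sdr_overlap(a, b):
    """Overlap between two sparse binary representations: the number
    of bits active in both. High overlap ~ semantically similar."""
    return int(np.sum(a & b))

n, on_bits = 2048, 40                      # ~2% sparsity, HTM-ish numbers
rng = np.random.default_rng(42)

def random_sdr():
    sdr = np.zeros(n, dtype=np.int64)
    sdr[rng.choice(n, size=on_bits, replace=False)] = 1
    return sdr

cat     = random_sdr()
similar = cat.copy()
flip = np.flatnonzero(cat)[:5]             # perturb 5 of the 40 active bits
similar[flip] = 0
similar[rng.choice(np.flatnonzero(cat == 0), size=5, replace=False)] = 1

unrelated = random_sdr()
print(sdr_overlap(cat, similar))           # 35 shared bits: clearly a match
print(sdr_overlap(cat, unrelated))         # near zero: random SDRs barely collide
```

Because unrelated SDRs almost never collide, new learning that lands on new bits doesn't wipe out the bits carrying old associations, which is the resilience being described above.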
So the idea of catastrophic forgetting, in the way that it's described in detail in deep learning, doesn't apply, I think. As you accumulate knowledge over time, accumulate information about reality over time, you're not just going to catastrophically forget it.
It's hard to compare these diagrams to an HTM structure, because we have to talk about three dimensions. Well, not really; we don't need to talk about three dimensions. But it makes sense to think about a layer of cells that isn't just sort of one-dimensional like these are. This being what we've called one layer, and this being one unit of the layer, we would build a structure for the spatial pooler to operate within that's so many cells tall.
And that doesn't happen, I don't think, in these networks. You could say convolution attempts to do that, in that it will take an image and break it up into parts, have dedicated layers with units that are processing features in each one of those parts, and then filter them up through a bunch of convolving layers that try to capture those groups of features.
But it's not the same thing as what we were talking about with spatial pooling, although I think the idea of minicolumns is maybe an originator. Well, I think convolutional neural networks came from the Hubel and Wiesel stuff, most likely.
Someone points out that deep learning doing its learning in batches, versus HTM doing continuous learning, has an influence. Yes, yeah, absolutely. The batch thing is super important, because, and I think I talked about this a little before you joined, you have to apply the backpropagation of error algorithm sort of all at once, across the whole network. So you have to sort of stop the system and do this big, expensive calculation.
There's no time there; it's not that one of those images has any relation in time to any of the others. But you put them all in the same batch, you process them all at once, and when you apply the backpropagation of error algorithm you update all your weights, and they're sort of tuned to things in that data, trying to minimize error, usually for classification. Now, that batch might have different characteristics than the next batch.
The next batch comes, and there are subtle differences in it, but you basically do the same thing: you process it all at once and then apply the backpropagation of error algorithm, trying to mitigate the loss for these loss functions. And that again tunes all of the weights in the network to perform best for the epoch of data that you just processed, which could remove the patterns from the last epoch and sort of overwrite them, because the newer learning is stronger. I think that's a good definition of catastrophic forgetting.
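That overwriting can be shown with a deliberately tiny sketch: one shared set of weights, and two synthetic "tasks" that want conflicting values for them, trained one after the other. This is a toy in plain NumPy, not a real deep network, but the effect is the one being described:

```python
import numpy as np

rng = np.random.default_rng(7)
W = np.zeros((2, 1))                    # weights shared by both tasks

def mse(X, y):
    return float(np.mean((X @ W - y) ** 2))

def train(X, y, steps=300, lr=0.1):
    global W
    for _ in range(steps):
        W -= lr * X.T @ (X @ W - y) / len(X)

# Task A and task B want *different* weights for the same inputs.
X = rng.normal(size=(64, 2))
y_a = X @ np.array([[2.0], [0.0]])      # task A: only feature 0 matters
y_b = X @ np.array([[0.0], [-3.0]])     # task B: only feature 1 matters

train(X, y_a)
err_a_before = mse(X, y_a)              # near zero: task A is learned

train(X, y_b)                           # keep training, now only on task B
err_a_after = mse(X, y_a)               # task A error blows back up

print(err_a_before, err_a_after)
```

Nothing in the update rule knows that the old weights were worth keeping; the newer gradients simply win.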
Could you call catastrophic forgetting sort of orthogonal to overfitting? Because you could say that every time you apply backpropagation of error, you are overfitting to whatever data signature you're getting, the central tendencies of that data. You could say you're potentially overfitting for that and underfitting for whatever you've previously learned at that point. I don't know if underfitting is the right term, but I think
the new set is fitting well; you're creating the new curves to fit that data, right? Every time you apply the backpropagation of error algorithm across a machine learning network, you're trying to get the best performance on your loss function, so you're trying to classify the best, basically, because it's almost always some type of classification task.
And for each one of those loss functions... I'm trying to think in Bayesian terms here. I'm not very good at this, not very good at the math behind it, so forgive me if I stick to the more high-level topics. That's a good way to put it, though, Mark: think of it as learning a skeleton of the data.
That's the whole idea of catastrophic forgetting; I'm still just sort of defining it. And the reason why it's not a big deal for biological systems is because we're continuously learning. It's not like every day when we wake up, or every night when we go to sleep,
a backpropagation of error algorithm applies through your brain. There's an interesting process, I think, that happens when you're asleep, for sure, but I don't think it's anything like that.
while you're learning, right? It's in a sequential fashion, in the same way that, as I pointed out at the beginning of this paper, they said "the ability to learn tasks in a sequential fashion is crucial to the development of artificial intelligence." That's the type of sequential learning
A
We
do
every
day
how
we
learn
tasks
in
a
sequential
fashion,
so
it's
like
it's
not
necessarily
applicable
to
the
type
of
intelligence
that
we
are
trying
to
model
with
biologically
inspired
intelligence
mark
says
it's
equally
important
to
define
what
it
is
that
deep
learning
does
learning
a
new.
A
new
shape
is
at
cross-purposes
of
what
you
learned
in
the
old
data.
A
It
seems
if
I
were
doing
a
lot
of
machine
learning.
Stuff
I
would
have
a
tendency
to
assume
that
whatever
data
I
trained
on
I
was
going
to
get
that
basic
type
of
data
in
there
in
the
real
world
in
the
production,
world
and
I
know
from
experience
after
like
living
in
the
real
world
of
data
and
big
data
streaming
data
that
data
changes
over
time.
It
changes
over
time
and
there's
nothing.
You
can
do
about
it.
It's
and
it's
not
the
data
types
or
the
stream
definitions.
A
Necessarily
it's
the
characteristics
of
the
data
and
everything
changes.
Everything
changes
over
time,
so
you're
not
going
to
have
a
general
intelligence
algorithm
that
can't
handle
changing
over
time.
The
world
reality
changing
over
time
you
have
to
with
it.
The
intelligent
system
has
to
change
with
the
world
as
it
evolves,
because
it
will
continue
to
learning,
has
to
be
completely
orthogonal
or
it
will
interfere,
and
it's
almost
never
so
that's
yeah,
that's
a
good
way
to
put
it
yeah
and-
and
you
should
be
able
to
we
do
this-
all
the
time,
we're
doing
a
turn.
That's a very Matthew way to put it. Yeah, exactly. And the H of HTM will add context, yeah. Or, as well as lateral connections, the Thousand Brains idea: that also adds context. You don't have to have the H, I don't think. I think you can do a lot without the H, let me put it that way. You could do a lot without hierarchy, with the lateral connections, I think.
The hierarchy would be the output of a cortical column, the feed-forward output, right? And the same thing for a column getting feed-forward input: it would be getting feed-forward input not from sensor data but from somewhere else in the cortex. That's the hierarchy part. The lateral stuff all happens within one layer, wherever that cortical hub happens to be, even if it's... I don't want to say.
If you want to voice chat with me and anyone else who joins in: we're talking about catastrophic forgetting, and I've been talking about it for about half an hour. Does anyone have any direct experience with it at all, in the deep learning systems that they have created? Just curious, because like I said, it's not something we worry too much about in the biological area and HTM. It doesn't affect HTM systems, because of the naturally sequential nature of their processing of input.
Okay, well, nice chat, everybody. I'm going to close the show; I'm not just going to drag it out for an hour if the topic is over, and the topic is over. So I appreciate you guys hanging out with me for a while, talking about artificial intelligence, specifically deep learning and catastrophic forgetting, and whether catastrophic forgetting applies to biologically inspired intelligent systems. I don't think it does. So take care and have a wonderful Monday. I might be streaming tomorrow; I'm writing a blog post about my experience with Twitch for an event.