From YouTube: Weight Agnostic Neural Networks (Numenta Journal Club)
Description
Numenta Journal Club - Aug 16, 2019
Weight Agnostic Neural Networks:
https://arxiv.org/abs/1906.04358
This is a fairly new paper (Jun 11) that reinforces the implication of the Lottery Ticket Hypothesis: that the weights of a network don't matter as much as people think, and that a lot of the importance is in the structure of the network.
Discussion at https://discourse.numenta.org/t/weight-agnostic-neural-networks/6467
And so they'll actually run these agents with the same architecture, with just a varying range of weights, and show that it'll actually work irrespective of the weight that's given to the network. Does that make sense? Yeah.
That makes sense. Actually, sorry, I didn't make the same argument, but if I think about our previous work in brain modeling, this would not be surprising. You saw that if you are relying on a distributed representation, then it's the distributed representation which carries the meaning, not the weights.
But yeah, so they kind of go in with this motivation that there's something inherent in the structure that can perform these tasks, and they do this in a reinforcement learning setting. They have three examples. One is this bipedal walker: basically, the walker is just trying to go as far as it can, as efficiently as possible.
There's a car race, where it's just trying to stay on track as long as possible. I think these are fairly standard RL games that people train on and report comparative performance for. They're actually showing that these weight agnostic neural networks can do fairly well in these games, and almost match state of the art once they tune this one weight parameter. It's pretty surprising.
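To make the setup concrete, here is a minimal sketch of that evaluation idea in Python. The `rollout(network, w)` helper is hypothetical (not from the paper's code); it stands for running one episode with every connection set to the single shared weight `w`. The paper samples a small fixed series of shared-weight values roughly in this range.

```python
import numpy as np

# Hypothetical stand-ins: `network` is a fixed topology, and
# rollout(network, w) runs one episode with *every* connection set to the
# single shared weight w, returning the total episode reward.
SHARED_WEIGHTS = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)

def evaluate_weight_agnostic(network, rollout, weights=SHARED_WEIGHTS):
    """Score a topology by how well it performs regardless of its weight."""
    rewards = [rollout(network, w) for w in weights]
    return {
        "mean_reward": float(np.mean(rewards)),  # high => works for any weight
        "max_reward": float(np.max(rewards)),    # high => works once w is tuned
    }
```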
You could just as well say: I'm leaving all the weights the same and I'm changing the activation function. Before, it was like changing the dendritic threshold. It makes a difference, obviously: if we give a value of one to every synapse, then the dendritic threshold is going to be different than if we give a value of 1/5 to each of them.
I don't know exactly, but yeah, it shouldn't matter what the weight is; you're absolutely right about that. They do show that you can tune that one weight to optimize for the one task, but they don't really care about that. They basically just want to show that even with any random weights it still performs very well.
Yes, these are just chosen randomly. This is an evolutionary algorithm, so they initialize a bare-bones minimal population, just saying: here are the outputs, here are the inputs, and maybe a few connections between them. Then what they'll do is take these networks and initialize the weights — they use the same set of six weight values throughout the entire run — and they'll test those networks.
Exactly. So then they do that for every single network in the initial population. They rank performance by the max performance of each of these networks, the average performance across all the weight values, and then they also rank by complexity — I think they describe exactly how they do this later, but it's based off of how many connections there are, how sparse it is, essentially.
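A sketch of that ranking step, under the same assumptions as the evaluation sketch above (`rollout` and the `net.connections` attribute are hypothetical). The paper uses a proper multi-objective tournament over these criteria; the scalarized sort here is only a stand-in to show what is being traded off.

```python
def rank_population(population, rollout, weights=SHARED_WEIGHTS):
    """Score every candidate topology on mean reward, max reward,
    and complexity (connection count), then rank the population."""
    scored = []
    for net in population:
        rewards = [rollout(net, w) for w in weights]
        scored.append({
            "net": net,
            "mean": sum(rewards) / len(rewards),
            "max": max(rewards),
            "n_conn": len(net.connections),  # fewer connections = sparser
        })
    # Stand-in for the paper's multi-objective ranking: reward high mean/max
    # performance, penalize complexity.
    scored.sort(key=lambda s: s["mean"] + s["max"] - 0.01 * s["n_conn"],
                reverse=True)
    return scored
```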
Yeah, so the idea being that the evolutionary algorithm will also try to select for sparse networks, not just make very complicated structures and whatnot. And then after that they take, I guess, the best few and vary those. They can vary these networks in essentially one of three ways — I think there may be a better diagram of how they vary them.
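For reference, the paper's three variation operators are: insert a node (splitting an existing connection), add a connection between two previously unconnected nodes, and change a node's activation function. A sketch with a hypothetical `net` API:

```python
import random

# The paper draws activations from a fixed set roughly like this one.
ACTIVATIONS = ["linear", "step", "sin", "gaussian", "tanh",
               "sigmoid", "abs", "invert", "relu"]

def mutate(net):
    """Apply one of the three topology mutations (sketch)."""
    op = random.choice(["insert_node", "add_connection", "change_activation"])
    if op == "insert_node":
        # Split an existing connection with a new node (random activation).
        conn = random.choice(net.connections)
        net.split_connection(conn, activation=random.choice(ACTIVATIONS))
    elif op == "add_connection":
        # Connect two previously unconnected nodes, keeping it feed-forward.
        src, dst = net.sample_unconnected_pair()
        net.add_connection(src, dst)
    else:
        # Reassign one hidden node's activation function.
        node = random.choice(net.hidden_nodes)
        node.activation = random.choice(ACTIVATIONS)
    return net
```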
A good question, actually; I don't think they explained that. I was using the word structure, and they were using the word architecture — I don't know that it makes a difference. I think what they actually do here is sort of in the vein of changing the architecture: they're not actually changing the weights, it's weight agnostic, and they try to...
An activation function that has two or maybe three parameters — then you could explore that space coherently. Here they're just saying: I've got a grab bag of these fixed functions, say on the synaptic sum. That's going to require some choices, but if you're looking to see how critical it is to have those actual degrees of freedom...
I think this can be pitched as a significant result, or at least they're very close to a significant result. And it kind of goes with what Kevin alluded to: how this activation function can stand in for different dendritic integration zones. You could replace weights with a more complex neuron where, if a synapse occurs on one part of the neuron, it's this activation function, and if it's on another part, it's this other activation function.
If I wanted to explore that in terms of neurons, I would build these units to represent the different types of neurons and dendritic zones and explore that, as opposed to having eight or twelve different types of activation functions. But we can agree that's a separate question; it's not the issue if I want to go build an efficient system.
I don't think this is super novel; it's kind of pointing out that the ML community doesn't necessarily appreciate that the connectivity is an important thing, or the fact that a lot of the information is in the connectivity. I think that's just with respect to the community — maybe not with respect to individuals, who already sort of knew that, yeah.
I think — weights and structure, and now activations — I think this has really just focused on what we're trying to do here. But for the community, this is a tentative way of generating networks: you set this as an evolutionary goal and then you see, okay, are things naturally evolving toward sparsity, or toward the all-important small-world networks, or naturally evolving toward scale-free networks. There are slices of the exploration space that you could set as a problem.
We start from what we know is true and try to figure out how to inject that into machine learning. So to the extent that we look at something like this, it's not to fault the authors' research, but to ask: how useful is it for our goal? I find that the changing of the activation function makes this paper less useful for our goal, because we're asking a different question. We're not just saying: let's explore the space of possible solutions; what we're doing is looking at the...
I mean, this is another main takeaway: that these things are working without much training — say, with randomly chosen weights. Okay, so they have a case where they just randomly assign a shared weight to the structure learned by the evolutionary algorithm.
Exactly, giving that structure just one weight all the same way — even compared against a network that was trained with gradient descent but then had all its weights reset. I mean, comparing that against the baselines, where they take a neural network trained using gradient descent. Okay, I'm just talking about this column.
They maybe made an analogy with convolutional layers: convolutional layers, they said, have this core inductive bias that makes them very good for machine vision tasks. So what they're basically saying is that, inherently, given the way convolutional neural networks work, they end up helping performance quite a lot.
They don't really intend on having you learn entire things with this, I think. I mean, I think they had some ideas for it: for one, they said maybe explore things outside of gradient descent, but they also thought of these things as possible building blocks.
I found it somewhat difficult to interpret, but one of the things they show in the example is that early on you have the velocity, you have the position of the cart, and the cart is supposed to stay in the middle. So what they're suggesting here is that if the cart is off to the right...
I'm just going to mention the simple example where they basically say: if you're off to the right, there should be an inverse relationship with respect to your velocity — if you go too far, you should kind of come back in. So they're saying that within the early generations it sort of learns these connections between your position and velocity, and then in the later generations it's learning more complex relationships.
They're not enforcing any kind of sparsity on this, right? I mean, they start off sparse, but once they add or remove connections it's random. What I mean is, it would be interesting to see what happens if you have a network with a fixed sparsity of connections, so you can only change which edges there are as you move along — which is closer to what we do.
You know, what could you learn in that network? Because we're trying to get closer to what the brain does... I don't know — am I interpreting this picture right? I see this one on the right there; it's got all these lines coming in and out of it, and some of them only have one. So I'm interpreting that, maybe incorrectly, as there being some imbalance in the incoming and outgoing connections that it works with.
Right, but I think that's why it would be interesting to run the experiment. You know what you end up with here at the end: a solution where everything has a different number of connections and a different activation function everywhere, and you could say it's almost like a handcrafted solution.
Versus the solution where you force yourself down to a single activation function and a certain fixed sparsity. Then your search is pulling from a different part of the space of solutions; you're restricting yourself to: okay, given networks of a certain sparsity, what can be learned and what can't be learned? Again, I'm thinking ultimately of the future of AI. We believe that these things are going to be sparse, and they're going to be of relatively fixed sparsity, and that microcosm is more useful for our purposes.
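The fixed-sparsity variant proposed here is not what the paper does, but as a sketch it would just replace the grow-only mutations with a rewiring move that keeps the edge count constant (same hypothetical `net` API as in the mutation sketch above):

```python
import random

def rewire(net):
    """Fixed-sparsity mutation: drop one connection and add another,
    so the total number of edges never changes."""
    net.remove_connection(random.choice(net.connections))
    src, dst = net.sample_unconnected_pair()
    net.add_connection(src, dst)
    return net
```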
There are changes you can make in a certain area for a variety of reasons, but it's still very robust. On the other hand, on a larger time scale, you have places like the cerebellum and cortex where early on you have all these connections and over time you're pruning them away. You have this rich network, but by various means you're removing elements from it. So I think both mechanisms can be operating: you're working against a constraint, which is kind of what you were describing.
The MNIST ones, yeah — so these are the MNIST results, just Table 8. I mean, they were talking about reinforcement learning, where there's a small number of inputs and a small number of outputs, but they also wanted to show it could work in a supervised setting such as MNIST, which has a relatively larger number of inputs and some higher number of outputs.
They take this one network and look at the classification accuracy for different weights. So this axis is the digit, and this is the weight value — the shared weight value after the training has been done — and the yellow is a percentage. What they notice is that the accuracy is different for different weights.
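The heat map being described (digit on one axis, shared weight value on the other, accuracy as color) could be reproduced along these lines. The `predict(net, images, w)` helper is hypothetical; it stands for running the evolved network with all weights set to `w` and returning predicted digits.

```python
import numpy as np

def accuracy_by_weight_and_digit(net, predict, images, labels,
                                 weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Rows = shared weight values, columns = digits 0-9,
    entries = per-digit classification accuracy."""
    table = np.zeros((len(weights), 10))
    for i, w in enumerate(weights):
        preds = predict(net, images, w)  # predicted digit per image
        for d in range(10):
            mask = labels == d
            table[i, d] = float(np.mean(preds[mask] == d))
    return table
```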
That's true. Anyway, they were trying to show this sort of application to the supervised learning side: they take that one network found with the evolutionary algorithm and just train it all the way — so it's sort of a minimal network, and then they train all the links — and they get this 94.2 percent accuracy. The only interesting point about this is that they point to the "Deconstructing Lottery Tickets" paper, from the lottery ticket hypothesis line of work, which I haven't read, but...
Yeah, I think of the dynamic sparsity stuff — you know, we're trying to learn those connections as part of the learning. Similar with temporal memory: you can think of temporal memory as being a weight agnostic structure. It's all about the structure, with the same activation function everywhere and binary weights, and we learn the structure as part of the learning.
Yeah, I guess that's about it. Personally, I just had the motivation to go through this paper because, for me, it wasn't immediately obvious, coming to Numenta, that there was a lot of importance in the structure. So this, in combination with the lottery ticket hypothesis...
I was only differentiating between... like, surprising — no, I guess I shouldn't say this is really surprising. It's more that I'm actually getting the sense that it's sort of a matter of fact, as opposed to, oh yeah, this is possible.
In combination with the lottery ticket paper, I think at the very least it reinforces that kind of stuff — it says, keep working on the dynamic sparsity, obviously. So at least for me, from my perspective, understanding this is kind of nice; it helps me understand the things here in a broader sense.
What I come away with some curiosity about is whether there's a formalism that allows you to abstract this: whatever the learning is, this percentage of it in this particular topology is due to the weights, and this percentage is due to just the connections. Is there a way of actually saying, you know, I'm going to do it all with one, or more or less all with the other — is there a way of manipulating these things as two abstractions?
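No such formalism is given in the paper; one crude, purely illustrative way to operationalize the question is an ablation that destroys one factor at a time and compares scores (the `evaluate`, `copy`, and shuffle helpers are all hypothetical):

```python
import numpy as np

def attribute_weights_vs_wiring(net, evaluate, rng=np.random.default_rng(0)):
    """Compare the intact network against versions where either the
    weights or the wiring have been scrambled."""
    scrambled_weights = net.copy()
    rng.shuffle(scrambled_weights.weight_vector)  # keep topology, shuffle weights
    scrambled_wiring = net.copy()
    scrambled_wiring.shuffle_edges(rng)           # keep weights, rewire edges
    return {
        "intact": evaluate(net),
        "weights_scrambled": evaluate(scrambled_weights),
        "wiring_scrambled": evaluate(scrambled_wiring),
    }
```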
They have one or two bits of precision — they're basically binary — and the only thing you can do is turn them on, turn them off, grow them, get rid of them. So this was an observation from many years ago that is fundamentally inconsistent with the idea of weights; at minimum, there's only a very, very small amount of tuning you can make, slight and minor.
So one way to build networks — maybe the best way to build networks — is to go with weights for synapses, and that may be true. But from my personal philosophy point of view, I've found that if you start exploring the whole space of possible solutions for intelligence, you just run into a morass and never make progress. Whereas if you constrain yourselves to the biology and just accept that as fact, then you won't get lost along the way; you won't go down a rat hole where you're exploring different activation functions.
You get to say: I know there's a solution in the brain, and the brain is constrained by this. Let us completely understand how the brain works first, and then, after we understand that, we can ask ourselves: could I improve upon it? Could I improve upon what nature has discovered about how to build intelligent brains by adding something else? But in the meantime, you do that.