From YouTube: Weight Agnostic Neural Networks (Numenta Journal Club)
Description
Numenta Journal Club - Aug 16, 2019
Weight Agnostic Neural Networks:
https://arxiv.org/abs/1906.04358
This is a fairly new paper (Jun 11) that reinforces the implication of the Lottery Ticket Hypothesis: that the weights of a network don't matter as much as people think, and that a lot of the importance is in the structure of the network.
Discussion at https://discourse.numenta.org/t/weight-agnostic-neural-networks/6467
And so they'll actually run these agents with the same architecture, with just a varying range of weights, and show that it'll actually work irrespective of the weight that's given to the network. Does that make sense? Yeah.
That makes sense. Actually, sorry, I didn't make the same argument, but if I think about our previous work in brain modeling, this would not be surprising. You saw that if you are relying on a distributed representation, then it's the distributed representation which carries the meaning, not the weights.
But yeah, so they kind of go in with this motivation that there's something inherent in the structure that can perform these tasks, and they do this in a reinforcement learning setting. They have three examples. One is this bipedal walker: basically, the walker is just trying to go as far as it can, as efficiently as possible.
There's a car race, where it's just trying to stay on track as long as possible. I think these are fairly standard RL games that people train on and report comparative performance for. They're actually showing that these weight agnostic neural networks can do fairly well in these games, and almost match state of the art once they tune this one weight parameter. It's pretty surprising.
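To make the setup concrete, here is a minimal sketch of that evaluation idea in Python. The `rollout(network, w)` helper is hypothetical (not from the paper's code); it stands for running one episode with every connection set to the single shared weight `w`. The paper samples a small fixed series of shared-weight values roughly in this range.

```python
import numpy as np

# Hypothetical stand-ins: `network` is a fixed topology, and
# rollout(network, w) runs one episode with *every* connection set to the
# single shared weight w, returning the total episode reward.
SHARED_WEIGHTS = (-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)

def evaluate_weight_agnostic(network, rollout, weights=SHARED_WEIGHTS):
    """Score a topology by how well it performs regardless of its weight."""
    rewards = [rollout(network, w) for w in weights]
    return {
        "mean_reward": float(np.mean(rewards)),  # high => works for any weight
        "max_reward": float(np.max(rewards)),    # high => works once w is tuned
    }
```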
You could just as well say: I'm leaving all the weights the same and I'm changing the activation function. Before, it was like changing the dendritic threshold. It makes a difference, obviously: if we give a value of one to every synapse, then the dendritic threshold is going to be different than if we give a value of 1/5 to each of them.
I don't know exactly, but yeah, it shouldn't matter what the weight is; you're absolutely right about that. They do show that you can tune that one weight to optimize for the one task, but they don't really care about that. They basically just want to show that even with any random weights it still performs very well.
Yes, these are just chosen randomly. This is an evolutionary algorithm, so they initialize a bare-bones minimal population, just saying: here are the outputs, here are the inputs, and maybe a few connections between them. Then what they'll do is take these networks and initialize the weights — they use the same set of six weight values throughout the entire run — and they'll test those networks.
Exactly. So then they do that for every single network in the initial population. They rank performance by the max performance of each of these networks, the average performance across all the weight values, and then they also rank by complexity — I think they describe exactly how they do this later, but it's based off of how many connections there are, how sparse it is, essentially.
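A sketch of that ranking step, under the same assumptions as the evaluation sketch above (`rollout` and the `net.connections` attribute are hypothetical). The paper uses a proper multi-objective tournament over these criteria; the scalarized sort here is only a stand-in to show what is being traded off.

```python
def rank_population(population, rollout, weights=SHARED_WEIGHTS):
    """Score every candidate topology on mean reward, max reward,
    and complexity (connection count), then rank the population."""
    scored = []
    for net in population:
        rewards = [rollout(net, w) for w in weights]
        scored.append({
            "net": net,
            "mean": sum(rewards) / len(rewards),
            "max": max(rewards),
            "n_conn": len(net.connections),  # fewer connections = sparser
        })
    # Stand-in for the paper's multi-objective ranking: reward high mean/max
    # performance, penalize complexity.
    scored.sort(key=lambda s: s["mean"] + s["max"] - 0.01 * s["n_conn"],
                reverse=True)
    return scored
```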
Yeah, so the idea being that the evolutionary algorithm will also try to select for sparse networks, not just make very complicated structures and whatnot. And then after that they take, I guess, the best few and vary those. They can vary these networks in essentially one of three ways — I think there may be a better diagram of how they vary them.
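For reference, the paper's three variation operators are: insert a node (splitting an existing connection), add a connection between two previously unconnected nodes, and change a node's activation function. A sketch with a hypothetical `net` API:

```python
import random

# The paper draws activations from a fixed set roughly like this one.
ACTIVATIONS = ["linear", "step", "sin", "gaussian", "tanh",
               "sigmoid", "abs", "invert", "relu"]

def mutate(net):
    """Apply one of the three topology mutations (sketch)."""
    op = random.choice(["insert_node", "add_connection", "change_activation"])
    if op == "insert_node":
        # Split an existing connection with a new node (random activation).
        conn = random.choice(net.connections)
        net.split_connection(conn, activation=random.choice(ACTIVATIONS))
    elif op == "add_connection":
        # Connect two previously unconnected nodes, keeping it feed-forward.
        src, dst = net.sample_unconnected_pair()
        net.add_connection(src, dst)
    else:
        # Reassign one hidden node's activation function.
        node = random.choice(net.hidden_nodes)
        node.activation = random.choice(ACTIVATIONS)
    return net
```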
A good question, actually; I don't think they explained that. I was using the word structure, and they were using the word architecture — I don't know that it makes a difference. I think what they actually do here is sort of in the vein of changing the architecture: they're not actually changing the weights, it's weight agnostic, and they try to...
An activation function that has two or maybe three parameters — then you could explore that space coherently. Here they're just saying: I've got a grab bag of these fixed functions, say on the synaptic sum. That's going to require some choices, but if you're looking to see how critical it is to have those actual degrees of freedom...
I think this can be pitched as a significant result, or at least they're very close to a significant result. And it kind of goes with what Kevin alluded to: how this activation function can stand in for different dendritic integration zones. You could replace weights with a more complex neuron where, if a synapse occurs on one part of the neuron, it's this activation function, and if it's on another part, it's this other activation function.
If I wanted to explore that in terms of neurons, I would build these units to represent the different types of neurons and dendritic zones and explore that, as opposed to having eight or twelve different types of activation functions. But we can agree that's a separate question; it's not the issue if I want to go build an efficient system.
I don't think this is super novel; it's kind of pointing out that the ML community doesn't necessarily appreciate that the connectivity is an important thing, or the fact that a lot of the information is in the connectivity. I think that's just with respect to the community — maybe not with respect to individuals, who already sort of knew that, yeah.
I think — weights and structure, and now activations — I think this has really just focused on what we're trying to do here. But for the community, this is a tentative way of generating networks: you set this as an evolutionary goal and then you see, okay, are things naturally evolving toward sparsity, or toward the all-important small-world networks, or naturally evolving toward scale-free networks. There are slices of the exploration space that you could set as a problem.
We start from what we know is true and try to figure out how to inject that into machine learning. So to the extent that we look at something like this, it's not to fault the authors' research, but to ask: how useful is it for our goal? I find that the changing of the activation function makes this paper less useful for our goal, because we're asking a different question. We're not just saying: let's explore the space of possible solutions; what we're doing is looking at the...
I mean, this is another main takeaway: that these things are working without much training — say, with randomly chosen weights. Okay, so they have a case where they just randomly assign a shared weight to the structure learned by the evolutionary algorithm.
Exactly, giving that structure just one weight all the same way — even compared against a network that was trained with gradient descent but then had all its weights reset. I mean, comparing that against the baselines, where they take a neural network trained using gradient descent. Okay, I'm just talking about this column.
They maybe made an analogy with convolutional layers: convolutional layers, they said, have this core inductive bias that makes them very good for machine vision tasks. So what they're basically saying is that, inherently, given the way convolutional neural networks work, they end up helping performance quite a lot.
They don't really intend on having you learn entire things with this, I think. I mean, I think they had some ideas for it: for one, they said maybe explore things outside of gradient descent, but they also thought of these things as possible building blocks.
I found it somewhat difficult to interpret, but one of the things they show in the example is that early on you have the velocity, you have the position of the cart, and the cart is supposed to stay in the middle. So what they're suggesting here is that if the cart is off to the right...
I'm just going to mention the simple example where they basically say: if you're off to the right, there should be an inverse relationship with respect to your velocity — if you go too far, you should kind of come back in. So they're saying that within the early generations it sort of learns these connections between your position and velocity, and then in the later generations it's learning more complex relationships.
They're not enforcing any kind of sparsity on this, right? I mean, they start off sparse, but once they add or remove connections it's random. What I mean is, it would be interesting to see what happens if you have a network with a fixed sparsity of connections, so you can only change which edges there are as you move along — which is closer to what we do.
You know, what could you learn in that network? Because we're trying to get closer to what the brain does... I don't know — am I interpreting this picture right? I see this one on the right there; it's got all these lines coming in and out of it, and some of them only have one. So I'm interpreting that, maybe incorrectly, as there being some imbalance in the incoming and outgoing connections that it works with.
Right, but I think that's why it would be interesting to run the experiment. You know what you end up with here at the end: a solution where everything has a different number of connections and a different activation function everywhere, and you could say it's almost like a handcrafted solution.
Versus the solution where you force yourself down to a single activation function and a certain fixed sparsity. Then your search is pulling from a different part of the space of solutions; you're restricting yourself to: okay, given networks of a certain sparsity, what can be learned and what can't be learned? Again, I'm thinking ultimately of the future of AI. We believe that these things are going to be sparse, and they're going to be of relatively fixed sparsity, and that microcosm is more useful for our purposes.
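The fixed-sparsity variant proposed here is not what the paper does, but as a sketch it would just replace the grow-only mutations with a rewiring move that keeps the edge count constant (same hypothetical `net` API as in the mutation sketch above):

```python
import random

def rewire(net):
    """Fixed-sparsity mutation: drop one connection and add another,
    so the total number of edges never changes."""
    net.remove_connection(random.choice(net.connections))
    src, dst = net.sample_unconnected_pair()
    net.add_connection(src, dst)
    return net
```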
There are changes you can make in a certain area for a variety of reasons, but it's still very robust. On the other hand, on a larger time scale, you have places like the cerebellum and cortex where early on you have all these connections and over time you're pruning them away. You have this rich network, but by various means you're removing elements from it. So I think both mechanisms can be operating: you're working against a constraint, which is kind of what you were describing.
The MNIST ones, yeah — so these are the MNIST results, just Table 8. I mean, they were talking about reinforcement learning, where there's a small number of inputs and a small number of outputs, but they also wanted to show it could work in a supervised setting such as MNIST, which has a relatively larger number of inputs and some higher number of outputs.
They take this one network and look at the classification accuracy for different weights. So this axis is the digit, and this is the weight value — the shared weight value after the training has been done — and the yellow is a percentage. What they notice is that the accuracy is different for different weights.
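The heat map being described (digit on one axis, shared weight value on the other, accuracy as color) could be reproduced along these lines. The `predict(net, images, w)` helper is hypothetical; it stands for running the evolved network with all weights set to `w` and returning predicted digits.

```python
import numpy as np

def accuracy_by_weight_and_digit(net, predict, images, labels,
                                 weights=(-2.0, -1.0, -0.5, 0.5, 1.0, 2.0)):
    """Rows = shared weight values, columns = digits 0-9,
    entries = per-digit classification accuracy."""
    table = np.zeros((len(weights), 10))
    for i, w in enumerate(weights):
        preds = predict(net, images, w)  # predicted digit per image
        for d in range(10):
            mask = labels == d
            table[i, d] = float(np.mean(preds[mask] == d))
    return table
```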
That's true. Anyway, they were trying to show this sort of application to the supervised learning side: they take that one network found with the evolutionary algorithm and just train it all the way — so it's sort of a minimal network, and then they train all the links — and they get this 94.2 percent accuracy. The only interesting point about this is that they point to the "Deconstructing Lottery Tickets" paper, from the lottery ticket hypothesis line of work, which I haven't read, but...
Yeah, I think of the dynamic sparsity stuff — you know, we're trying to learn those connections as part of the learning. Similar with temporal memory: you can think of temporal memory as being a weight agnostic structure. It's all about the structure, with the same activation function everywhere and binary weights, and we learn the structure as part of the learning.
Yeah, I guess that's about it. Personally, I just had the motivation to go through this paper because, for me, it wasn't immediately obvious, coming to Numenta, that there was a lot of importance in the structure. So this, in combination with the lottery ticket hypothesis...
I was only differentiating between... like, surprising — no, I guess I shouldn't say this is really surprising. It's more that I'm actually getting the sense that it's sort of a matter of fact, as opposed to, oh yeah, this is possible.
In combination with the lottery ticket paper, I think at the very least it reinforces that kind of stuff — it says, keep working on the dynamic sparsity, obviously. So at least for me, from my perspective, understanding this is kind of nice; it helps me understand the things here in a broader sense.
What I come away with some curiosity about is whether there's a formalism that allows you to abstract this: whatever the learning is, this percentage of it in this particular topology is due to the weights, and this percentage is due to just the connections. Is there a way of actually saying, you know, I'm going to do it all with one, or more or less all with the other — is there a way of manipulating these things as two abstractions?
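No such formalism is given in the paper; one crude, purely illustrative way to operationalize the question is an ablation that destroys one factor at a time and compares scores (the `evaluate`, `copy`, and shuffle helpers are all hypothetical):

```python
import numpy as np

def attribute_weights_vs_wiring(net, evaluate, rng=np.random.default_rng(0)):
    """Compare the intact network against versions where either the
    weights or the wiring have been scrambled."""
    scrambled_weights = net.copy()
    rng.shuffle(scrambled_weights.weight_vector)  # keep topology, shuffle weights
    scrambled_wiring = net.copy()
    scrambled_wiring.shuffle_edges(rng)           # keep weights, rewire edges
    return {
        "intact": evaluate(net),
        "weights_scrambled": evaluate(scrambled_weights),
        "wiring_scrambled": evaluate(scrambled_wiring),
    }
```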
They have one or two bits of precision — they're basically binary — and the only thing you can do is turn them on, turn them off, grow them, get rid of them. So this was an observation from many years ago that is fundamentally inconsistent with the idea of weights; at minimum, there's only a very, very small amount of tuning you can make, slight and minor.
So one way to build networks — maybe the best way to build networks — is to go with weights for synapses, and that may be true. But from my personal philosophy point of view, I've found that if you start exploring the whole space of possible solutions for intelligence, you just run into a morass and never make progress. Whereas if you constrain yourselves to the biology and just accept that as fact, then you won't get lost along the way; you won't go down a rat hole where you're exploring different activation functions.
You get to say: I know there's a solution in the brain, and the brain is constrained by this. Let us completely understand how the brain works first, and then, after we understand that, we can ask ourselves: could I improve upon it? Could I improve upon what nature has discovered about how to build intelligent brains by adding something else? But in the meantime, you do that.