From YouTube: Column Swarm Reinforcement Learning - Eric Laukien
Description
2015 HTM Challenge Submission. 2nd Place Innovation.
A
Okay, and we have Eric on Skype right now, so thank you, Eric, for your submission. Good to see you. Let me first say that Eric has been involved in our community for a while now, doing some really interesting, cutting-edge integrations between HTM theory and other machine learning techniques that he's been investigating. I would call him one of those mad scientists who's taking this into areas that nobody else is exploring right now, and I commend him for that.
C
I have a question like Matt's. A lot of this is, I think, beyond my expertise, but I was curious about what's going on with the animated robot. It reminded me somewhat of genetic algorithms, where you're testing out something to see if it works. If it does, you keep adding to it. Is that what's going on there?
D
Sort of. It's actually reinforcement learning, so it's an attempt at reproducing the way humans learn. It basically uses a sort of HTM-like hierarchy that predicts its next actions, and then there are little reinforcement learning units at each node. These are my models of the column: basically, each column is a reinforcement learner in my model, which might not be realistic. But the thing is, the reinforcement learners can act like a real column; they can learn how to be a real column.
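Eric's exact column model isn't given in the talk; as a minimal sketch, assuming each column is a small tabular Q-learning unit over discretized states and actions (the class name `ColumnRL`, the toy reward, and all hyperparameters here are illustrative choices, not his), one column might look like:

```python
import random

random.seed(0)

class ColumnRL:
    """Minimal sketch: one 'column' as a tabular Q-learning unit."""

    def __init__(self, n_states, n_actions, alpha=0.5, gamma=0.9, epsilon=0.1):
        self.q = [[0.0] * n_actions for _ in range(n_states)]  # Q-value table
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.n_actions = n_actions

    def act(self, state):
        # epsilon-greedy: mostly exploit, occasionally explore
        if random.random() < self.epsilon:
            return random.randrange(self.n_actions)
        row = self.q[state]
        return row.index(max(row))

    def learn(self, state, action, reward, next_state):
        # standard one-step Q-learning update
        td_target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (td_target - self.q[state][action])

# toy demo: in state 0, action 1 yields reward (a hypothetical reward signal)
col = ColumnRL(n_states=2, n_actions=2)
for _ in range(500):
    a = col.act(0)
    reward = 1.0 if a == 1 else 0.0
    col.learn(0, a, reward, 1)  # transition to an absorbing state 1
```

After training, the column's Q-values prefer the rewarded action; in Eric's scheme the reward signal would instead be shaped to make the unit behave like a real column.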
E
I have a follow-up question on that. I think at the beginning of the video you wrote that you use Q-learning as one of the layers, as the reinforcement learning algorithm, so the reward function is just going right. Then, when you show that it's going backwards, how does that work? Is it also...
F
So I have a question: I'm trying to understand the overall motivation for the work. I mean, you could be saying, "Hey, I'm trying to combine HTM with particular other types of learning algorithms," or you're saying, "No, I'm actually trying to move HTM in a different direction," or you might be saying, "No, I'm trying to solve a problem and I'm using a mixture of tools to do that." So it wasn't clear what the overall motivation is, and knowing that would help us interpret the work.
D
So I'm working on that. I actually just now created a GPU version of the algorithm, of just the predictive hierarchy, not the reinforcement learning, and I tested it yesterday; it's able to predict stuff like HTM. Unlike HTM, it uses some different sparse coding algorithms with explaining-away properties, but most of the idea is still the same: a bidirectional hierarchy where you extract features and stuff upwards and predict downwards.
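Eric's sparse-coding layers with explaining-away properties aren't specified here, but the upward/downward flow he describes can be sketched with a much simpler stand-in: a k-sparse linear encoder per level, where `encode()` runs the upward (feature-extraction) pass and `predict()` the downward (prediction) pass. Everything below, including the class name, the k-sparse rule, the learning rule, and the layer sizes, is an illustrative assumption, not his algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

class Layer:
    """One level of a bidirectional hierarchy (illustrative stand-in)."""

    def __init__(self, in_dim, n_features, k=2, lr=0.1):
        self.w = rng.normal(scale=0.1, size=(n_features, in_dim))
        self.k, self.lr = k, lr

    def encode(self, x):
        # upward pass: keep only the k strongest feature activations
        acts = self.w @ x
        code = np.zeros_like(acts)
        top = np.argsort(acts)[-self.k:]
        code[top] = acts[top]
        return code

    def predict(self, code):
        # downward pass: linear reconstruction of this layer's input
        return self.w.T @ code

    def learn(self, x):
        code = self.encode(x)
        err = x - self.predict(code)           # reconstruction error
        self.w += self.lr * np.outer(code, err)
        return code

lower, upper = Layer(8, 16), Layer(16, 8)
x = rng.normal(size=8)

def recon_error(v):
    return float(np.linalg.norm(v - lower.predict(lower.encode(v))))

err_before = recon_error(x)
for _ in range(300):
    code = lower.learn(x)   # features flow upwards...
    upper.learn(code)       # ...each level learns on the code below it
err_after = recon_error(x)

# full downward pass: predict the input from the top of the hierarchy
pred = lower.predict(upper.predict(upper.encode(lower.encode(x))))
```

Training shrinks the bottom layer's reconstruction error, and the final line shows the downward chain Eric describes: the top code is decoded level by level back into input space.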
G
D
My columns are supposed to learn how to be HTM columns. They use reinforcement learning because, to be honest, I don't know exactly how HTM columns work, or how real columns work, so I just basically kind of winged it and made a reinforcement learner that tries to mimic a real column through reward signals. So it can learn to approximate actual columns, and it's able to gate attention and predictions and everything.
D
Well, I haven't really run it for that long. It usually learns how to get to a decent speed within a few seconds; I have a speed-up key that I pressed in the demo. I don't really know what the maximum theoretical boundary is for this problem, but it's obviously limited by the environment to some extent. I have a demo where I tried, for instance, increasing the strength of the motors, and then they basically sometimes learn to glitch out the physics and just fly all over the place.