Numenta HTM Community, 19 Aug 2017

From YouTube: HTM Agent Demo

Description

This video is the visual supplementary material for my thesis. It describes a non-player character (NPC) architecture that combines Hierarchical Temporal Memory (HTM) with Temporal Difference learning with eligibility traces, TD(λ). Thesis links below.

Thesis link: https://www.dropbox.com/s/jguh4d0863y6x1r/10164132.pdf?dl=0
Discussion thread: https://discourse.numenta.org/t/htm-based-autonomous-agent/2701
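For readers unfamiliar with TD(λ), here is a minimal sketch of generic tabular TD(λ) value learning with accumulating eligibility traces. This is the textbook form of the algorithm, not the thesis's method. The thesis applies TD(λ) inside an HTM architecture rather than to a lookup table. The function name, the transition format, and the hyperparameter values are all hypothetical.

```python
from collections import defaultdict

def td_lambda_episode(transitions, values, alpha=0.1, gamma=0.9, lam=0.8):
    """Run one episode of tabular TD(lambda) with accumulating eligibility traces.

    transitions: list of (state, reward, next_state, done) tuples (hypothetical format).
    values: dict mapping state -> estimated value; updated in place and returned.
    """
    traces = defaultdict(float)  # eligibility trace e(s) for each visited state
    for state, reward, next_state, done in transitions:
        # TD error: delta = r + gamma * V(s') - V(s); a terminal next state has value 0
        next_value = 0.0 if done else values.get(next_state, 0.0)
        delta = reward + gamma * next_value - values.get(state, 0.0)
        traces[state] += 1.0  # accumulate the trace of the state just visited
        for s in list(traces):
            # Every traced state shares the credit in proportion to its trace
            values[s] = values.get(s, 0.0) + alpha * delta * traces[s]
            traces[s] *= gamma * lam  # decay all traces
    return values
```

With λ > 0, a reward received at the portal also updates the values of states visited a few steps earlier. That is the property that makes traces attractive for multi-step navigation.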

A

Hi, my name is Ellie, and this is the visual supplementary material for my thesis. This is the game world that the agent navigates. The purple agent is the player, and the blue line shows its direction. I'm controlling it right now, jumping from cell center to cell center. If I go out of bounds, I respawn at a random cell with a negative reward. If I go to the portal, I respawn again at a random cell with a positive reward. So this is the problem.
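The reward rules just described can be captured in a small grid-world step function. This is a sketch under stated assumptions. The video only says "negative" and "positive", so the reward magnitudes of -1 and +1 are assumed. The action set (turn left, turn right, stay, forward) is inferred loosely from the narration. The class and constant names are hypothetical.

```python
import random
from dataclasses import dataclass

# Hypothetical reward magnitudes; the video only says "negative" and "positive".
OUT_OF_BOUNDS_REWARD = -1.0
PORTAL_REWARD = 1.0
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]  # N, E, S, W as (dx, dy)

@dataclass
class GridWorld:
    width: int
    height: int
    portal: tuple
    rng: random.Random

    def random_cell(self):
        # Respawn anywhere except on the portal itself.
        while True:
            cell = (self.rng.randrange(self.width), self.rng.randrange(self.height))
            if cell != self.portal:
                return cell

    def step(self, pos, heading, action):
        """Apply one (hypothetical) action; return (new_pos, new_heading, reward)."""
        if action == "turn_left":
            return pos, (heading - 1) % 4, 0.0
        if action == "turn_right":
            return pos, (heading + 1) % 4, 0.0
        if action == "stay":
            return pos, heading, 0.0
        dx, dy = HEADINGS[heading]
        nx, ny = pos[0] + dx, pos[1] + dy  # jump to the next cell center
        if not (0 <= nx < self.width and 0 <= ny < self.height):
            return self.random_cell(), heading, OUT_OF_BOUNDS_REWARD
        if (nx, ny) == self.portal:
            return self.random_cell(), heading, PORTAL_REWARD
        return (nx, ny), heading, 0.0
```

In both terminal cases the agent respawns at a random cell, so one long run of steps naturally contains many short "episodes".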

A

The agent has a visual sensor implemented by raycasting the environment. As you can see on top of the agent icon, it's updated in real time.
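A raycast sensor of this kind can be sketched as a fan of rays marched across a 2D occupancy grid, each returning the distance to the first blocked cell or the map edge. This sketch makes several assumptions: the field of view, ray count, step size, and the use of hit distance as the sensor value are not taken from the thesis. The real sensor also reports elevation and landscape information.

```python
import math

def cast_rays(grid, origin, heading_rad, fov_rad=math.pi / 2, n_rays=5,
              max_dist=10.0, step=0.05):
    """Return the hit distance for each ray in a fan centred on heading_rad.

    grid[y][x] is True where a cell blocks the ray; leaving the grid also
    counts as a hit. Rays are marched in small fixed steps (a simple sketch,
    not an exact grid-traversal algorithm).
    """
    distances = []
    for i in range(n_rays):
        # Spread rays evenly across the field of view (a single ray points straight ahead).
        offset = (i / (n_rays - 1) - 0.5) * fov_rad if n_rays > 1 else 0.0
        angle = heading_rad + offset
        dx, dy = math.cos(angle), math.sin(angle)
        dist = 0.0
        while dist < max_dist:
            dist += step
            x, y = origin[0] + dx * dist, origin[1] + dy * dist
            cx, cy = int(math.floor(x)), int(math.floor(y))
            out = not (0 <= cy < len(grid) and 0 <= cx < len(grid[0]))
            if out or grid[cy][cx]:
                break
        distances.append(min(dist, max_dist))
    return distances
```

The fixed-step march trades accuracy (error up to one `step`) for simplicity. An exact cell-by-cell traversal would be preferable if precise distances mattered.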

A

There's a Voronoi diagram underneath the terrain, and if you modify the height values of the Voronoi diagram, you get slopes. The terrain and the agent's visual sensor show the elevation and the landscape in general, updated in real time.
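One way to picture a Voronoi-based height field is to give each Voronoi site a height, with every point inheriting the height of its nearest site. This is a sketch under an assumption: the function names are hypothetical, and the resulting field is piecewise constant. The slopes seen in the video would presumably come from the terrain mesh interpolating between neighbouring cells, which is not shown here.

```python
import math

def voronoi_height(point, sites, heights):
    """Height of the Voronoi cell that contains `point`.

    sites: list of (x, y) seed points; heights: one height value per site.
    The point belongs to the cell of its nearest site (Euclidean distance).
    """
    nearest = min(range(len(sites)), key=lambda i: math.dist(point, sites[i]))
    return heights[nearest]

def height_grid(width, height, sites, heights):
    """Sample the piecewise-constant Voronoi height field at cell centres."""
    return [[voronoi_height((x + 0.5, y + 0.5), sites, heights)
             for x in range(width)] for y in range(height)]
```

Editing one entry of `heights` raises or lowers a whole Voronoi cell at once, which matches the "modify the height values" interaction described in the video.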

A

So, let's flatten the terrain and jump to the non-player character. This is its visual sensor, and now we're going to check out the architecture visualization of the agent.

A

So this is the user interface for it, and I'm going to start learning now.

A

Yes, so the learning has started, and I'm going to switch to another coloring scheme.

A

Yes, so this is the real-time visualization of the agent's learning. On the left side you can see what the agent is doing in the game world right now. It's acting quite randomly.

A

Most of the time it fails by going out of bounds; sometimes it finds the portal, but that's pretty rare. So it's just random movement at this point, because it hasn't learned anything yet.

A

So, on the right side, you can see the pleasure graph of the agent: high values denote positive rewards and lower values denote negative rewards, and the graph below it shows the average reward. We can also visualize the synapses. These are the proximal synapses, and on top are the distal synapses; we can show them separately or all together. These are the synapses that lead to activity in the architecture, and they are visualized in real time.
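The average-reward curve under the pleasure graph can be approximated by a sliding-window mean over recent rewards. This is a hypothetical helper, not the thesis's code. The window length, and whether the average is taken per step or per episode, are assumptions.

```python
from collections import deque

class RewardMonitor:
    """Track recent rewards and their running average (hypothetical helper)."""

    def __init__(self, window=1000):
        # Oldest rewards drop off automatically once the window is full.
        self.rewards = deque(maxlen=window)

    def record(self, reward):
        self.rewards.append(reward)

    def average(self):
        # Mean over the current window; 0.0 before any reward has been recorded.
        return sum(self.rewards) / len(self.rewards) if self.rewards else 0.0
```

A bounded window lets the curve reflect recent behaviour. An untrained agent that mostly falls out of bounds gives a negative average, and a trained one that mostly reaches the portal drifts upward.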

A

So we can stop at any point and iterate frame by frame. Now I'm going to load a previously learned architecture that produces a lot of voluntary behavior, navigating to the portal.

A

So this synaptic connectivity produces a lot of voluntary behavior, denoted by these neurons. And if you look at the pleasure graph, there are a lot of positive rewards and far fewer negative rewards.

A

So we can stop at any time and iterate frame by frame again.

A

I'm going to disable the synapses right now and pick a mini-column to inspect, from L3 actually, and in this mini-column I'm going to pick a neuron and visualize its distal segments.

A

There are 24 distal segments for this neuron, and we can visualize their synapses with their permanence values. Or we can pick a mini-column from layer 5 and show the proximal targets it propagates to, for example this one. Yes, the proximal targets it propagates to are layers D1 and D2, and we can switch the mini-column and still see its proximal targets.
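Here is a sketch of how distal segments and permanence values are commonly modelled in HTM. Each segment maps presynaptic cells to permanences. A synapse counts as connected once its permanence reaches a threshold, and a segment becomes active when enough connected synapses see active cells. A neuron with many segments, such as the 24 shown above, is predictive if any one of them is active. All thresholds and increments below are hypothetical values, not the ones used in the thesis.

```python
from dataclasses import dataclass, field

CONNECTED_PERMANENCE = 0.5      # hypothetical: permanence needed to count as connected
ACTIVATION_THRESHOLD = 2        # hypothetical: active connected synapses needed
PERM_INC, PERM_DEC = 0.1, 0.05  # hypothetical learning increments

@dataclass
class DistalSegment:
    # Maps presynaptic cell id -> permanence value in [0, 1].
    synapses: dict = field(default_factory=dict)

    def active_connected(self, active_cells):
        """Count connected synapses whose presynaptic cell is active."""
        return sum(1 for cell, perm in self.synapses.items()
                   if perm >= CONNECTED_PERMANENCE and cell in active_cells)

    def is_active(self, active_cells):
        return self.active_connected(active_cells) >= ACTIVATION_THRESHOLD

    def reinforce(self, prev_active_cells):
        """Hebbian-style update: strengthen synapses to previously active cells, weaken the rest."""
        for cell, perm in self.synapses.items():
            delta = PERM_INC if cell in prev_active_cells else -PERM_DEC
            self.synapses[cell] = min(1.0, max(0.0, perm + delta))

def is_predictive(segments, active_cells):
    """A neuron is predictive if any of its distal segments is active."""
    return any(seg.is_active(active_cells) for seg in segments)
```

Because connectivity is thresholded, a synapse with low permanence (like cell 3 at 0.3 in a segment `{1: 0.6, 2: 0.55, 3: 0.3}`) exists but has no effect until learning pushes it past the threshold.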

A

So the architecture has different coloring schemes to communicate different information about the HTM layers.

A

As you can see, there are a lot of them; let's take the default one. Yes, so, as you can see on the pleasure graph, the average reward is almost 0.1, which is quite high.

A

If we look at the game world and stop and iterate frame by frame, you can see what the agent is doing. Most of the time it navigates to the portal.

A

I'm going to disable the raycasting rays for a clearer view. As you can see, most of the time it finds the portal pretty quickly. In this sequence, though, it reverses, circumvents the area and then navigates to the portal, which is a pretty long sequence actually, but it still finds the portal. Sometimes it fails, but those failures are few and far between.

A

So we can follow the same thing in the user interface by looking at the visual sensor at the bottom. The agent turns left, stays, and goes to the portal there, then respawns in a random cell, turns right, stays, right, right, and then goes to the portal again.

A

So yes, this is how the learning works. Thank you for listening.