From YouTube: ReWork AI Summit Day Two Recap
Today I'll talk about the ReWork AI Summit, then do HTM Forum Q&A if I have time; I've got a 10 o'clock stand-up meeting. I'm not sure if there's a research meeting yet. I'm still waiting to hear back from the research team, all of them; nothing yet. Geoff is super tired today, so we'll see if Marcus has anything; he's not quite in the office yet, so I'm not sure if there'll be a research meeting. And then we will have a building-interesting-systems session, where I'm really happy with how the learning turned out.
I know that the last stream ended in failure, but I think I just needed a little sleep; everything's all good now. Actually, the connection distribution diagram looks great when you add learning, because you can see the connections changing over time. Honestly, I didn't think about that; it was just very serendipitous.
Yeah, I was totally tired at the end of that day. Mark: okay, thanks for noticing. So let's talk about day two. Let me see if I can get this set up so you guys can see my whole screen here.
The first topic was deep reinforcement learning; I think I did nothing but deep RL on this day. I kept running into quotes. These guys like to quote a lot, and they love to relate what they're doing to human intelligence in errant ways.
For example, the presenter said pretty quickly in his presentation that humans don't do everything from scratch. What he was talking about was prior knowledge: when we come into a new situation, we bring prior knowledge to that situation. That was his point, but we do learn to do everything from scratch. I mean, there are genetic predispositions in the way the neurons in your brain are structured, for sure. (My neck is better, Mark, thank you. Yes, 100%.) But all that prior knowledge is something that we learned from scratch.
A
So
this
was
about
priors
all
about
priors
in
robotics,
specifically
so
they're
trying
to
incorporate
priors
to
find
sort
of
the
sweet
spot
in
the
policy
parameter,
because
if
you
start
your
hyper
parameter,
search
in
a
bad
place
in
the
search
space,
you'll
never
get
to
optimum
parameters,
because
if
you
just
search
around
and
you're,
always
it's
always
less
sort
of
flat.
You
know
if
you
want
to
look
for
there's
these
sweet
spots
and
these
in
the
parameter
spaces.
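As a rough illustration of that point (a toy sketch of my own, not from the talk): a local search started on a flat part of a reward landscape gets no signal at all, while the same search started near the sweet spot climbs immediately. The prior is what chooses the starting point.

```python
import random

def reward(theta):
    # Toy landscape: flat almost everywhere, with one narrow peak near theta = 8.
    return max(0.0, 1.0 - abs(theta - 8.0))

def local_search(start, steps=200, step_size=0.1, seed=0):
    rng = random.Random(seed)
    theta, best = start, reward(start)
    for _ in range(steps):
        candidate = theta + rng.uniform(-step_size, step_size)
        if reward(candidate) >= best:  # flat region: every move looks equally good
            theta, best = candidate, reward(candidate)
    return best

print(local_search(start=0.0))  # started in the flat region: never finds the peak
print(local_search(start=7.9))  # a good prior starts near the peak and climbs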
First thing, 9:00 a.m.: fire engine. Okay, so I asked a question, because there was this theme in day one, certainly: a realization that RL needed to move towards generalization, to find a way to better generalize, and this is moving away from generalization. He sort of balked at that, and didn't really admit that this was moving away from generalization, but said, well, the priors can be applied if the priors are very general, meaning that they're about objects or something. But when I asked him whether general priors could be applied to different robotic situations, he said, well, that's something we'll have to research. So this incorporation of priors definitely seems to be a movement away from the area of generalization. Anyway, the technology he was talking about is called iLQG, which is model-based residual RL. I wasn't really too interested in what it was, because it was definitely very specific to one thing. Okay, next: the next presentation was interesting because it was about space and the exploration of space. There's a company called OffWorld; let's see, they have a website.
Honestly, there were some details about this. Let me find the website: OffWorld AI. Great vision; okay, wow. This is what they're going for: they've got a huge pie-in-the-sky vision about exploring and mining other planets. Specifically, they're focusing on creating robots that can mine on Earth. This is something Jeff Hawkins has talked about as a potentially really big application for truly intelligent machines. We're talking you need AGI for this; you definitely need something that can make decisions and learn all by itself, without some round trip back to Earth or back to some servers to do retraining after a new environment is observed. So, a huge vision, certainly, very ambitious, but it definitely felt like a big marketing pitch to me. I don't know; I mean, they have investors, they're doing research, they're focused on mining on Earth, and they have these prototypes of diggers and crushers. Oh sorry, back to the previous presentation for a second: talking about priors, he had this quote:
"I believe that everything you learn is framed in part in terms of what you've learned before; you learn delta coding based on your prior life experiences." Yeah, I totally believe this, but this is all about hard-coded priors for each specific situation, and that's the movement away from generalization I was talking about. It definitely wasn't transfer learning, from what I could tell.
"Wouldn't it be more like insect-level thinking rather than creative thinking?" Oh, you mean as far as the logistics stuff? No, I don't know. I think when you put these things out in the environment of Mars or wherever, they're going to run into situations that we never anticipated, and they're going to have to adapt and learn, or die. So I wouldn't say it's just insect-level thinking: they're going to have to do problem solving; they're going to face situations that an insect would not be able to solve,
I think. Yeah, it totally sucks to be stuck with those shackles, the deep learning shackles; that's the world that they certainly operate within, and I'll have some sum-up thoughts about this conference at the end. So, about the mining thing: they actually have these diggers and crushers that they're testing, real ones. Here's sort of an example. Okay!
So there's one of the diggers, and this is where they're at right now: they've got a robot arm, and it's got, you know, a chisel essentially, and they brought in an expert mason to chisel and show it the best way to do the chiseling, and they used what they called imitation learning. So where they're at is trying to maximize the amount of mass that's extracted with each attack, each chisel attack.
Now, they are trying to do navigation-type stuff too, but from what I can tell from the rest of the deep learning presentations, they're so far away from being efficient at this. A lot of these deep learning systems start with humans in the loop. Yeah, you could be right about the insects; I don't know, ants are pretty smart. Ants are pretty smart. Anyway.
So humans operate the robot to give it an example of what it should be doing, sort of like a goal, so it can have a jump start, and I think that's what they did here: they had a professional mason come in and do the chiseling. The one thing about this that I thought was interesting is that they do all their testing and training in the real world, which is different from a lot of other systems, because you can train so much faster and more efficiently in simulations.
But then when those agents transfer from the simulation into the real world, there's a big problem, because the real world is not a simulation. There are so many unexpected things, things you cannot model about physics, about reality, that exist in the real world, and the agents fall flat; their training does not transfer, in most cases I would say, from simulation to real world. So the interesting thing about this is that they're doing all this training in the real world, and the reward function for this example is all about mass mined per attempt. They also talked about using hallucinatory GANs. I didn't quite understand this; I sort of get what GANs are, but they were trying to do navigation with hallucinatory GANs, and I didn't quite see where they were going with that. So in summary, I think this company has a great vision, and it's cool to think about space exploration and that sort of thing, but it seems really far out, because to be successful at this you really have to have some pretty intelligent machines.
Really, deep RL seems to be the only game in town for people doing robotics, or at least the most promising. Hello Disillusion, thanks for joining; I like your Zelda emote. I'm talking about this conference I went to last week; most of it was deep reinforcement learning. There were some hard questions at the end of this presentation about retraining and resets.
The presenter, who I think was the founder of OffWorld, admitted that there must be retraining at the offworld location. With the approach they're taking, deep reinforcement learning, these are not online learning systems, so I feel like that's a brick wall, you know.
A
How
are
they
going
to
adjust
in
real-time
to
changing
situations
and
environments?
I'm
not
sure
they
are
there
are.
There
are
certainly
cases
where
machine
learning
discovers
surprising
solutions,
and
we
saw
that
and
when
I
talked
Friday
about
alpha
star
and
about
how
they,
the
tactics
that
alpha
star
had
against
humans
surprise
the
humans
they
did
not
expect
them
to
do
that,
and
even
in
the
go
alpha
go
again
that
really
surprised
the
the
go
players
that
the
tactics
were
unexpected
and
hadn't
been
seen
before
and
now
go
players
are
using
those
tactics.
An unstructured environment is one that's not controlled, that changes, or that's unknown, and it's good to try and focus on those environments, because those are the environments we want intelligent things to operate in. Today we don't have anything that can operate in unstructured environments. So we talked about what the problem is. One of the problems is creating a reward function, deciding where the reward comes from.
When you have unstructured environments, if you don't know what environment you'll be operating in, how do you know what actions to reward the agent for? So coming up with these RL reward functions is a hard problem, and expensive, because these things have to be trained so much, and that is expensive; it's all compute time. And diversity was a big deal at this conference: needing to learn a diverse range of skills is important, and the only way I can see that the creators of these systems add diversity is by hand-coding it. That means lots of supervision, and learning faster with less supervision is one of their goals, but it's hard to add action diversity without that supervision. Even in those situations (Mark Brown, you mentioned this) they come up with some interesting solutions to problems.
It's the diversity, I think, that creates those interesting solutions, because humans have to sort of inject ideas into these agents, like telling them: do a lot of this and then a lot of that. Then, once they have agents with certain strategies, they play them all against each other. That's one of the things deep reinforcement learning can do. It doesn't need a ton of data; not deep learning, deep reinforcement learning: you don't need a boatload of data like you do for deep learning networks, millions and millions of images. You can create these agents and play them against each other, so they sort of create the data themselves. But you do have to inject strategies, which is what they're calling diversity: different strategies, different ways of being rewarded for different actions, for each agent. Then, as these agents with different strategies play each other out, emergent behavior sort of comes out of that diversity,
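A minimal sketch of that self-play idea (entirely my own toy example, not from any talk): seed a pool of agents with different hand-injected strategies, then let a round-robin tournament generate the experience, with no external dataset required.

```python
import itertools
import random

_rng = random.Random(0)  # seeded so the tournament is reproducible

# Each "agent" is a hand-injected strategy; it sees the opponent's move history.
STRATEGIES = {
    "always_rock": lambda opp_history: "rock",
    "copycat":     lambda opp_history: opp_history[-1] if opp_history else "paper",
    "random_play": lambda opp_history: _rng.choice(["rock", "paper", "scissors"]),
}

BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def play_match(strat_a, strat_b, rounds=50):
    """Self-play: the agents' own games are the only 'training data'."""
    hist_a, hist_b, score = [], [], 0
    for _ in range(rounds):
        a, b = strat_a(hist_b), strat_b(hist_a)
        hist_a.append(a)
        hist_b.append(b)
        score += (BEATS[a] == b) - (BEATS[b] == a)
    return score  # positive means strategy A won more rounds

# Round-robin over the diverse pool: strategy matchups nobody hand-designed.
for (name_a, sa), (name_b, sb) in itertools.combinations(STRATEGIES.items(), 2):
    print(name_a, "vs", name_b, "->", play_match(sa, sb))
```

The point is the structure, not the game: the diversity is injected by hand (the strategy pool), and the matches themselves produce the data.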
That is, combinations of strategies that haven't been tried before. Yeah, diversity is huge, and it's such an interesting topic, because diversity is important in life as well, as we all sort of know. I think Mark said it's such a different way to work, as it only stimulates a single aspect of the brain: it applies simple point neurons, but the thing being emulated is the layers' problem space and manifold topology. Yeah, it's very hard to visualize these things.
There are population effects. Hey, thanks for the follow, Sam Griffin; I've watched your channel quite a bit, always raiding. Thanks for the raid, appreciate it. I'm talking about deep reinforcement learning; I went to a conference about it on Thursday and Friday, and I'm going over some of the presentations I saw there. Awesome. So we're talking about diversity right now in deep reinforcement learning, because it's very important to have diverse...
(There's a cop, for whoever wanted to see that motorcycle: that's a cop.) So, meta-RL is leveraging prior experience to try to quickly learn new tasks. I think it's sort of a way of transferring the learning you've had in other environments to learn new tasks, but it still requires supervision, and all of these tasks, even in meta-reinforcement learning, still need reward functions.
Here's just a peek into some of the math. I did not take pictures of the math at this conference, because I'm not a mathematician, I'm not really great at math, but a ton of these presentations are focused on robotics. So, towards supervised meta-reinforcement learning: there's still going to be a human in the loop. In this example you need a human trainer, and they even talked about language in the loop for reinforcement learning; a couple of presentations talked about this. This is about having basic language understanding, so you can give the agent commands. (Thank you, Chuck, for the follow, appreciate it.) Things like 'move to the left a bit' or 'move up a lot', stuff like that: if you can have basic commands like that, this is sort of the human in the loop, and it gives you the ability to have less intervention, because you're not necessarily retraining.
You can adjust the agent's behavior by giving it a few small commands, and then it will get its rewards and sort of learn from those adjustments.
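Here is a tiny sketch of what that human-in-the-loop adjustment could look like (my own illustration; the command strings and reward values are invented, not from the talk):

```python
# A human command shifts the agent's goal instead of triggering a full retrain.
COMMANDS = {
    "move left a bit":  (-1, 0),
    "move right a bit": (1, 0),
    "move up a lot":    (0, 3),
}

def apply_command(goal, command):
    """Shift the goal position according to a parsed human command."""
    dx, dy = COMMANDS[command]
    return (goal[0] + dx, goal[1] + dy)

def reward(position, goal):
    """The agent is rewarded when it reaches the (human-adjusted) goal."""
    return 1.0 if position == goal else 0.0

goal = (5, 5)
goal = apply_command(goal, "move left a bit")  # the human steers the target
print(goal)                  # (4, 5)
print(reward((4, 5), goal))  # 1.0
```

The reward machinery stays fixed; the language command only moves the target, which is why this needs less intervention than retraining.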
But you still have to have a human labeler. Take 'move closer to the green triangle': once the agent can understand commands like that, it can respond, and it gets a reward when it gets to the green triangle. It gets rewards for sort of higher-level commands, which are the tasks the agent is trying to accomplish. Mark says:
"If we did do the hierarchy with HTM, it would not be reward functions. This will be breaking some serious new ground, and huge new potholes to step in." Yeah, I agree. I think the hierarchy in the cortex is not what they mean when they talk about hierarchy; there's another presentation, I think, that talks about hierarchy in deep reinforcement learning, and I don't think it's the same thing as the hierarchy we talk about in HTM. Okay, so that was supervised meta-learning.
Unsupervised task and reward extraction from the environment: I think you just mentioned that; I'm not sure that's a thing. Oh, here it is. All right, so they start with random reward functions when they're talking about unsupervised meta-learning. Let's see if we can parse this slide. So, for unsupervised meta-learning:
Let's say you have your environment; here's the meta-RL loop. The meta-learned, environment-specific algorithm uses some reward function and has fast adaptation, but, I think, less overfitting to task distributions. Random reward functions: they start off with random reward functions, and those sort of evolve over time, and again they are diversity-driven. I don't know how that works.
When you're diversity-driven using random reward functions, I don't know how you identify diversity. But maybe that's how you do the diversity: you start with a predefined set of tasks, so those tasks could be where the diversity comes from.
Yes, but it's all about priors. A lot of this stuff requires priors; they can't get anywhere without the priors. So the takeaway from this presentation was that supervision is extremely important for solving deep reinforcement learning problems, and this is sort of a common theme. A lot of the takeaways from these presentations were: we need priors, we need supervision, we can't generalize.
So everybody wants to have autonomous robots, right? And everybody's looking to deep reinforcement learning for the answers, assuming that this is the way to get to autonomous robots. I mean, for most of these companies it's the only game in town; it's the most impressive form of deep learning we have at the moment. They're all referring back to these big milestones like AlphaStar and AlphaGo, all the games they can play and beat. And again, this woman made a reference to how children learn.
I'm always very wary about that, but anyway: humans remember their errors. The whole point of this presentation is that humans remember their errors and use them to learn faster, so how can we get deep RL systems to do the same thing? She's barking up the right tree here: learning how to learn is super important for autonomous agents, and she's trying to teach these robots how to move, and the effects of their actions, by separating out the errors that they've made and transferring those learning rates.
So you can separate the model from the memory of errors, as it says right here, or the learning rates, and then, when you go to a new task, you take that memory of all your errors with you, and apparently that's easier to transfer when you're talking about transfer learning. So these are experiments where they're trying to do that sort of thing, but they are super simple experiments, like a robot.
So this is the task they're talking about: a robot picking up nothing, that's one task (I don't know how you get a reward for that), a robot picking up a light object, and a robot picking up a heavy object. So this is a super simple model, one model per joint in the robot arm or whatever, and basically transfer learning that includes the errors and learning rates, and they try to transfer that to the other tasks.
They successfully took errors from a robot picking up no object and applied them to picking up a light object and a heavy object, and essentially it learned faster than starting from scratch. So that's what they did. One takeaway, and I heard this over and over again, is that "online learning is really hard." That's a direct quote.
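To make the "transfer the learning rates" idea concrete, here's a deliberately tiny sketch of my own (not the presenter's method): meta-learn a step size on one task by seeing which candidate shrinks the errors fastest, then carry that step size to a new task instead of starting from scratch.

```python
def sgd(target, lr, theta=5.0, steps=20):
    """Gradient descent on the toy loss (theta - target)^2 / 2."""
    for _ in range(steps):
        theta -= lr * (theta - target)
    return abs(theta - target)  # remaining error after training

def meta_learn_rate(target, candidates):
    """'Remember your errors': keep the step size whose errors shrank the most."""
    return min(candidates, key=lambda lr: sgd(target, lr))

# Meta-learn the rate on task A ("pick up nothing", target effort 0)...
best_lr = meta_learn_rate(target=0.0, candidates=[0.01, 0.1, 0.5, 1.0])
# ...then transfer it to task B ("pick up a light object", target effort 1).
print(sgd(target=1.0, lr=best_lr))  # small error: the transferred rate learns fast
print(sgd(target=1.0, lr=0.01))    # large error: a default rate learns slowly
```

What gets transferred here is not the model itself but a by-product of the errors made on the old task, which is the gist of "taking your memory of errors with you."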
Okay, so here's some more math stuff if you want to look at it. Here's the part about learning the loss function. You have to have this differentiable framework, high-dimensional, and she says both the meta-loss and the learning rate are learned. I don't totally understand what the meta-loss is, so I won't dig into the meta stuff.
Let's see, here's one thing: you still have to provide a task loss function. This is human-coded; you have to create it yourself. You can change the policy with more layers or other architectures, yeah, but they are just starting on this. This woman was very excited about it, but it was a very simple experiment; you're just talking about robots lifting one thing up and down, that's it. And that's state of the art.
Okay, next one: continually evolving machines, learning by experimenting. There were a lot of research scientists from Berkeley presenting here. The idea was, again, to equip reinforcement learning agents with prior knowledge; a theme at this conference was certainly starting with prior knowledge. He did ask some interesting questions. This is the first person to refer to object representation in any way, as far as I could tell. He asked the question: what is an object? Do you have to understand what an object is? And from what I could tell...
Did he really answer it? I don't know. And again, there's this reference to infant learning, saying that play is a form of experimentation. Absolutely spot-on: our brains are wired to enjoy learning and exploration, and that's the thing we have to add to these agents if we want them to be smart and to learn. We also imitate other agents, which is true; there was a lot of talk about imitation.
As far as I can tell, though, they don't think about it in the same way. So, trying, let's say, an inverse model of action prediction instead of pixel prediction. Oh right: usually in these robotics deep RL systems there's a camera and a robot, and the pixels from the camera are what's fed into the deep reinforcement learning system, just like when they do the Atari games; they feed every pixel into the system.
He also said random exploration is limiting, because the actions should increase exploration. That's a good point: if you just start with random actions, you don't explore the whole space; you end up covering just a little bit of it. What you want is semi-random, pseudo-random actions that optimize for exploring new space. That seems to be the right way of doing it, so they came to the same conclusion, which is good: driven by curiosity, which is a good concept.
So this is interesting: they are actually training agents to maximize their prediction error, because that tells them they're in a space that they don't understand. That's an interesting way to think about it, and they call this, in a way, self-supervision, because you're incentivized to go places you don't know and experience things you have no model of. Right, novelty-seeking; yes, exactly. But there's too much to learn.
So you need to use imitation. Again, going back to priors: imitation is about prior knowledge, at least in the deep RL world. So it's interesting that you can treat prediction error as a reward function to train the policy; that's kind of a cool idea. That's what I took away from that one.
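The prediction-error-as-reward trick can be sketched in a few lines (my own minimal toy, not their system): keep a forward model, and pay the agent the model's error, so visiting a transition the model already predicts earns nothing.

```python
# Forward model: (state, action) -> predicted next state, learned tabularly.
model = {}

def intrinsic_reward(state, action, next_state):
    """Curiosity bonus: the forward model's prediction error."""
    predicted = model.get((state, action))
    if predicted is None:
        return 1.0  # never-seen transition: maximally novel
    return abs(predicted - next_state)

def update_model(state, action, next_state):
    model[(state, action)] = next_state

# The first visit to a transition pays out; repeat visits do not.
print(intrinsic_reward(0, 1, 1))  # 1.0 (novel)
update_model(0, 1, 1)
print(intrinsic_reward(0, 1, 1))  # 0 (now predictable)
```

Because the bonus decays as the model improves, the agent is pushed toward exactly the states it cannot yet predict, which is the self-supervision framing from the talk.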
So, here's another Facebook AI Research one, called Habitat, a platform for embodied AI research. I was a little bit harsh on this. He didn't have a completely wrong definition of embodied AI, but from his perspective embodiment means being in a 3D world. Okay, I don't quite agree with that. I think you can be embodied as long as you can take actions in an environment and perceive the changes to the environment based on those actions. That's embodiment. It's not about a 3D world or an any-dimensional world; it's just about a world, a self, action or movement in that world, and the perception of the changes to that world.
So the Habitat thing he was talking about is a project he's been working on. One of the things I wanted to look at was this Matterport3D; if you're interested in 3D environments, it's for exploration for agents. This is really interesting, because they have this technology now where people will come into your house, for example, and set things up all around the house and take all these pictures, so that you can basically fly through your house and get this super-high-resolution graphical interface of your house. This is one of the pictures of someone's kitchen; you can go all through it, all of these objects are tagged, and you can interact with them. Well, not yet; his system creates something on top of that that can interact with them. So they're trying to create, like, an ImageNet for embodied AI, and they're calling it Habitat; it's a 3D simulator. And I was confused.
He kept talking about how the resolution is so high. So, like Unity: when humans play a video game, the refresh rate that they pay attention to... (thank you for the follow, FPjest) ...when humans play video games, the refresh rate that we notice is, I think, like 25 per second, something like that. Or no, it's 60! Sorry, 60 is right there on the slide.
It's 60 hertz; anything over 60 hertz we're not even going to notice, because humans do not process input that fast. Unity, I think, runs at like 25 hertz or something, and it looks fine to pretty much everybody, but it's at a higher resolution. What the machine would rather have is a smaller resolution but a faster refresh rate.
So I was totally confused by this, because if we're talking about hertz as refresh rate, that just seems like a ton of duplicate information. But he's not talking about actual perceived refresh rates; he's talking about how quickly the system... (thank you for the follow, IDontEvenCare12, appreciate it; I'm talking about a deep reinforcement learning conference I went to last week) ...how quickly the system can update, so that a deep RL system can learn fast.
That's amazing, that there's that much compute; imagine the carbon footprint of that. It's just amazing. Okay, next up (this is a dense conference, man): another Facebook AI researcher.
This one was all about the sense of touch, which I thought was cool; really the first person, the only person at all, talking about the sense of touch. He's emphasizing forces when you interact with the world, and this is something that most of the other people doing robotics are not even paying attention to. I mean, there's torque, and usually there are just hard-coded things like 'don't squeeze something too hard', but he's actually got a really interesting sensor that they've created. Most of the rest we already know: humans create models of the world, but robots don't, and he's trying to use tactile sensors. The thing is called GelSight, which is this interesting sensor; all of this was basically a presentation about this GelSight sensor.
It's hard to see, but this is it right here; there are two of them, on either side of this little thing, and this is sort of what it sees. This is an interesting little technology. Imagine you've got the GelSight sensor, and you put a camera inside of it, pointing up. I could probably draw this better.
On top of that cup you put this gel, and then you put a grid on it (you actually draw a grid on it), and it's very plastic, very flexible. Then, when it touches something, you can sort of see here where it says GelSight left and right: these are the impressions left on that gel by where it's touching, like this pitcher. You can see a very pointed one (this is the right one, where it's hitting the point) and you can see it's a lot of pressure, and this one is sort of more rounded, where it's touching the handle. So this is super cool, I thought, because you get the tactile information out of it. He was working on creating robots that could better understand and incorporate these types of tactile responses, and it helped with grasping. Like one of those visualizations where he'd show a picture of two robots
that essentially were grasping the same thing, and you'd say which one is going to fail and which one is not. You can't tell, and neither can the robot, from the picture, whether it's exerting enough pressure on the thing to actually hold it. But with something like this, you can tell exactly how much pressure is being exerted. So that was an interesting presentation.
This next one really is a window into where we're at right now as far as autonomous driving. He's working specifically on situations that you come across in urban driving. Pretty much all of the autonomous driving vehicles we have right now are highway-only; they can't make too many decisions. It's basically: navigate in a highway setting, don't drive it on the off-roads, don't drive it through the city, you can't do that sort of thing. So they're trying to tackle these really challenging environments.
You can't learn skills if there are no representations and no training on those situations, right. Most vision systems and motion systems are currently hard-coded, too. Deep reinforcement learning just needs a simulator, so they're doing all this in a simulator, but for urban environments deep RL is still not very successful, and among the reasons he's pointing to is that they are still using outdated reinforcement learning algorithms; you've got to remember deep RL is only like five years old.
So let me think, I'm trying to figure this out. Their experiments here use this lidar, and this is sort of an example of the image that the lidar gets. They actually put the lidar way up on top of the car, so it can look down and get a better perspective, and it tries to differentiate things that are up high versus down low, and to pull out dimensionality, essentially: reduce the dimensionality so that the RL agent doesn't have as much to process. They're using a variational autoencoder reconstruction pipeline, but there's still nothing similar to object understanding in this; it's still probability distributions, etc. So this is sort of what they're trying to get to: they have a map.
They've got objects that they've detected, like an egocentric state, and then a route. What they're basically working with is a bird's-eye-view image; that's what they're trying to reconstruct and make decisions off of, and this is sort of what the lidar looks like. The input is up here at the top, and then they reconstruct it. I don't know exactly what the difference between these is, but essentially they get to this mask down here.
So they take this and process, process, process, so that they can reduce the dimensionality and the reinforcement agent doesn't have to deal with all of those bits. They're not just feeding the lidar data straight into the deep RL; that would be way too much. So they're doing a little pre-processing.
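As a stand-in for that pre-processing step (my own toy, not their pipeline; the real version uses a learned VAE encoder rather than block-averaging), here's the dimensionality-reduction idea itself:

```python
def encode(grid, block=4):
    """Compress an NxN occupancy grid into one density value per block."""
    n = len(grid)
    latent = []
    for i in range(0, n, block):
        for j in range(0, n, block):
            cells = [grid[a][b] for a in range(i, i + block)
                                for b in range(j, j + block)]
            latent.append(sum(cells) / len(cells))  # occupancy density
    return latent

# An 8x8 "lidar" grid (64 inputs) becomes 4 numbers for the policy to consume.
grid = [[1 if (r + c) % 7 == 0 else 0 for c in range(8)] for r in range(8)]
latent = encode(grid)
print(len(grid) * len(grid[0]), "->", len(latent))  # 64 -> 4
```

The RL agent then sees only the small latent vector, which is exactly why they bother with the reconstruction pipeline instead of feeding raw lidar in.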
Yes, one of the presentations did talk about SLAM; I don't remember which one, but it was as a comparison, I think. So they've got all these model-free algorithms they're trying, and they showed all of these. I didn't take pictures, because it was just math, math, math: DDQN, which is a common one, TD3, and soft actor... soft actor something... Soft Actor-Critic.
A
These are some of the common deep reinforcement learning algorithms that are around right now. So CARLA is the simulator that they use; that's the name of the car simulator, CARLA, which is fun. He said they did it in a simulator because you wouldn't want to run a real RL car in the real world, which emphasizes that none of the self-driving cars are currently using reinforcement learning. This is not a technology that's currently deployed. There's only one place in this whole conference...
A
That I could tell had something actually deployed with reinforcement learning in the real world, which I'll get to. So what they tried to do was the roundabout; I think I have a picture of this. The goal was to navigate a roundabout. That's what this whole presentation's experiment boiled down to: how to navigate a roundabout. And it was basically: get into the roundabout, ignore the first exit, ignore the second exit, exit on the third exit.
A
Don't hit anything. So they're rewarded for progress: getting to the first exit, getting to the second exit, and then finally getting out of the roundabout. They would penalize for collisions and reward for progress through the roundabout, and they tried this with and without the surrounding vehicles. They found that the Soft Actor-Critic model worked best, but this is a very restricted experiment.
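The reward shaping described above can be sketched as a toy reward function. The structure (dense progress reward, checkpoint bonuses for each exit, collision penalty) follows the talk, but every number here is invented for illustration, not taken from the presentation.

```python
# Toy version of the roundabout reward shaping: small dense reward for
# progress, a bonus for each checkpoint (exit 1, exit 2, final exit),
# and a large penalty for collisions. All coefficients are made up.
def step_reward(progress, new_checkpoint, collided):
    reward = progress * 0.1      # dense reward for moving forward
    if new_checkpoint:
        reward += 10.0           # passed exit 1, exit 2, or left the roundabout
    if collided:
        reward -= 100.0          # "don't hit anything"
    return reward

print(step_reward(progress=5.0, new_checkpoint=True, collided=False))  # 10.5
print(step_reward(progress=2.0, new_checkpoint=False, collided=True))  # -99.8
```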
A
Okay, another robot one. I think this was the one that actually has something deployed for reinforcement learning. If you want to look into it, the company is called Osaro. So they're really working to do reinforcement learning for robotics tasks, and their company is trying to make products. Most robots, they say, have only level one autonomy, which means a single automated operation, and there's a ton of robots that have level one autonomy, and you can go look up...
A
The different levels of autonomy. I think there's five of them, five being fully autonomous, level one being just one single automated operation. So I think they're getting to like level two; they're trying to get to level two. So what they're trying to do is essentially target the warehouse space, because it's not an unstructured environment. You know, a warehouse is a structured environment: they know there's going to be a bin here and a bin there, and there's going to be stuff over here that needs to be put over there.
A
Let's see, so these are just some of the robots. Some of them are suction; most of them are suction. It's easier to pick things up with a suction cup than it is to grasp them, because all you have to do is put it in the right spot; you don't have to worry about... So all of these that I saw in her examples were a suction-cup sort of collection. So they have to collect real-world data. This is something you can't just do in simulations.
A
You do testing in simulations, but at the end you have to also test in the real world. All of their data and their models are in the cloud; they're in structured warehouse environments; only deep learning solutions are currently deployed. Here's the one robot that got up here on the top right, and there's a video of this if you're interested in looking at it, but it's not very exciting. Essentially there's a plate, right, and the plate has four bolts coming out of it.
A
This was a logistics application, which was interesting, not a robotics application, and it seemed to be somewhat successful. This is basically supply and demand: I've got all of these centers that supply product and all of these centers that need product, and keeping track of the best way to route the product to these centers at the right times is a big problem.
A
Do you go fast and expensive, which is like shipping with a truck, or do you go slow and cheap, which is like a train? So there are big logistics problems there, and this person, who was more in the operations space, was applying deep reinforcement learning to try and decide when to replenish stock to the different systems and what orders to make, and he seemed to be having some success with that.
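The fast-and-expensive versus slow-and-cheap trade-off can be made concrete with a toy rule. A real deep RL system would learn a policy like this from data; here it's a hand-written heuristic, and the mode names, transit times, and costs are all invented for illustration.

```python
# Hypothetical shipping modes: truck is fast but expensive, train is
# slow but cheap. Numbers are made up for the example.
MODES = {
    "truck": {"days": 2, "cost_per_unit": 8.0},
    "train": {"days": 7, "cost_per_unit": 2.0},
}

def choose_mode(days_until_stockout, units):
    """Pick the cheapest mode that still arrives before stock runs out."""
    feasible = {m: v for m, v in MODES.items()
                if v["days"] <= days_until_stockout}
    if not feasible:
        # Nothing arrives in time; take the fastest (least-bad) option.
        return "truck", MODES["truck"]["cost_per_unit"] * units
    mode = min(feasible, key=lambda m: feasible[m]["cost_per_unit"])
    return mode, feasible[mode]["cost_per_unit"] * units

print(choose_mode(10, 100))  # ('train', 200.0)
print(choose_mode(3, 100))   # ('truck', 800.0)
```

The RL version would replace the hand-written rule with a learned policy over demand forecasts and inventory state.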
A
So I thought that was cool. Okay, I was interested in this one because the word "reason" was in the title: learning to read and summarize and discover with deep RL. This is from Salesforce. So, let's see, he went through three different case studies. I lost interest in this when I realized that "reasoning" was a knowledge graph.
A
So really, the way they defined reasoning was having a pre-existing knowledge graph, which is prior information (it could be a very deep knowledge graph), and reasoning is basically navigating from one node to another. You know, so asking the question "what directors has Tom Cruise worked with," for example: the reasoning would be, okay, let's go to Tom Cruise and list all the movies.
A
So the first step is finding all of the movies, or all of the projects, that he's worked on, and then the next step is saying, for all those projects, who is the director, and then making a list of those things. So that was the type of reasoning that they're talking about, which is navigating a knowledge graph.
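The two-hop traversal can be sketched in a few lines. This is a minimal dict-based knowledge graph, not anything from the Salesforce talk, with just enough entries to show the actor-to-projects, project-to-director hops.

```python
# Minimal knowledge graph: hop from an actor to their projects (step 1),
# then from each project to its director (step 2).
graph = {
    "Tom Cruise": {"acted_in": ["Top Gun", "Minority Report"]},
    "Top Gun": {"directed_by": ["Tony Scott"]},
    "Minority Report": {"directed_by": ["Steven Spielberg"]},
}

def directors_for(actor):
    directors = []
    for movie in graph[actor].get("acted_in", []):             # step 1
        directors.extend(graph[movie].get("directed_by", []))  # step 2
    return directors

print(directors_for("Tom Cruise"))  # ['Tony Scott', 'Steven Spielberg']
```

"Reasoning" in that presentation's sense is learning which edges to follow, rather than hard-coding the two hops as done here.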
A
Okay, that's it. Okay, so a couple of final thoughts, and then I'm gonna take a break, prep for my stand-up meeting, and then we're gonna have a research meeting. So let's do this. This was worth going to, just for my own education. I know a lot more about deep reinforcement learning today, even though I have never created a deep reinforcement learning system. I think I understand the nomenclature a bit better, so it was good education...
A
For me. I was wondering what the state of the art currently was in deep reinforcement learning, and so it was a good indication of what the state of the art is. No one seems to have any illusions about what you can and can't do with deep reinforcement learning. When you talked about AGI at this conference, everybody thought AGI was a long way away. Thinking about it broadly, from the standpoint of deep RL, AGI is a long way away, which, you know, I agree with. There's a ton of challenges.
A
The major challenges are generalization and training speed and scalability, right? Those are the things that everybody acknowledges are the major challenges for deep RL, and no one really understands how to attack those problems. They know that we need benchmarks, so they're working on benchmarks, and people are working on lots of different ways to attack those problems. A lot of the tactics being applied today involve prior knowledge, which goes against the idea of generalization.
A
So one of the ways you can try to decrease training time and increase scalability is by injecting prior knowledge, which is completely the opposite of what you want to do to achieve generalization. So there's this dichotomy, and everyone agrees that this is a hard problem. There are a lot of hard problems that need to be addressed in the deep reinforcement learning world, and everyone, I think, understands that.
A
You know, there were a lot of references to that, like "if we can do that, we can do anything," sort of thing. But the problem is that it just doesn't transfer. It's very, very hard to take those very hard-coded solutions in those specific environments and transfer any of that learning into environments that are even a little bit different. Even a little bit different: that's a big, big challenge. So that's the recap of the conference. It was totally interesting, and the conference organization, ReWork, seems to do a good job.