Review Paper "Deep Reinforcement Learning and Its Neuroscientific Implications": Key Takeaways and a Summary of Recent RL Frontiers

I read the article "Deep Reinforcement Learning and Its Neuroscientific Implications", published in Neuron in 2020.

The article discusses the relationship between deep reinforcement learning and neuroscience. Here, I mainly go over the reinforcement learning frontiers mentioned in the paper, as notes for future reference.

Overview of Frontiers

  • Representation Learning
  • Model-Based RL
  • Memory
  • Exploration
  • Cognitive Control and Action Hierarchies
  • Social Cognition

A quick overview of deep reinforcement learning

So-called "deep reinforcement learning" refers to solving reinforcement learning problems with the help of deep learning. Introducing deep learning lets reinforcement learning tackle much more complex problems, and the stability of the algorithms has also improved greatly.

Reinforcement learning itself was distilled from the study of biological behavior. The reward-prediction error (RPE) in reinforcement learning corresponds closely to the phasic dopamine signal, which is commonly described as encoding "wanting" in biology.
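As a small illustration (not from the paper itself), the RPE in temporal-difference learning is just the mismatch between received and predicted reward, delta = r + gamma * V(s') - V(s); a minimal tabular sketch:

```python
import numpy as np

# Minimal tabular TD(0) sketch of a reward-prediction error (RPE).
# States, values, and transitions here are made up purely for illustration.
gamma = 0.9          # discount factor
alpha = 0.1          # learning rate
V = np.zeros(5)      # value estimates for 5 states

def td_update(s, r, s_next):
    """One TD(0) step: delta plays the role ascribed to phasic dopamine."""
    delta = r + gamma * V[s_next] - V[s]   # reward-prediction error
    V[s] += alpha * delta
    return delta

# Example transition: state 0 -> state 1 with reward 1.0
print(td_update(0, 1.0, 1))
```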

Representation Learning

In deep reinforcement learning, reward-based learning shapes the network's representations, which in turn support reward-based decision-making.

Predictive learning: the agent must predict, from the current state, the state it is most likely to observe at the next step, thereby modeling the latent regularities of the task (a sketch follows the references below):
Wayne et al., 2018
Gelada et al., 2019
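A minimal sketch of this idea, assuming a simple linear encoder and transition model (this is not the architecture used in Wayne et al., 2018 or Gelada et al., 2019): an auxiliary loss asks the representation to predict the next observation's latent state.

```python
import torch
import torch.nn as nn

# Illustrative next-state prediction objective (sketch only).
obs_dim, latent_dim, act_dim = 16, 32, 4

encoder = nn.Linear(obs_dim, latent_dim)                   # state representation
transition = nn.Linear(latent_dim + act_dim, latent_dim)   # predicts next latent

def prediction_loss(obs, action, next_obs):
    z = encoder(obs)
    z_next_pred = transition(torch.cat([z, action], dim=-1))
    z_next = encoder(next_obs).detach()                    # target latent state
    return ((z_next_pred - z_next) ** 2).mean()

loss = prediction_loss(torch.randn(8, obs_dim),
                       torch.randn(8, act_dim),
                       torch.randn(8, obs_dim))
```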

Exploring and learning the environment more effectively by decomposing it into objects:
Watters et al., 2019

Model-Based RL

In some cases, processes resembling model-based RL can emerge spontaneously in systems trained with model-free RL algorithms.
Guez et al., 2019

Model-based behavior can also be seen in RL systems that use a specific form of predictive coding known as the "successor representation":
Vértes and Sahani, 2019
Momennejad, 2020
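As a hedged illustration, the successor representation M(s, s') estimates the expected discounted future occupancy of state s' when starting from s, so values can be read out as V(s) = sum over s' of M(s, s') * R(s'). A tabular sketch:

```python
import numpy as np

# Tabular successor representation (SR) sketch, for illustration only.
n_states, gamma, alpha = 5, 0.9, 0.1
M = np.zeros((n_states, n_states))   # M[s, s'] ~ discounted future occupancy

def sr_td_update(s, s_next):
    """TD-style update of the SR from an observed transition s -> s_next."""
    target = np.eye(n_states)[s] + gamma * M[s_next]
    M[s] += alpha * (target - M[s])

# Values follow directly from the learned SR and a reward vector.
R = np.array([0, 0, 0, 0, 1.0])
sr_td_update(0, 1)
V = M @ R
```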

Memory

In deep reinforcement learning, there are two main forms of memory:

  1. Long-term memory is usually episodic memory, stored in some form of external memory store.
    Wayne et al., 2018
  2. Short-term memory is held in the activity of recurrent network units (such as LSTMs and GRUs); see the sketch after this list.
    Stalter et al., 2020
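A minimal sketch of the second kind of memory, assuming a generic LSTM agent core (dimensions and layer choices are illustrative, not taken from the cited work): the recurrent hidden state carries information across time steps within an episode.

```python
import torch
import torch.nn as nn

# Illustrative short-term memory in a recurrent agent core.
obs_dim, hidden_dim, n_actions = 16, 64, 4

core = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
policy_head = nn.Linear(hidden_dim, n_actions)

obs_seq = torch.randn(1, 10, obs_dim)   # one episode of 10 observations
states, _ = core(obs_seq)               # recurrent state acts as working memory
action_logits = policy_head(states)     # one action distribution per time step
```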

Introducing attention mechanisms into memory:
Parisotto et al., 2019

Exploration

In high-dimensional spaces, random exploration strategies become largely ineffective.
One solution is to endow the agent with curiosity, and there is a good deal of work in this area:
Burda et al., 2019
Badia et al., 2020
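One popular instantiation of "curiosity", in the spirit of random network distillation (Burda et al., 2019), rewards states that a predictor network fails to match against a fixed random target network. A minimal sketch, with made-up dimensions:

```python
import torch
import torch.nn as nn

# Curiosity bonus sketch: novel observations are poorly predicted,
# so they earn a larger intrinsic reward.
obs_dim, feat_dim = 16, 32

target = nn.Linear(obs_dim, feat_dim)      # fixed, randomly initialized network
predictor = nn.Linear(obs_dim, feat_dim)   # trained to imitate the target
for p in target.parameters():
    p.requires_grad_(False)

def intrinsic_reward(obs):
    error = (predictor(obs) - target(obs)) ** 2
    return error.mean(dim=-1)              # larger for unfamiliar observations
```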

Another strategy is based on uncertainty, for example preferring actions whose value estimates the agent is least confident about:
Osband et al., 2016
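A rough sketch of the uncertainty idea (not Bootstrapped DQN itself): use disagreement across an ensemble of value estimates as a stand-in for uncertainty, and act optimistically toward uncertain actions. The numbers below are dummy values.

```python
import numpy as np

# Uncertainty-driven action selection (illustrative only).
rng = np.random.default_rng(0)
q_ensemble = rng.normal(size=(5, 3))   # 5 ensemble members x 3 actions

q_mean = q_ensemble.mean(axis=0)
q_std = q_ensemble.std(axis=0)         # disagreement ~ epistemic uncertainty
beta = 1.0
action = np.argmax(q_mean + beta * q_std)   # optimism in the face of uncertainty
```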

There are also studies aimed at letting agents learn or evolve their own intrinsic motivations from the task itself:
Zheng et al., 2018

In addition, meta-learning offers a new way of approaching exploration. Meta-learned exploration comes equipped with prior knowledge of the world's regularities, so it looks more like hypothesis-testing experiments than aimless exploration:
Dasgupta et al., 2019

Finally, some studies propose to address exploration by sampling randomly in a hierarchically structured behavior space:
Jinnai et al., 2020
Hansen et al., 2020

Cognitive Control and Action Hierarchies

Here a top-level agent makes abstract decisions while lower-level agents make more concrete ones; this direction still needs further study:
Barreto et al., 2019,
Harb et al., 2018
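A schematic two-level controller, purely for illustration (this is not the specific method of the cited papers): the high-level policy picks an abstract option every few steps, and the low-level policy chooses primitive actions conditioned on that option.

```python
import numpy as np

# Schematic action hierarchy: random policies stand in for learned ones.
rng = np.random.default_rng(0)
n_options, n_actions, k = 3, 4, 5

def high_level_policy(state):
    return rng.integers(n_options)          # abstract decision

def low_level_policy(state, option):
    return rng.integers(n_actions)          # concrete action given the option

state, option = 0, None
for t in range(20):
    if t % k == 0:
        option = high_level_policy(state)   # re-plan on a coarser time scale
    action = low_level_policy(state, option)
```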

Social Cognition

For example, in competitive team games, how should multiple agents weigh collaboration against competition:
Jaderberg et al., 2019
Berner et al., 2019

And coordination issues in cooperative games:
Foerster et al., 2019

Some difficulties and challenges in deep reinforcement learning

  1. When a task requires flexible adaptation based on structured reasoning, or drawing on a rich store of background knowledge, deep reinforcement learning still performs far below human level.
  2. In long-horizon tasks there is the "credit assignment" problem: when a reward is only given at the end of the task, how should the agent update the parameters of different parts of the network based on its past behavior and that final reward? (A small illustration follows this list.)
  3. The mainstream view is that organisms have no mechanism for transmitting a global error signal the way backpropagation (BP) does, yet most current network training relies on BP. Moreover, backpropagation inherently struggles to retain what was learned previously in the face of new learning.
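To make the credit-assignment problem in point 2 concrete (my own illustration, not the paper's): with a single reward at the end of an episode, the simplest scheme spreads credit backward through discounted returns, so earlier steps receive exponentially smaller shares.

```python
import numpy as np

# Discounted returns as a simple view of temporal credit assignment.
gamma = 0.99
rewards = np.array([0.0] * 9 + [1.0])   # reward only at the final step

returns = np.zeros_like(rewards)
G = 0.0
for t in reversed(range(len(rewards))):
    G = rewards[t] + gamma * G
    returns[t] = G                       # update target for step t
```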

Source: blog.csdn.net/weixin_40639459/article/details/109360240