Reinforcement Learning - Getting Started

Reinforcement Learning:

Reinforcement learning is inspired by theories from behavioral psychology and draws on probability theory, statistics, approximation theory, convex analysis, computational complexity theory, operations research, and a wide range of other disciplines. This high barrier to entry has made its development comparatively slow.

One explanation:

In fact, humans do reinforcement learning all their lives: when you execute an action in a given state, you get feedback (a reward). After trying many actions in many states, countless times, the brain connects these dots into a Markov model, and you come to know which behavior is best.

Another explanation:

The most important concepts in reinforcement learning: agent, environment, reward, policy, action. The environment is generally described as a Markov process; the agent, following some policy, generates actions and interacts with the environment, which produces a reward. The agent then adjusts and optimizes its current policy based on that reward.
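
To make this loop concrete, here is a minimal sketch in Python. The `GridEnv` environment and the random policy are illustrative assumptions of mine, not part of the original post; what matters is the loop structure: state → action → reward → next state.

```python
import random

class GridEnv:
    """A toy 1-D corridor: states 0..4, reward +1 for reaching state 4."""
    def __init__(self):
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action: -1 (move left) or +1 (move right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

env = GridEnv()
state = env.reset()
done = False
while not done:
    action = random.choice([-1, +1])         # the agent's policy (here: random)
    state, reward, done = env.step(action)   # the environment returns feedback
    # a learning agent would now use (state, action, reward) to improve its policy
```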


Differences between reinforcement learning and supervised learning

  1. In supervised learning the training samples carry labels; reinforcement learning has no labels and learns from the rewards and punishments the environment gives. The biggest difference between reinforcement learning and both supervised and unsupervised learning is that it does not need to be "fed" large amounts of data; instead it acquires skills through its own continuous trial and error.

  2. The learning process of supervised learning is static, while that of reinforcement learning is dynamic. "Dynamic" versus "static" here means: whether the learner interacts with the environment. Supervised learning learns whatever the samples teach it; reinforcement learning interacts with the environment and then learns from the rewards and punishments that interaction yields.

  3. Supervised learning mainly solves perception problems (for example, deep learning), while reinforcement learning mainly solves decision problems; supervised learning is therefore more like the senses (e.g., recognizing faces), and reinforcement learning is more like the brain. For example: when facing a tiger, supervised learning alone would only produce the word "tiger"; with reinforcement learning on top, we can decide whether to flee or fight.

    As long as a problem involves decision-making and control, reinforcement learning can be used!


Mainstream reinforcement learning algorithms

Model-free learning (Model-Free) vs. model-based learning (Model-Based)

The main difference: whether the agent can fully understand or learn a model of the environment it is in

  • Model-based learning: the advantage is awareness of the environment in advance, which allows thinking and planning ahead; the drawback is that if the model is inconsistent with the real world, performance in realistic scenarios will be poor.

  • Model-free learning: the drawback is that it abandons the model, so it is less sample-efficient than the former; the advantage is that it is easier to implement and easier to tune into a good state in real scenarios.

    Therefore, model-free learning is more popular and more widely supported and researched. (A toy sketch contrasting the two styles follows below.)
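
As a rough illustration of the distinction, here is a sketch under assumed toy, dict-based interfaces (not code from the post): a model-free agent updates its value estimates directly from experienced transitions, while a model-based agent first learns a model of the environment and then plans by looking ahead through it.

```python
# Model-free: Q-learning updates value estimates straight from experience.
# Q is a dict of dicts: Q[state][action] -> estimated value.
def q_learning_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.99):
    best_next = max(Q[s_next].values())
    Q[s][a] += alpha * (r + gamma * best_next - Q[s][a])

# Model-based: use learned transition/reward models to evaluate an action
# by one-step lookahead, without taking it in the real world.
# model_T[s][a] -> dict of next-state probabilities (the learned model).
def model_based_value(model_T, model_R, V, s, a, gamma=0.99):
    return sum(p * (model_R[s][a] + gamma * V[s_next])
               for s_next, p in model_T[s][a].items())
```

If the learned `model_T` diverges from the real dynamics, the lookahead values mislead the agent, which is exactly the drawback the list above describes.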


Why is reinforcement learning so slow?

  • Incremental parameter updates: in the earliest algorithms, the mapping from input (what the AI observes from its surroundings) to output (actions) is learned by gradient descent. In this process each update increment must be very small, so that newly learned information does not overwrite previously accumulated experience (such overwriting is called "catastrophic interference"). As a result, the learning process is very slow.

    Solution: episodic deep reinforcement learning (Episodic DRL); see the sketch after this list.

  • Weak inductive bias: every learning process faces a "bias-variance trade-off." Bias here means narrowing the set of possible outcomes in advance, so the AI only has to pick out the desired one from a limited set. The narrower the set, the fewer possibilities the AI must consider and the faster it gets results; a weak inductive bias means more possibilities must be considered, so learning is slower.

    Solution: first define a narrow range and let the AI explore within it. How do you know where the limits should be? The answer: learn from past experience.

    Note: this is explained in detail in the 量子位 (QbitAI) answer, reference [2].
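
The episodic idea can be sketched as follows. This is my simplified illustration in the spirit of model-free episodic control, not the post's own code; all names are illustrative. Instead of many tiny gradient steps, the agent keeps an explicit memory of the best return ever observed for each (state, action) pair and acts greedily on that memory, so one good episode is usable immediately.

```python
from collections import defaultdict

# Episodic memory: best return ever observed for each (state, action) pair.
memory = defaultdict(float)

def remember_episode(trajectory, gamma=0.99):
    """trajectory: list of (state, action, reward); write returns to memory."""
    G = 0.0
    for state, action, reward in reversed(trajectory):
        G = reward + gamma * G
        # One fast memory write replaces many tiny gradient increments.
        memory[(state, action)] = max(memory[(state, action)], G)

def act(state, actions):
    """Greedy lookup in episodic memory instead of a slowly trained network."""
    return max(actions, key=lambda a: memory[(state, a)])
```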


Combining reinforcement learning with deep learning (DRL, Deep Reinforcement Learning)

DRL breaks away from the earlier way of thinking about reinforcement learning problems in terms of processing data, selecting a model and algorithm, training, and testing, and instead tackles problems from the perspectives of the policy, the value function, and the model. The Markov decision process, which gives a general mathematical expression for sequential decision problems, is widely used. In addition, dynamic programming, Monte Carlo, and temporal-difference methods are widely used as the three important ways of searching for the optimal policy of a Markov sequence, and, from a control perspective, they teach the agent how to explore and exploit within a finite state space.
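
To make the three methods concrete, here is a minimal sketch of the value update each one performs. The dict-based state values and function names are my own toy assumptions, not the post's notation.

```python
# Dynamic programming: full one-step backup using a known model P(s'|s,a) and R.
def dp_backup(V, s, P, R, gamma=0.99):
    V[s] = max(sum(p * (R[s][a] + gamma * V[s2]) for s2, p in P[s][a].items())
               for a in P[s])

# Monte Carlo: update toward the full sampled return G of a finished episode.
def mc_update(V, s, G, alpha=0.1):
    V[s] += alpha * (G - V[s])

# Temporal difference (TD(0)): bootstrap from the estimated value of s'.
def td0_update(V, s, r, s_next, alpha=0.1, gamma=0.99):
    V[s] += alpha * (r + gamma * V[s_next] - V[s])
```

Dynamic programming needs the model; Monte Carlo needs complete episodes; temporal difference needs neither, which is why it underlies so many model-free algorithms.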

On this basis, neural networks and policy gradients are widely applied to approximate the value function and the policy. Using neural networks avoids, to some extent, the much-criticized large storage space and slow lookups of tabular methods, and has become a new direction in the development of reinforcement learning.
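
For instance, a Q-table over a huge state space can be replaced by a small network that maps a state vector to one Q-value per action. A minimal PyTorch sketch (the layer sizes and dimensions are arbitrary assumptions of mine):

```python
import torch
import torch.nn as nn

class QNetwork(nn.Module):
    """Approximates Q(s, ·): state vector in, one Q-value per action out."""
    def __init__(self, state_dim, num_actions, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, num_actions),
        )

    def forward(self, state):
        return self.net(state)

# One forward pass replaces a table lookup over a potentially huge state space.
q = QNetwork(state_dim=4, num_actions=2)
state = torch.randn(1, 4)          # a toy 4-dimensional state
q_values = q(state)                # tensor of shape (1, 2): Q(s, a) per action
action = q_values.argmax(dim=1)    # greedy action
```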

Humans normally learn in the real environment, but reinforcement learning has not yet spread to stages involving highly complex logical reasoning or sentiment analysis; having a simulation environment is therefore an important foundation for deep reinforcement learning, and this is what distinguishes DRL from other AI algorithms. It can be said that reinforcement learning's fame comes from its success in games, because games involve only policy decisions, with no need for complex logical reasoning (in Go, where to place a stone is computed as a probability) or sentiment analysis.

  • The basic elements of deep learning are: data, algorithmic models, and computing power;
  • The basic elements of deep reinforcement learning are: simulation environments, algorithmic models, and computing power.

Given the principles and foundations of reinforcement learning, and the simulation environments and deep learning algorithms on which deep reinforcement learning is built, what does a real deep reinforcement learning algorithm look like? How is it trained? How are its model parameters determined and tuned? A series of such questions follows. DQN (Deep Q-Network) is the landmark DRL algorithm.
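
The post names DQN but does not spell it out. As a rough sketch of its two key tricks, experience replay and a frozen target network, continuing the `QNetwork` sketch above (buffer size, learning rate, and the transition storage format are my assumptions):

```python
import random
from collections import deque

import torch
import torch.nn.functional as F

replay = deque(maxlen=10_000)                    # experience replay buffer
target_q = QNetwork(state_dim=4, num_actions=2)  # frozen copy for stable targets
target_q.load_state_dict(q.state_dict())
optimizer = torch.optim.Adam(q.parameters(), lr=1e-3)

def dqn_step(batch_size=32, gamma=0.99):
    """One DQN gradient step from a random minibatch of past transitions."""
    if len(replay) < batch_size:
        return
    # Each stored transition: (state, action, reward, next_state, done) tensors,
    # with action as int64 and done as 0./1. float.
    batch = random.sample(replay, batch_size)
    s, a, r, s2, done = map(torch.stack, zip(*batch))
    with torch.no_grad():  # targets come from the frozen network
        target = r + gamma * target_q(s2).max(dim=1).values * (1 - done)
    pred = q(s).gather(1, a.unsqueeze(1)).squeeze(1)
    loss = F.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

Replay breaks the correlation between consecutive samples, and the target network changes only occasionally (by copying `q`'s weights), which stabilizes the bootstrapped targets.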


References:

[1] Mainstream reinforcement learning algorithms

[2] What is reinforcement learning? - 量子位 (QbitAI)'s answer - Zhihu

[3] What is reinforcement learning? - Chen Weihai's answer - Zhihu

[4] Exploring deep reinforcement learning (DRL) - DRLearner's article - Zhihu
