Understanding Reinforcement Learning (RL)

Concept

Reinforcement learning (RL) refers to an agent learning, through interaction with an uncertain environment, which actions to take so as to maximize future rewards.
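To make the agent-environment loop concrete, here is a minimal Python sketch (not from the original post). The `env` object with a Gym-style `reset()`/`step()` interface and the `policy` function are illustrative assumptions:

```python
# Minimal sketch of the agent-environment interaction loop.
# `env` is a hypothetical object with a Gym-style reset()/step() interface,
# and `policy` is any function mapping a state to an action (both assumed).
def run_episode(env, policy, max_steps=100):
    state = env.reset()                      # initial observation of the environment
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)               # the agent chooses an action
        state, reward, done = env.step(action)  # the environment responds
        total_reward += reward               # accumulate the reward signal
        if done:
            break
    return total_reward                      # the quantity the agent tries to maximize
```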

Features

Reinforcement learning differs from other machine learning paradigms in the following ways:
1. There is no supervisor, only a reward signal.
2. Feedback is delayed, not instantaneous: the agent does not receive feedback immediately, and it can take a while to learn whether a given choice was good or bad (the discounted-return sketch after this list makes this concrete).
3. Time really matters (the data is sequential): each action influences the ones that follow, and the agent ultimately seeks the sequence of actions that maximizes the cumulative reward, which is why RL is often used in sequential decision scenarios.
4. The agent's actions affect the subsequent data it receives: different actions yield different rewards and lead to different subsequent observations.
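As a rough illustration of points 2 and 3: an action is usually judged by the discounted return of the whole trajectory that follows it, not by the next reward alone. The discount factor `gamma` below is a common convention, not something stated in the original post:

```python
# Sketch: delayed feedback means an action is scored by the discounted return
# G_t = r_t + gamma * G_{t+1} of everything that happens afterwards.
def discounted_return(rewards, gamma=0.99):
    g = 0.0
    for r in reversed(rewards):   # work backwards through the trajectory
        g = r + gamma * g
    return g

# Example: a reward of 10 arriving three steps late is worth
# 10 * 0.99**3 from the first step's point of view.
print(discounted_return([0, 0, 0, 10]))  # ≈ 9.703
```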

Classification

Depending on the classification criterion, RL algorithms can be divided into:
1. Model-free RL and model-based RL

2. Policy-based RL and value-based RL
Policy-based algorithms: policy gradients, trust-region methods, evolution strategies
Value-based algorithms: TD learning, Q-learning, SARSA
Actor-Critic (AC) algorithms: combine the advantages of policy-based and value-based methods.

3. RL algorithms based on round updates (Monte-Carlo updates) and on single-step updates (temporal-difference updates)

4. On-policy and off-policy RL algorithms
On-policy RL algorithms include: SARSA, SARSA(λ)
Off-policy RL algorithms include: Q-learning, Deep Q-Network (DQN) (see the sketch after this list)
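To tie several of these categories together (value-based methods, single-step temporal-difference updates, and on-policy vs. off-policy learning), here is a minimal tabular sketch. The two-action setup, hyperparameters, and ε-greedy behavior policy are illustrative assumptions, not details from the original post:

```python
import random
from collections import defaultdict

ACTIONS = [0, 1]            # assumed two-action environment (illustrative)
Q = defaultdict(float)      # tabular action-value function Q(s, a)

def epsilon_greedy(state, epsilon=0.1):
    """Behavior policy: mostly greedy w.r.t. Q, occasionally random (exploration)."""
    if random.random() < epsilon:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])

def td_update(s, a, r, s_next, a_next, alpha=0.1, gamma=0.99, off_policy=True):
    """One single-step (temporal-difference) update of the Q table.

    off_policy=True  -> Q-learning: bootstrap from the greedy value
                        max_b Q(s', b), regardless of the action actually taken.
    off_policy=False -> SARSA: bootstrap from Q(s', a'), the action the
                        behavior policy actually took (on-policy).
    """
    if off_policy:
        target = r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    else:
        target = r + gamma * Q[(s_next, a_next)]
    Q[(s, a)] += alpha * (target - Q[(s, a)])
```

Note that in this sketch Q-learning and SARSA differ by a single line: which value of the next state they bootstrap from.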


Source: blog.csdn.net/weixin_45187794/article/details/108248141