Reinforcement Learning Overview
Artificial Intelligence II Learning Content:
Reference materials:
Mainly based on the class lecturer's explanations and courseware, plus the illustrations.
Online references:
[Mofan Python] Reinforcement Learning
[Wang Shusen] Deep Reinforcement Learning (DRL)
Li Hongyi Course-Reinforcement Learning
deep-rl-course
(There is a lot of material; if you can't read it all, pick the best. Think first, then search when you have questions.)
Reinforcement Learning Essence
Learning a policy through interaction with the environment.
Reinforcement Learning Concepts
The agent chooses an appropriate action according to the current state, following some policy. Through its actions, the agent exerts influence on the environment.
Agent: the initiator of actions in the system.
State: the agent's interpretation of the environment.
Action: the agent's exercise of initiative on the environment. The benefit brought by an action is called the reward; the goal of reinforcement learning is to obtain as high a reward as possible.
Policy: makes decisions based on the observed state and controls the agent's behavior.
The agent may or may not know the laws governing how the environment changes. The part of the system other than the agent feeds the state and reward back to the agent, and itself changes according to certain rules.
An example; a second example.
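The agent-environment loop described above can be sketched in a few lines. The corridor environment, the reward scheme, and the random policy here are all made-up illustrations, not part of the original notes:

```python
import random

# Hypothetical environment: a 1-D corridor with states 0..4.
# The agent receives reward 1.0 only upon reaching state 4.
class CorridorEnv:
    def __init__(self):
        self.state = 0

    def step(self, action):
        # action: -1 (move left) or +1 (move right)
        self.state = max(0, min(4, self.state + action))
        reward = 1.0 if self.state == 4 else 0.0
        done = self.state == 4
        return self.state, reward, done

def random_policy(state):
    # policy: maps the observed state to an action (here: uniform random)
    return random.choice([-1, 1])

env = CorridorEnv()
state, total_reward, done = env.state, 0.0, False
while not done:
    action = random_policy(state)           # agent picks an action from the state
    state, reward, done = env.step(action)  # environment feeds back state and reward
    total_reward += reward

print(total_reward)  # 1.0 (the single reward collected at the goal state)
```

Every reinforcement learning setup reduces to some version of this loop; only the environment dynamics and the policy get more sophisticated.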
Sources of Randomness in Reinforcement Learning
Sources of randomness: ① Action: given a state, the action is sampled at random from the policy function π. ② State transition: the next state is sampled at random from the state transition function, given the current state and action.
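Both sources of randomness can be shown with explicit probability tables. The states, actions, and probabilities below are made up for illustration; `sample` draws one outcome from a distribution dictionary:

```python
import random

random.seed(0)  # fixed seed so the draws are reproducible

# pi(a|s): probability of each action in each state (hypothetical numbers)
policy = {
    "s0": {"left": 0.3, "right": 0.7},
}
# p(s'|s,a): distribution over next states (hypothetical numbers)
transition = {
    ("s0", "right"): {"s0": 0.2, "s1": 0.8},
    ("s0", "left"):  {"s0": 0.9, "s1": 0.1},
}

def sample(dist):
    # draw one outcome from a {outcome: probability} dictionary
    return random.choices(list(dist), weights=list(dist.values()))[0]

action = sample(policy["s0"])                    # randomness source 1: the policy
next_state = sample(transition[("s0", action)])  # randomness source 2: the transition
print(action, next_state)
```

Even with the policy and environment fully fixed, repeated runs (with different seeds) produce different trajectories, which is exactly why returns in reinforcement learning are random variables.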
Characteristics of Reinforcement Learning (Comparison)
Comparing supervised learning, unsupervised learning and reinforcement learning:
Supervised learning: There are "labels", and the supervised algorithm can continuously adjust the model to obtain the mapping function of input and output. Based on supervised learning, given once. Single step decision. The goal is to map samples to semantic labels.
Unsupervised learning: No "label", modeling by analyzing the data itself, discovering underlying information and hidden structures. Based on the assumption of the data structure, it is given once. No decision. Similar data distribution patterns.
Reinforcement learning: no "labels"; based on evaluative feedback (rewards), it receives new data and updates parameters. Data are produced through interaction. Sequential decision process. The goal is to learn the mapping that maximizes cumulative reward.
Contrasting reinforcement learning and supervised learning:
(1) The training data have no labels, only a reward function (Reward Function).
(2) The training data are not given ready-made; they are obtained through actions (Action).
(3) The current action not only affects what training data are obtained subsequently, but also affects the value of the reward function.
(4) The purpose of training is to construct a "state -> action" function, where the state (State) describes the current internal and external environment. Through this function, the agent (Agent) decides which action to take in a given state, in the hope that after taking these actions it ultimately obtains the maximum cumulative reward.
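The "state -> action" function from point (4) can be made concrete with a toy action-value table. The battery-robot states, the actions, and all Q-values below are invented for illustration; once such values are learned, acting greedily per state yields the desired mapping:

```python
# Hypothetical learned action values Q(s, a) for a recycling robot.
# In practice these numbers come from training; here they are made up.
Q = {
    "low_battery":  {"recharge": 0.9, "search": -0.4},
    "high_battery": {"recharge": 0.1, "search": 0.8},
}

def act(state):
    # the "state -> action" function: pick the action with the
    # highest estimated value in the given state
    return max(Q[state], key=Q[state].get)

print(act("low_battery"))   # recharge
print(act("high_battery"))  # search
```

Much of reinforcement learning is about how to fill in such a table (or a function approximating it) so that the induced greedy behavior maximizes long-run reward.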
Some common reinforcement learning algorithms
Classification by whether the environment is modeled: model-based vs. model-free.
Classification by what is learned: value-based vs. policy-based (with actor-critic combining the two).
Classification by update method: round-by-round (Monte Carlo) vs. step-by-step (temporal-difference).
Classification by online or offline updating: on-policy (learning from the actions actually taken, e.g., Sarsa) vs. off-policy (learning from experience generated by another policy, e.g., Q-learning).
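To make the taxonomy concrete, here is one tabular Q-learning update as a sketch. Q-learning is model-free (no transition model), value-based (it learns Q rather than a policy directly), temporal-difference (it bootstraps from Q at the next state), and off-policy (the max in the target ignores the action actually taken next). All states, actions, and initial values are made up:

```python
alpha, gamma = 0.5, 0.9  # learning rate and discount factor (chosen arbitrarily)

# Hypothetical Q-table over two states and two actions
Q = {("s0", "a0"): 0.0, ("s0", "a1"): 0.0,
     ("s1", "a0"): 1.0, ("s1", "a1"): 0.0}

def q_update(s, a, r, s_next, actions):
    # TD target: r + gamma * max_a' Q(s', a'); move Q(s, a) toward it
    target = r + gamma * max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# one observed transition: (s0, a0) -> s1 with reward 0
q_update("s0", "a0", r=0.0, s_next="s1", actions=["a0", "a1"])
print(Q[("s0", "a0")])  # 0.45 = 0.5 * (0.0 + 0.9 * 1.0)
```

Placing any given algorithm into the four classifications above is usually just a matter of asking which quantities appear in its update rule.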
Next article: Discrete Markov Processes, the framework underlying reinforcement learning training methods.