[Artificial Intelligence II Notes] The Many Concepts a Reinforcement Learning Beginner Needs to Know

Artificial Intelligence II learning content: (figure not preserved)

Reference materials:
Class lecturer's explanation and courseware are mainly used, and I like pictures.
Reference network information:
[Don't bother with Python] Reinforcement Learning
[Wang Shusen] Deep Reinforcement Learning (DRL)
Li Hongyi Course-Reinforcement Learning
deep-rl-course
(There are a lot of information, if you can't read it all, choose the best, think first and then search if you have any questions)

Reinforcement Learning Essence

Learning a policy through interaction with the environment.

Reinforcement Learning Concepts

Following a certain policy, the agent chooses an appropriate action based on the current state, and it influences the environment through that action.

Agent: the initiator of actions in the system.
State: the agent's interpretation of the environment.
Action: the agent's deliberate influence on the environment. The benefit an action brings is called the reward, and the purpose of reinforcement learning is to obtain as much reward as possible.
Policy: makes decisions based on the observed state and controls how the agent moves.
Environment: everything in the system other than the agent. It feeds the state and the reward back to the agent and changes according to certain rules, which the agent may or may not know.

Example 1: (figure not preserved)
Example 2: (figure not preserved)
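To make these roles concrete, here is a minimal sketch of the agent-environment loop. Everything in it is an assumption for illustration: the `CorridorEnv` class, its `reset`/`step` methods, the reward of 1 at the last position, and `random_policy` are hypothetical names and do not come from the course material.

```python
import random


class CorridorEnv:
    """Hypothetical toy environment: positions 0..4, reward 1 on reaching position 4."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        # Start every episode at the left end of the corridor.
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the position is clipped to the corridor.
        self.state = min(max(self.state + action, 0), self.length - 1)
        done = self.state == self.length - 1
        reward = 1.0 if done else 0.0
        return self.state, reward, done


def random_policy(state, rng):
    """Policy: maps the observed state to an action (this one ignores the state)."""
    return rng.choice([-1, 1])


def run_episode(env, policy, rng, max_steps=100):
    """The agent observes a state, picks an action, and the environment answers."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state, rng)             # agent decides
        state, reward, done = env.step(action)  # environment feeds back state and reward
        total_reward += reward
        if done:
            break
    return total_reward


rng = random.Random(0)
print(run_episode(CorridorEnv(), random_policy, rng))
```

Swapping `random_policy` for a policy that always returns `+1` reaches the goal in four steps, which is the kind of improvement reinforcement learning tries to find automatically.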







Sources of Randomness in Reinforcement Learning

There are two sources of randomness: ① Action: given a state, the action is randomly sampled according to the policy function π. ② State transition: given the current state and the chosen action, the next state is randomly sampled according to the state transition function.
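The two sampling steps can be sketched as follows. The two-state tables `POLICY` and `TRANSITION` and all probabilities in them are made up for illustration; only the idea of sampling the action from π and the next state from the transition function comes from the notes.

```python
import random

# Hypothetical policy pi(a | s): probability of each action in each state.
POLICY = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.5},
}

# Hypothetical state transition function p(s' | s, a).
TRANSITION = {
    ("s0", "left"): {"s0": 1.0},
    ("s0", "right"): {"s0": 0.3, "s1": 0.7},
    ("s1", "left"): {"s0": 0.9, "s1": 0.1},
    ("s1", "right"): {"s1": 1.0},
}


def sample(dist, rng):
    """Draw one outcome from a {outcome: probability} dictionary."""
    outcomes = list(dist)
    weights = [dist[o] for o in outcomes]
    return rng.choices(outcomes, weights=weights, k=1)[0]


def sample_action(state, rng):
    # Randomness source 1: the action is sampled from the policy.
    return sample(POLICY[state], rng)


def sample_next_state(state, action, rng):
    # Randomness source 2: the next state is sampled from the transition function.
    return sample(TRANSITION[(state, action)], rng)


rng = random.Random(0)
state = "s0"
for t in range(5):
    action = sample_action(state, rng)
    next_state = sample_next_state(state, action, rng)
    print(t, state, action, next_state)
    state = next_state
```

Running the loop with different seeds gives different trajectories even though the policy and the environment are unchanged, which is exactly the randomness described above.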

Characteristics of Reinforcement Learning (Comparison)

Comparing supervised learning, unsupervised learning and reinforcement learning:

Supervised learning: has "labels"; the algorithm keeps adjusting the model to learn the mapping from input to output. Learning is based on the supervision signal, the data is given all at once, and decisions are single-step. The goal is to map samples to semantic labels.
Unsupervised learning: no "labels"; it models the data by analyzing the data itself, discovering underlying information and hidden structure. Learning is based on assumptions about the structure of the data, the data is given all at once, and no decisions are made. The goal is to find the distribution patterns of similar data.
Reinforcement learning: no "labels"; learning is based on evaluative feedback, accepting new data and updating parameters as it goes. The data is produced through interaction, and decisions form a sequential process. The goal is to obtain the mapping that yields the maximum return.

Contrasting reinforcement learning and supervised learning:
(1) The training data has no labels, only a reward function.
(2) The training data is not given ready-made; it is obtained through actions.
(3) The current action affects not only which training data is obtained later but also the value of the reward function.
(4) The purpose of training is to construct a "state -> action" function, where the state describes the current internal and external environment. An agent in a given state uses this function to decide which action to take, in the hope that these actions eventually yield the maximum reward.

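As a rough illustration of points (1) to (4), the sketch below learns a "state -> action" function from rewards alone. It uses tabular Q-learning, which is my own choice for the example and is not described in the notes above; the corridor environment, the `step`, `train`, and `greedy_policy` names, and all hyperparameters are likewise assumptions.

```python
import random

N_STATES = 5          # positions 0..4; position 4 is the goal (terminal)
ACTIONS = (-1, +1)    # move left or move right


def step(state, action):
    """Hypothetical environment: returns (next_state, reward, done)."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, alpha=0.5, gamma=0.9, max_steps=100, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            # (2) The data is not given in advance: the agent acts to obtain it.
            action = rng.choice(ACTIONS)
            # (1) The only feedback is a reward, not a "correct action" label.
            next_state, reward, done = step(state, action)
            # (3) The action determined which (next_state, reward) sample was seen.
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
            if done:
                break
    return q


def greedy_policy(q):
    """(4) The learned "state -> action" function for the non-terminal states."""
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES - 1)}


q = train()
print(greedy_policy(q))
```

The behavior during training is purely random here to keep the sketch short; the learned function is read off afterwards by taking, in each state, the action with the highest estimated value.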
Some common reinforcement learning algorithms

These algorithms can be classified in several ways:
By whether the environment is understood: model-based or model-free.
By what decisions are based on: policy-based (probabilities) or value-based.
By update method: per-episode (Monte Carlo) updates or per-step (temporal-difference) updates.
By online or offline learning: on-policy or off-policy.


Next article: Discrete Markov Process - Reinforcement Learning as a Training Method

Origin blog.csdn.net/qq_45973306/article/details/123299519