Reinforcement Learning Overview

Development and Overview of Reinforcement Learning

  1. Reinforcement learning originated in the psychology of animal trial-and-error learning (behaviors that bring positive feedback are repeated) and in the theory of optimal control from optimization
  2. Q-learning was proposed in 1989, Deep Q-Network (DQN) in 2013, and reinforcement learning entered the public eye in 2015
  3. Problems it aims to solve: implementing control functions, controlling agents in combat scenarios, playing chess and card games, optimizing logistics and transportation, and autonomous driving

Basic Concepts of Reinforcement Learning

  1. The core idea of reinforcement learning: an intelligent agent (AI) learns by interacting with the environment, receiving rewards as feedback for the actions it performs
  2. Policy: the core of the agent; it determines what decisions the agent makes and represents what the agent does. Reinforcement learning aims to make the agent's policy toward the environment better and better. Reinforcement learning is a learning mechanism that learns how to map states to actions so as to maximize the cumulative reward obtained; the AI has no intuition
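The "cumulative reward" the agent maximizes is usually the discounted return. A minimal sketch, with a made-up reward sequence and discount factor chosen purely for illustration:

```python
def discounted_return(rewards, gamma=0.9):
    """Cumulative discounted reward: G = sum over t of gamma^t * r_t."""
    g = 0.0
    for t, r in enumerate(rewards):
        g += (gamma ** t) * r
    return g

# Hypothetical rewards over three steps: 1 + 0.9*0 + 0.81*2
g = discounted_return([1.0, 0.0, 2.0])
```

A discount factor below 1 makes rewards received sooner count more than rewards received later, which is what drives the agent toward efficient behavior.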

Components of Reinforcement Learning:

  1. They generally include: Agent, Environment, State, Action, Reward (instant reward), Policy, Value, and Model (the model corresponds to the environment: how the state changes is determined by the model; the model is similar to the inherent laws of the world, describing how the world changes)

  2. The goal of reinforcement learning is to obtain a good policy. The policy tells the agent what move to make next. Deriving the optimal policy from a value function, or learning the optimal policy directly, are two common approaches; the Honor of Kings AI, for example, obtains its policy through direct learning

  3. Everything the agent interacts with is called the environment. The environment is the external world, which includes everything except the agent. Environments can be divided into the following categories:

    • Deterministic and stochastic environments
    • Discrete and Continuous Environments
    • Fully observable environment and partially observable environment
    • Single-agent and multi-agent environments
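The deterministic/stochastic distinction above can be illustrated with two toy `step` functions; the transitions are invented for this sketch and do not follow any particular library's API:

```python
import random

def step_deterministic(state, action):
    """Deterministic environment: the same (state, action) pair
    always produces the same next state."""
    return state + action

def step_stochastic(state, action):
    """Stochastic environment: the next state is drawn from a
    distribution that depends on (state, action)."""
    return state + action + random.choice([-1, 0, 1])

# Deterministic: repeated calls agree; stochastic: they may differ.
assert step_deterministic(0, 1) == step_deterministic(0, 1)
```

The other categories are analogous: discrete vs. continuous concerns the state/action sets, and full vs. partial observability concerns how much of the state the agent sees.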

How Reinforcement Learning Works

  1. The overall cycle of state, action, reward, and next state: the agent generates an action through its policy, which changes the environment; the environment returns a corresponding reward and state to the agent; the agent then adjusts its policy based on that state and reward to decide its next action, and the cycle repeats. The goal of reinforcement learning is to obtain a good policy: going through cycle after cycle, the agent improves its policy based on the feedback the environment gives it.
  2. Markov property: the next state of the system depends only on the current state and is independent of earlier states
  3. Markov Decision Process: A decision process with Markov properties
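The state–action–reward cycle described above can be sketched as a minimal loop. The environment, policy, and rewards here are toy inventions, not any real library's interface:

```python
import random

def step(state, action):
    """Toy environment: the next state depends only on the current state
    and action (the Markov property), and comes with a reward signal."""
    next_state = (state + action) % 5
    reward = 1.0 if next_state == 0 else 0.0
    return next_state, reward

def policy(state):
    """Toy stochastic policy: choose an action given only the current state."""
    return random.choice([1, 2])

state = 0
total_reward = 0.0
for t in range(10):                      # the interaction cycle
    action = policy(state)               # policy produces an action
    state, reward = step(state, action)  # environment returns state, reward
    total_reward += reward               # feedback used to improve the policy
```

In a real algorithm the policy would be updated inside the loop using the observed states and rewards; here it stays fixed to keep the cycle itself visible.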

Features of Reinforcement Learning:

  1. Trial-and-error learning: the best decision at each step is worked out through trial and error
  2. Delayed feedback: during training, the agent's "trial and error" behavior obtains feedback from the environment, and it may be necessary to wait until an entire episode ends before any feedback arrives
  3. Time-sequential: the training process unfolds over time, and states and feedback also change over time
  4. The agent's current behavior affects the data it subsequently receives

The difference between reinforcement learning and other machine learning:

During training, reinforcement learning has no universal labels the way supervised learning does; the agent learns from its own experience. Unsupervised learning discovers hidden structure in unlabeled data sets, whereas the goal of reinforcement learning is to maximize reward, not to find hidden structure

Classification of Reinforcement Learning

  1. Classification by Valuation Method

    • Value-function based: the input is a state, the output is the value of each action, and the action with the largest value is taken as the next action
    • Policy-based approach
    • Actor-critic approach
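The value-based idea (state in, action values out, pick the argmax) can be sketched with a Q-table; the states, actions, and values below are invented purely for illustration:

```python
# Q[state][action]: estimated value of taking each action in each state.
Q = {
    "s0": {"left": 0.2, "right": 0.8},
    "s1": {"left": 0.5, "right": 0.1},
}

def greedy_action(q_values):
    """Value-based control: choose the action with the largest value."""
    return max(q_values, key=q_values.get)

print(greedy_action(Q["s0"]))  # -> right
print(greedy_action(Q["s1"]))  # -> left
```

Policy-based methods skip the table and output an action (or action distribution) directly; actor-critic methods keep both a policy (actor) and a value estimate (critic).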
  2. Classified by whether a model of the environment is built

  3. Classified by update method

    • Round update: update after a whole round (episode) ends
    • Single-step update: update after every step
  4. Classified by online versus offline learning
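The two update schemes can be contrasted on a single episode: a round update (as in Monte Carlo methods) waits for the full return, while a single-step update (as in temporal-difference methods) adjusts after every transition. The episode data, step size, and discount below are invented for illustration:

```python
# One hypothetical episode: (state, reward, next_state) transitions.
episode = [("s0", 0.0, "s1"), ("s1", 0.0, "s2"), ("s2", 1.0, "end")]
alpha, gamma = 0.5, 0.9

# Round update: wait until the episode ends, then move every visited
# state's value toward the observed return from that state onward.
V_mc = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "end": 0.0}
g = 0.0
for state, reward, _ in reversed(episode):
    g = reward + gamma * g                    # return from this state onward
    V_mc[state] += alpha * (g - V_mc[state])

# Single-step update: adjust immediately after each transition, using
# the current estimate of the next state's value as the target.
V_td = {"s0": 0.0, "s1": 0.0, "s2": 0.0, "end": 0.0}
for state, reward, next_state in episode:
    target = reward + gamma * V_td[next_state]
    V_td[state] += alpha * (target - V_td[state])
```

After one episode the round update has already propagated the final reward back to "s0", while the single-step update has only reached "s2"; more episodes are needed for the value to flow backward.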


Origin: blog.csdn.net/weixin_68798281/article/details/131974237