MATLAB Reinforcement Learning Toolbox (12): Overview of Creating Reinforcement Learning Agents

Introduction to Reinforcement Learning

The goal of reinforcement learning is to train an agent to complete a task in an uncertain environment. The agent receives observations and rewards from the environment and sends actions to the environment. The reward is a measure of how successful an action is with respect to completing the task goal. The agent contains two components: a policy and a learning algorithm.

  1. The policy is a mapping that selects actions based on observations of the environment. Typically, the policy is a function approximator with tunable parameters, such as a deep neural network.

  2. The learning algorithm continuously updates the policy parameters based on the actions, observations, and rewards. Its goal is to find an optimal policy that maximizes the expected cumulative long-term reward received during the task.
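
In standard notation, the expected cumulative long-term reward is the expected discounted return with discount factor \gamma:

    J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{\infty} \gamma^{t}\, r_{t+1} \right], \qquad 0 < \gamma \le 1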

[Figure: the agent receives observations and rewards from the environment and sends actions back to it]
Depending on its learning algorithm, the agent maintains one or more parameterized function approximators for training the policy. An approximator can be used in one of two ways.

  1. Critic - For a given observation and action, the critic returns as output the expected value of the cumulative long-term reward for the task.

  2. Actor - For a given observation, the actor returns as output the action that maximizes the expected cumulative long-term reward.
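
In standard notation, for an observation s and action a, the two approximators take the forms

    Q(s, a;\, \phi) \approx \mathbb{E}\left[ \text{return} \mid s, a \right] \quad \text{(critic)}, \qquad a = \pi(s;\, \theta) \quad \text{(actor)}

where \phi and \theta are the tunable parameters of each approximator.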

Agents that use only critics to select their actions rely on an indirect policy representation. These agents are also called value-based agents; they use an approximator to represent a value function or Q-value function. In general, these agents work better with discrete action spaces but can become computationally expensive for continuous action spaces.
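
Concretely, a value-based agent derives its action from the critic by maximizing over actions:

    \pi(s) = \arg\max_{a} Q(s, a;\, \phi)

When the action set is finite, this maximization is a cheap lookup over a few Q-values; when the actions are continuous, it becomes an optimization problem that must be solved at every step, which is where the computational expense comes from.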

Agents that use only actors to select their actions rely on a direct policy representation. These agents are also called policy-based agents. The policy can be deterministic or stochastic. In general, these agents are simpler and can handle continuous action spaces, although the training algorithm is sensitive to noisy measurements and can converge on local minima.

Agents that use both an actor and a critic are called actor-critic agents. In these agents, the actor learns the best action to take during training by using feedback from the critic (instead of using the reward directly). Meanwhile, the critic learns the value function from the rewards so that it can properly criticize the actor. In general, these agents can handle both discrete and continuous action spaces.
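
One standard way to make this criticism concrete (the built-in agents differ in their details) is the temporal-difference error

    \delta_t = r_{t+1} + \gamma\, V(s_{t+1};\, \phi) - V(s_t;\, \phi)

which measures how much better or worse a transition turned out than the critic expected; the actor then adjusts its parameters to make actions with positive \delta_t more likely.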

Built-in agents

The Reinforcement Learning Toolbox™ software provides the following built-in agents. You can train these agents in environments with continuous or discrete observation spaces and the action spaces listed below.

The following tables summarize the types, action spaces, and representations of all the built-in agents. For each agent, the observation space can be discrete or continuous.

Built-in agents: type and action space

    Agent                                               Type            Action space
    Q-Learning (rlQAgent)                               Value-based     Discrete
    SARSA (rlSARSAAgent)                                Value-based     Discrete
    Deep Q-Network (rlDQNAgent)                         Value-based     Discrete
    Policy Gradient (rlPGAgent)                         Policy-based    Discrete or continuous
    Actor-Critic (rlACAgent)                            Actor-critic    Discrete or continuous
    Proximal Policy Optimization (rlPPOAgent)           Actor-critic    Discrete or continuous
    Deep Deterministic Policy Gradient (rlDDPGAgent)    Actor-critic    Continuous
    Twin-Delayed DDPG (rlTD3Agent)                      Actor-critic    Continuous
    Soft Actor-Critic (rlSACAgent)                      Actor-critic    Continuous
Built-in agents: representation required by each agent
[Table: the actor and critic representation objects that each built-in agent must use]

Agents with default networks

All agents except Q-learning and SARSA support default networks for their actors and critics. You can create an agent with default actor and critic representations based on the observation and action specifications from your environment. To do so, perform the following steps (a code sketch follows the list).

  1. Create observation specifications for your environment. If you already have an environment interface object, you can use getObservationInfo to obtain these specifications.

  2. Create action specifications for your environment. If you already have an environment interface object, you can use getActionInfo to obtain these specifications.

  3. If needed, specify the number of neurons in each learnable layer or whether to use an LSTM layer. To do so, create an agent initialization options object using rlAgentInitializationOptions.

  4. If needed, specify agent options by creating an options object for the specific agent (for example, rlDQNAgentOptions for a DQN agent).

  5. Create the agent using the corresponding agent creation function. The resulting agent contains the appropriate actor and critic listed in the table above, and the actor and critic use default agent-specific deep neural networks as internal approximators.
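
Put together, the five steps look roughly like the following sketch. It assumes a predefined cart-pole environment and a DQN agent purely for illustration; the option values are arbitrary examples.

    % Steps 1-2: get observation and action specifications from an
    % environment interface object.
    env = rlPredefinedEnv("CartPole-Discrete");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    % Step 3 (optional): control the default networks, for example the
    % number of units in each hidden layer.
    initOpts = rlAgentInitializationOptions("NumHiddenUnit",128);

    % Step 4 (optional): agent-specific options object.
    agentOpts = rlDQNAgentOptions("UseDoubleDQN",true);

    % Step 5: create the agent; default actor and critic networks are
    % generated automatically from the specifications.
    agent = rlDQNAgent(obsInfo,actInfo,initOpts,agentOpts);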

For more information on creating actor and critic function approximators, see Create Policy and Value Function Representations.

Choose agent type

When choosing an agent, a good practice is to start with a simpler (and faster to train) algorithm that is compatible with your action and observation spaces. You can then try progressively more complex algorithms if the simpler ones do not perform as desired.

Discrete action and observation spaces - For environments with discrete action and observation spaces, the Q-learning agent is the simplest compatible agent, followed by DQN and PPO.
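
For example, a table-based Q-learning agent can be assembled along the following lines. This sketch assumes the predefined basic grid world environment; the exploration value is an arbitrary example.

    % Fully discrete environment: finite observation and action sets.
    env = rlPredefinedEnv("BasicGridWorld");
    obsInfo = getObservationInfo(env);
    actInfo = getActionInfo(env);

    % Q-table critic over the discrete observation/action pairs.
    qTable = rlTable(obsInfo,actInfo);
    critic = rlQValueRepresentation(qTable,obsInfo,actInfo);

    % Epsilon-greedy exploration; 0.1 is an illustrative value.
    opts = rlQAgentOptions;
    opts.EpsilonGreedyExploration.Epsilon = 0.1;
    agent = rlQAgent(critic,opts);
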
Discrete action space and continuous observation space - For environments with a discrete action space and a continuous observation space, DQN is the simplest compatible agent, followed by PPO.

Continuous action space - For environments with both continuous action and observation spaces, DDPG is the simplest compatible agent, followed by TD3, PPO, and SAC. For such environments, try DDPG first (see the sketch after the following list). In general:

  1. TD3 is an improved and more complex version of DDPG.

  2. PPO has more stable updates, but requires more training.

  3. SAC is an improved, more complex version of DDPG that generates stochastic policies.
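
As a starting point, a default DDPG agent can be created directly from the environment specifications. A minimal sketch, assuming the predefined continuous double-integrator environment:

    % Continuous observation and action spaces.
    env = rlPredefinedEnv("DoubleIntegrator-Continuous");

    % Default actor and critic networks are generated from the specifications.
    agent = rlDDPGAgent(getObservationInfo(env),getActionInfo(env));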

Custom agents

You can also train policies using other learning algorithms by creating a custom agent. To do so, you create a subclass of a custom agent class and define the agent behavior using a set of required and optional methods. For more information, see Create Custom Reinforcement Learning Agents.

More about reinforcement learning

https://ww2.mathworks.cn/help/releases/R2020b/reinforcement-learning/ug/what-is-reinforcement-learning.html

Origin: blog.csdn.net/wangyifan123456zz/article/details/109539445