MATLAB Reinforcement Learning Toolbox (7): Pendulum Modeling and DQN Training


This example shows how to build a pendulum model and train it with a deep Q-network (DQN) agent.

Pendulum model


The reinforcement learning environment for this example is a simple frictionless pendulum that initially hangs in a downward position. The training goal is to make the pendulum stand upright, using minimal control effort, without falling over.

Open the model

mdl = 'rlSimplePendulumModel';
open_system(mdl)

For this model:

  1. The balanced, upright pendulum position is 0 radians, and the downward hanging position is π radians.
  2. The torque action signal from the agent to the environment ranges from –2 to 2 N·m.
  3. The observations from the environment are the sine of the pendulum angle, the cosine of the pendulum angle, and the derivative of the pendulum angle.
  4. The reward r_t, provided at every time step, is

r_t = −(θ_t² + 0.1·θ̇_t² + 0.001·u_{t−1}²)

where:

  1. θ_t is the displacement angle from the upright position.
  2. θ̇_t is the derivative of the displacement angle.
  3. u_{t−1} is the control effort from the previous time step.
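
As a quick numerical check of this penalty structure, the reward can be evaluated directly in MATLAB. This is only an illustrative sketch of the formula above, not code used by the Simulink model.

% Illustrative reward check: quadratic penalty on the angle, the angular
% velocity, and the previous control effort (the formula above).
reward = @(theta,thetaDot,uPrev) -(theta.^2 + 0.1*thetaDot.^2 + 0.001*uPrev.^2);

reward(0,0,0)    % upright, at rest, no torque -> 0 (best possible reward)
reward(pi,0,0)   % hanging straight down       -> about -9.87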

Create environment interface

Create a predefined environment interface for the pendulum.

env = rlPredefinedEnv('SimplePendulumModel-Discrete')

The interface has a discrete action space in which the agent can apply one of three possible torque values to the pendulum: –2, 0, or 2 N·m.
To define the initial condition of the pendulum as hanging downward, specify an environment reset function using an anonymous function handle. This reset function sets the model workspace variable theta0 to pi.

env.ResetFcn = @(in)setVariable(in,'theta0',pi,'Workspace',mdl);
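
If you would rather start each episode from a random angle, a similar anonymous reset function can be used; this variant is only an illustrative sketch and is not part of the original example.

% Illustrative alternative: reset to a random initial angle in [-pi, pi].
% env.ResetFcn = @(in)setVariable(in,'theta0',pi*(2*rand-1),'Workspace',mdl);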

Obtain observation and action specification information from the environment

obsInfo = getObservationInfo(env)


actInfo = getActionInfo(env)

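The command-window output for these specifications is not reproduced here. You can inspect the key fields of the returned objects directly; the expected values in the comments below are assumptions based on the environment description above.

obsInfo.Dimension   % expected [3 1]: sin(theta), cos(theta), and d(theta)/dt
actInfo.Elements    % expected [-2 0 2]: the three allowed torque values in N*m
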
Specify the simulation time Tf and the agent sample time Ts in seconds.

Ts = 0.05;
Tf = 20;

Fix the random generator seed for reproducibility.

rng(0)

Create DQN agent

The DQN agent approximates the long-term reward, given observations and actions, using a value-function critic.

Since DQN agents have a discrete action space, they can rely on a multi-output critic approximator, which is generally more efficient than a comparable single-output approximator. A multi-output approximator takes only the observation as input, and its output vector has as many elements as there are possible discrete actions. Each output element represents the expected long-term cumulative reward obtained when the corresponding discrete action is taken from the state indicated by the input observation.

To create the critic, first create a deep neural network with an input vector of three elements (the sine, cosine, and derivative of the pendulum angle) and an output vector of three elements (one per action: –2, 0, and 2 N·m). For more information on creating deep neural network value function representations, see Create Policy and Value Function Representations.

dnn = [
    featureInputLayer(3,'Normalization','none','Name','state')
    fullyConnectedLayer(24,'Name','CriticStateFC1')
    reluLayer('Name','CriticRelu1')
    fullyConnectedLayer(48,'Name','CriticStateFC2')
    reluLayer('Name','CriticCommonRelu')
    fullyConnectedLayer(3,'Name','output')];

View the critic network configuration.

figure
plot(layerGraph(dnn))

Specify options for the critic representation using rlRepresentationOptions.

criticOpts = rlRepresentationOptions('LearnRate',0.001,'GradientThreshold',1);

Create the critic representation using the specified deep neural network and options. You must also specify the observation and action information for the critic. For more information, see rlQValueRepresentation.

critic = rlQValueRepresentation(dnn,obsInfo,actInfo,'Observation',{'state'},criticOpts);

To create a DQN agent, first use rlDQNAgentOptions to specify DQN agent options.

agentOptions = rlDQNAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',3000,... 
    'UseDoubleDQN',false,...
    'DiscountFactor',0.9,...
    'MiniBatchSize',64);
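
The exploration behavior of the DQN agent can also be tuned through the EpsilonGreedyExploration sub-options of rlDQNAgentOptions. The values below are illustrative only and are not taken from the original example, so they are left commented out.

% Illustrative epsilon-greedy exploration settings (not from the original example):
% agentOptions.EpsilonGreedyExploration.Epsilon = 0.9;        % initial exploration probability
% agentOptions.EpsilonGreedyExploration.EpsilonDecay = 1e-3;  % decay applied after each step
% agentOptions.EpsilonGreedyExploration.EpsilonMin = 0.01;    % lower bound on exploration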

Then, create the DQN agent using the specified critic representation and agent options. For more information, see rlDQNAgent.

agent = rlDQNAgent(critic,agentOptions);
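
As a quick sanity check (not part of the original example), you can query the newly created, untrained agent for an action given an arbitrary observation consistent with obsInfo.

% Query the untrained agent with a random 3-element observation; the result
% is one of the allowed torques (-2, 0, or 2), possibly wrapped in a cell
% array depending on the toolbox version.
act = getAction(agent,{rand(3,1)})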

Train the agent

To train the agent, first specify the training options. For this example, use the following options.

  1. Run the training for at most 1000 episodes, with each episode lasting at most 500 time steps.

  2. Display the training progress in the Episode Manager dialog box (set the Plots option to "training-progress"), and disable the command-line display (set the Verbose option to false).

  3. Stop training when the agent receives an average cumulative reward greater than –1100 over five consecutive episodes. At that point, the agent can quickly balance the pendulum in the upright position using minimal control effort.

  4. Save a copy of the agent for each episode with a cumulative reward greater than –1100.

trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',1000,...
    'MaxStepsPerEpisode',500,...
    'ScoreAveragingWindowLength',5,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-1100,...
    'SaveAgentCriteria','EpisodeReward',...
    'SaveAgentValue',-1100);

Train the agent using the train function. Training this agent is a computationally intensive process that takes several minutes to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.

doTraining = false;

if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainingOptions);
else
    % Load the pretrained agent for the example.
    load('SimulinkPendulumDQNMulti.mat','agent');
end
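
If you do run the training yourself, the trainingStats output contains per-episode data such as the episode index, episode reward, and running average reward; the short sketch below assumes these default logged fields.

if doTraining
    % Plot the running average reward against the episode index.
    plot(trainingStats.EpisodeIndex,trainingStats.AverageReward)
    xlabel('Episode')
    ylabel('Average reward')
end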


Simulate the agent

To validate the performance of the trained agent, simulate it within the pendulum environment. For more information on agent simulation, see rlSimulationOptions and sim.

simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);
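
The experience output logs the simulation signals. For example, you can compute the cumulative reward of the simulated episode; this is an illustrative check that assumes the reward is logged as a timeseries in experience.Reward.

totalReward = sum(experience.Reward.Data)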

Source: blog.csdn.net/wangyifan123456zz/article/details/109494962