Train DDPG Agent to Control Double Integrator System
This example shows how to train a deep deterministic policy gradient (DDPG) agent to control a second-order dynamic system modeled in MATLAB®.
For more information on DDPG agents, see Deep Deterministic Policy Gradient Agents. For an example showing how to train a DDPG agent in Simulink®, see Train DDPG Agent to Swing Up and Balance Pendulum.
Double Integrator MATLAB Environment
The reinforcement learning environment for this example is a second-order double integrator system with a gain. The training goal is to control the position of a mass in the second-order system by applying a force input.
For this environment:
- The mass starts at an initial position between –4 and 4 units.
- The force action signal from the agent to the environment is from –2 to 2 N.
- The observations from the environment are the position and velocity of the mass.
- The episode terminates if the mass moves more than 5 m from the original position or if |x| < 0.01.
- The reward r_t, provided at every time step, is a discretization of r(t) = -(x(t)'Qx(t) + u(t)'Ru(t)).
Here:
- x is the state vector of the mass.
- u is the force applied to the mass.
- Q is the matrix of weights on the control performance; Q = [10 0; 0 1].
- R is the weight on the control effort; R = 0.01.
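As a quick illustration of this cost structure (a sketch, not part of the shipped example; the state and input values below are arbitrary assumptions), the continuous-time reward for a given state and force can be evaluated directly from these definitions:

```matlab
% Illustrative evaluation of the quadratic reward (assumed example values).
Q = [10 0; 0 1];   % weights on position and velocity
R = 0.01;          % weight on the control effort
x = [1; 0.5];      % example state: [position; velocity]
u = -2;            % example applied force
r = -(x'*Q*x + u'*R*u);   % → -10.29 for these values
```

Because the position weight (10) dominates the velocity weight (1) and the effort weight (0.01), the agent is penalized mainly for position error.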
Create Environment Interface
Create a predefined environment interface for the double integrator system.
env = rlPredefinedEnv("DoubleIntegrator-Continuous")
env.MaxForce = Inf;
The interface has a continuous action space in which the agent can apply force values from -Inf to Inf to the mass.
Obtain observation and action information from the environment interface.
obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
numActions = numel(actInfo);
Fix the random generator seed for reproducibility.
rng(0)
Create DDPG Agent
A DDPG agent approximates the long-term reward, given observations and actions, using a critic value function representation. To create the critic, first create a deep neural network with two inputs (the state and action) and one output. For more information on creating a neural network value function representation, see Create Policy and Value Function Representations.
statePath = imageInputLayer([numObservations 1 1],'Normalization','none','Name','state');
actionPath = imageInputLayer([numActions 1 1],'Normalization','none','Name','action');
commonPath = [concatenationLayer(1,2,'Name','concat')
quadraticLayer('Name','quadratic')
fullyConnectedLayer(1,'Name','StateValue','BiasLearnRateFactor',0,'Bias',0)];
criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');
View the critic network configuration.
figure
plot(criticNetwork)
Specify options for the critic representation using rlRepresentationOptions.
criticOpts = rlRepresentationOptions('LearnRate',5e-3,'GradientThreshold',1);
Create the critic representation using the specified neural network and options. You must also specify the action and observation information for the critic, which you obtain from the environment interface. For more information, see rlQValueRepresentation.
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'state'},'Action',{'action'},criticOpts);
A DDPG agent decides which action to take, given observations, using an actor representation. To create the actor, first create a deep neural network with one input (the observation) and one output (the action).
Construct the actor in a similar manner to the critic.
actorNetwork = [
imageInputLayer([numObservations 1 1],'Normalization','none','Name','state')
fullyConnectedLayer(numActions,'Name','action','BiasLearnRateFactor',0,'Bias',0)];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'state'},'Action',{'action'},actorOpts);
To create the DDPG agent, first specify the DDPG agent options using rlDDPGAgentOptions.
agentOpts = rlDDPGAgentOptions(...
'SampleTime',env.Ts,...
'TargetSmoothFactor',1e-3,...
'ExperienceBufferLength',1e6,...
'DiscountFactor',0.99,...
'MiniBatchSize',32);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-6;
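These two options control how much exploration noise the agent adds to its actions and how quickly that noise fades. As a rough sketch of the schedule (assuming the variance decays multiplicatively per agent step, Variance(k+1) = Variance(k)*(1 - VarianceDecayRate), which is an assumption about the option semantics rather than a documented formula here):

```matlab
% Sketch of the exploration noise variance schedule (assumed decay rule).
variance = 0.3;       % initial noise variance
decayRate = 1e-6;     % per-step decay rate
steps = 1e5;          % number of agent steps
finalVariance = variance*(1 - decayRate)^steps;  % roughly 0.27 after 1e5 steps
```

With such a small decay rate, the noise variance remains close to its initial value over an entire training run, so the agent keeps exploring throughout.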
Create the DDPG agent using the specified actor representation, critic representation, and agent options. For more information, see rlDDPGAgent.
agent = rlDDPGAgent(actor,critic,agentOpts);
Train Agent
To train the agent, first specify the training options. For this example, use the following options.
- Run the training for at most 5000 episodes, with each episode lasting at most 200 time steps.
- Display the training progress in the Episode Manager dialog box (set the Plots option) and disable the command-line display (set the Verbose option).
- Stop training when the agent receives a moving average cumulative reward greater than –66. At that point, the agent can control the position of the mass using minimal control effort.
For more information, see rlTrainingOptions .
trainOpts = rlTrainingOptions(...
'MaxEpisodes', 5000, ...
'MaxStepsPerEpisode', 200, ...
'Verbose', false, ...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-66);
You can visualize the double integrator environment by using the plot function during training or simulation.
plot(env)
Train the agent using the train function. Training this agent is a computationally intensive process that takes time to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.
doTraining = false;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainOpts);
else
% Load the pretrained agent for the example.
load('DoubleIntegDDPG.mat','agent');
end
Simulate DDPG Agent
To validate the performance of the trained agent, simulate it within the double integrator environment. For more information on agent simulation, see rlSimulationOptions and sim.
simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);
totalReward = sum(experience.Reward)