Use DDPG to train the pendulum system
This example shows how to build a pendulum model and train it with DDPG.
For details on loading the model, refer to my previous blog post, which uses DQN.
Open the model and create the environment interface
Open the model
mdl = 'rlSimplePendulumModel';
open_system(mdl)
Create a predefined environment interface for the pendulum.
env = rlPredefinedEnv('SimplePendulumModel-Continuous')
The interface has a continuous action space in which the agent can apply torque values between -2 and 2 N·m to the pendulum.
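Since DDPG requires a continuous action space, you can confirm the torque limits directly from the environment's action specification. This is a quick check, not part of the original walkthrough; getActionInfo and the limit properties are standard Reinforcement Learning Toolbox API:

```matlab
% Inspect the action specification of the continuous pendulum environment.
actInfo = getActionInfo(env);
actInfo.LowerLimit   % expected torque lower bound: -2 N·m
actInfo.UpperLimit   % expected torque upper bound:  2 N·m
```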
The observation from this environment is a three-element vector containing the sine and cosine of the pendulum angle and the angular velocity, so set ThetaObservationHandling to 'sincos'.
numObs = 3;
set_param('rlSimplePendulumModel/create observations','ThetaObservationHandling','sincos');
To define the initial condition of the pendulum as hanging downward, specify an environment reset function using an anonymous function handle. This reset function sets the model workspace variable theta0 to pi.
env.ResetFcn = @(in)setVariable(in,'theta0',pi,'Workspace',mdl);
Specify the simulation time Tf and the agent sample time Ts in seconds.
Ts = 0.05;
Tf = 20;
Fix the random generator seed for reproducibility.
rng(0)
Create DDPG agent
A DDPG agent uses a critic value function representation to approximate the long-term reward given observations and actions. To create the critic, first create a deep neural network with two inputs (the state and the action) and one output. For more information on creating deep neural network value function representations, see Create Policy and Value Function Representations.
statePath = [
featureInputLayer(numObs,'Normalization','none','Name','observation')
fullyConnectedLayer(400,'Name','CriticStateFC1')
reluLayer('Name', 'CriticRelu1')
fullyConnectedLayer(300,'Name','CriticStateFC2')];
actionPath = [
featureInputLayer(1,'Normalization','none','Name','action')
fullyConnectedLayer(300,'Name','CriticActionFC1','BiasLearnRateFactor',0)];
commonPath = [
additionLayer(2,'Name','add')
reluLayer('Name','CriticCommonRelu')
fullyConnectedLayer(1,'Name','CriticOutput')];
criticNetwork = layerGraph();
criticNetwork = addLayers(criticNetwork,statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);
criticNetwork = connectLayers(criticNetwork,'CriticStateFC2','add/in1');
criticNetwork = connectLayers(criticNetwork,'CriticActionFC1','add/in2');
View the critic network configuration.
figure
plot(criticNetwork)
Specify options for the critic representation using rlRepresentationOptions.
criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);
Create the critic representation using the specified deep neural network and options. You must also specify the action and observation information for the critic, which you obtain from the environment interface. For more information, see rlQValueRepresentation.
obsInfo = getObservationInfo(env);
actInfo = getActionInfo(env);
critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOpts);
A DDPG agent uses a deterministic actor representation to decide which action to take given observations. To create the actor, first create a deep neural network with one input (the observation) and one output (the action), then construct the actor in a manner similar to the critic. For more information, see rlDeterministicActorRepresentation.
actorNetwork = [
featureInputLayer(numObs,'Normalization','none','Name','observation')
fullyConnectedLayer(400,'Name','ActorFC1')
reluLayer('Name','ActorRelu1')
fullyConnectedLayer(300,'Name','ActorFC2')
reluLayer('Name','ActorRelu2')
fullyConnectedLayer(1,'Name','ActorFC3')
tanhLayer('Name','ActorTanh')
scalingLayer('Name','ActorScaling','Scale',max(actInfo.UpperLimit))];
actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);
actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,...
    'Observation',{'observation'},'Action',{'ActorScaling'},actorOpts);
To create a DDPG agent, first use rlDDPGAgentOptions to specify DDPG agent options.
agentOpts = rlDDPGAgentOptions(...
'SampleTime',Ts,...
'TargetSmoothFactor',1e-3,...
'ExperienceBufferLength',1e6,...
'DiscountFactor',0.99,...
'MiniBatchSize',128);
agentOpts.NoiseOptions.Variance = 0.6;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-5;
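As a rough guide (this back-of-the-envelope check is not part of the original example), the Ornstein-Uhlenbeck exploration noise variance decays multiplicatively by (1 - VarianceDecayRate) at each agent step, so the chosen decay rate implies a slow reduction of exploration:

```matlab
% With VarianceDecayRate = 1e-5, the noise variance halves roughly every
% log(2)/1e-5 ≈ 6.9e4 agent steps, i.e. well over 100 episodes of
% ceil(Tf/Ts) = 400 steps each.
halfLifeSteps = log(2)/1e-5
```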
Then create the DDPG agent using the specified actor representation, critic representation, and agent options. For more information, see rlDDPGAgent.
agent = rlDDPGAgent(actor,critic,agentOpts);
Train the agent
To train the agent, first specify the training options. For this example, use the following options.
- Run the training for at most 5000 episodes, with each episode lasting at most ceil(Tf/Ts) time steps.
- Display the training progress in the Episode Manager dialog box (set the Plots option) and disable the command-line display (set the Verbose option to false).
- Stop training when the agent receives an average cumulative reward greater than -740 over five consecutive episodes. At that point, the agent can quickly swing the pendulum upright using minimal control effort.
- Save a copy of the agent for each episode in which the cumulative reward is greater than -740.
For more information, see rlTrainingOptions .
maxepisodes = 5000;
maxsteps = ceil(Tf/Ts);
trainOpts = rlTrainingOptions(...
'MaxEpisodes',maxepisodes,...
'MaxStepsPerEpisode',maxsteps,...
'ScoreAveragingWindowLength',5,...
'Verbose',false,...
'Plots','training-progress',...
'StopTrainingCriteria','AverageReward',...
'StopTrainingValue',-740,...
'SaveAgentCriteria','EpisodeReward',...
'SaveAgentValue',-740);
Train the agent using the train function. Training is computationally intensive and takes several minutes to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.
doTraining = false;
if doTraining
% Train the agent.
trainingStats = train(agent,env,trainOpts);
else
% Load the pretrained agent for the example.
load('SimulinkPendulumDDPG.mat','agent')
end
DDPG simulation
To verify the performance of the trained agent, simulate it in the pendulum environment. For more information about agent simulation, see rlSimulationOptions and sim.
simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);
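To check the result quantitatively, you can sum the reward signal logged during the simulation. This is a sketch, assuming the experience structure returned by sim, whose Reward field is a timeseries:

```matlab
% Total reward accumulated over the 500-step simulation; for a trained
% agent this should be comparable to the -740 training target.
totalReward = sum(experience.Reward.Data)
```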