MATLAB Reinforcement Learning in Practice (4): Train a DDPG Agent to Control a Double Integrator System

This example shows how to train a deep deterministic policy gradient (DDPG) agent to control a second-order dynamic system modeled in MATLAB®.

For more information about DDPG agents, see Deep Deterministic Policy Gradient Agents. For an example showing how to train a DDPG agent in Simulink®, see Train DDPG Agent to Swing Up and Balance Pendulum.

Double integrator MATLAB environment

The reinforcement learning environment for this example is a second-order double integrator system with a gain. The training goal is to control the position of a mass in this second-order system by applying a force input.
For this environment:

  1. The mass starts at an initial position between -4 and 4 units.
  2. The force action signal from the agent to the environment ranges from -2 to 2 N.
  3. The observations from the environment are the position and velocity of the mass.
  4. The episode terminates if the mass moves more than 5 m from its original position or if |x| < 0.01.
  5. The reward r(t), provided at every time step, is a discretization of the following (a short sketch that evaluates this reward appears after the list of symbols below):

    r(t) = -(x(t)' Q x(t) + u(t)' R u(t))

Here:

  1. x is the state vector of the mass.

  2. u is the force applied to the mass.

  3. Q = [10 0; 0 1] is the matrix of weights on the control performance.

  4. R = 0.01 is the weight on the control effort.
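
As a quick illustration of this cost, the following minimal sketch evaluates the stage reward for one sample state and force. The variable names and sample values are illustrative only and are not part of the toolbox example.

% Minimal sketch: evaluate r = -(x'*Q*x + u'*R*u) for one sample state and force.
% The sample values below are illustrative only.
Q = [10 0; 0 1];        % weights on position and velocity
R = 0.01;               % weight on the control effort
x = [1.5; -0.2];        % example state: 1.5 m from the origin, moving at -0.2 m/s
u = 0.5;                % example force in newtons
r = -(x'*Q*x + u'*R*u)  % the reward is more negative the further the mass is from 0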

Create environment interface

Create a predefined environment interface for the double integrator system.

env = rlPredefinedEnv("DoubleIntegrator-Continuous")


env.MaxForce = Inf;

The interface has a continuous action space in which the agent can apply force values from -Inf to Inf to the mass.

Obtain observation and action information from the environment interface.

obsInfo = getObservationInfo(env);
numObservations = obsInfo.Dimension(1);
actInfo = getActionInfo(env);
numActions = numel(actInfo);
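
If you want to confirm the observation dimensions and the action range, you can inspect the specification objects directly. This check is not part of the original example; the expected values in the comments assume env.MaxForce has already been set to Inf.

% Optional check of the specification objects.
obsInfo.Dimension     % expected [2 1]: position and velocity of the mass
actInfo.LowerLimit    % expected -Inf, because env.MaxForce = Inf
actInfo.UpperLimit    % expected Inf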

Fix the random generator seed for reproducibility.

rng(0)

Create DDPG agent

The DDPG agent uses a critic value function representation to approximate the long-term reward given observations and actions. To create the critic, first create a deep neural network with two inputs (the state and the action) and one output. For more information on creating neural network value function representations, see Create Policy and Value Function Representations.

statePath = imageInputLayer([numObservations 1 1],'Normalization','none','Name','state');
actionPath = imageInputLayer([numActions 1 1],'Normalization','none','Name','action');
commonPath = [concatenationLayer(1,2,'Name','concat')
             quadraticLayer('Name','quadratic')
             fullyConnectedLayer(1,'Name','StateValue','BiasLearnRateFactor',0,'Bias',0)];

criticNetwork = layerGraph(statePath);
criticNetwork = addLayers(criticNetwork,actionPath);
criticNetwork = addLayers(criticNetwork,commonPath);

criticNetwork = connectLayers(criticNetwork,'state','concat/in1');
criticNetwork = connectLayers(criticNetwork,'action','concat/in2');

Check the critic's network configuration.

figure
plot(criticNetwork)
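
As an optional additional check (not part of the original example), you can also open the Deep Learning Toolbox Network Analyzer to confirm that both input paths are connected through to the output layer.

% Optional: analyze the layer graph for disconnected or mismatched layers
% (requires Deep Learning Toolbox).
analyzeNetwork(criticNetwork)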

Use rlRepresentationOptions to specify options for the critic representation.

criticOpts = rlRepresentationOptions('LearnRate',5e-3,'GradientThreshold',1);

Create the critic representation using the specified neural network and options. You must also specify the critic's action and observation information, which you obtain from the environment interface. For more information, see rlQValueRepresentation.

critic = rlQValueRepresentation(criticNetwork,obsInfo,actInfo,'Observation',{'state'},'Action',{'action'},criticOpts);

The DDPG agent uses an actor representation to decide which action to take given observations. To create the actor, first create a deep neural network with one input (the observation) and one output (the action).

Construct the actor in a manner similar to the critic.

actorNetwork = [
    imageInputLayer([numObservations 1 1],'Normalization','none','Name','state')
    fullyConnectedLayer(numActions,'Name','action','BiasLearnRateFactor',0,'Bias',0)];

actorOpts = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);

actor = rlDeterministicActorRepresentation(actorNetwork,obsInfo,actInfo,'Observation',{'state'},'Action',{'action'},actorOpts);

To create a DDPG agent, first use rlDDPGAgentOptions to specify the DDPG agent options.

agentOpts = rlDDPGAgentOptions(...
    'SampleTime',env.Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',32);
agentOpts.NoiseOptions.Variance = 0.3;
agentOpts.NoiseOptions.VarianceDecayRate = 1e-6;
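
With these settings, the exploration noise variance decays very slowly, so the agent keeps exploring for most of training. The sketch below is illustrative only and assumes the documented geometric decay, in which the variance is multiplied by (1 - VarianceDecayRate) at each agent step.

% Sketch: how the exploration noise variance decays over agent steps,
% assuming Variance(k+1) = Variance(k)*(1 - VarianceDecayRate).
v0    = agentOpts.NoiseOptions.Variance;
decay = agentOpts.NoiseOptions.VarianceDecayRate;
k = 0:2e5;                  % illustrative range of agent steps
v = v0*(1 - decay).^k;
figure
plot(k,v)
xlabel('Agent step'), ylabel('Noise variance')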

Create the DDPG agent using the specified actor representation, critic representation, and agent options. For more information, see rlDDPGAgent.

agent = rlDDPGAgent(actor,critic,agentOpts);
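
Before training, you can do a quick sanity check by querying the untrained agent for an action given a random observation. getAction is a standard Reinforcement Learning Toolbox function; the random observation below is purely for illustration.

% Quick sanity check (illustrative): the agent should return a scalar force
% for a 2-by-1 observation vector.
sampleObs = {rand(numObservations,1)};
sampleAct = getAction(agent,sampleObs)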

Train the agent

To train the agent, first specify the training options. For this example, use the following options.

  1. Run the training for at most 5000 episodes, with each episode lasting at most 200 time steps.

  2. Display the training progress in the Episode Manager dialog box (set the Plots option) and disable the command-line display (set the Verbose option).

  3. Stop training when the agent's moving average cumulative reward exceeds -66. At this point, the agent can control the position of the mass using minimal control effort.

For more information, see rlTrainingOptions .

trainOpts = rlTrainingOptions(...
    'MaxEpisodes', 5000, ...
    'MaxStepsPerEpisode', 200, ...
    'Verbose', false, ...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-66);

You can visualize the double integrator environment during training or simulation by using the plot function.

plot(env)

Train the agent using the train function. Training this agent is a computationally intensive process that can take considerable time to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.

doTraining = false;
if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
else
    % Load the pretrained agent for the example.
    load('DoubleIntegDDPG.mat','agent');
end
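
If you train the agent yourself, you may also want to save it so it can be reloaded later in the same way as the pretrained agent. This is a minimal sketch; the file name is illustrative.

% Optional: save the newly trained agent for later reuse.
if doTraining
    save('DoubleIntegDDPG_trained.mat','agent');
end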


Simulate the DDPG agent

To validate the performance of the trained agent, simulate it within the double integrator environment. For more information on agent simulation, see rlSimulationOptions and sim.

simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);


totalReward = sum(experience.Reward)
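
The experience output also contains the observation and action trajectories, which you can plot to confirm that the position settles near zero. The observation channel name used below ('states') is an assumption based on the predefined environment; check fieldnames(experience.Observation) if it differs in your release.

% Sketch: plot the simulated position and velocity over time.
% 'states' is the assumed name of the observation channel.
obsTS   = experience.Observation.states;   % timeseries of observations
obsData = squeeze(obsTS.Data);             % 2-by-N: [position; velocity]
figure
plot(obsTS.Time,obsData(1,:),obsTS.Time,obsData(2,:))
legend('Position (m)','Velocity (m/s)')
xlabel('Time (s)')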



Origin: blog.csdn.net/wangyifan123456zz/article/details/109572842