MATLAB Reinforcement Learning Toolbox (11): Train a DDPG Agent to Control a Flying Robot

This example shows how to train a Deep Deterministic Policy Gradient (DDPG) agent and generate a trajectory for a flying robot.

Flying robot model

The reinforcement learning environment for this example is a flying robot whose initial condition is randomized around a ring of radius 15 m. The initial orientation of the robot is also randomized. The robot has two thrusters mounted on the sides of its body that are used to propel and steer it. The training goal is to drive the robot from its initial condition to the origin, facing east.

Open the model

mdl = 'rlFlyingRobotEnv';
open_system(mdl)

Set initial model variables.

theta0 = 0;
x0 = -15;
y0 = 0;

Define the sampling time Ts and the simulation duration Tf.

Ts = 0.4;
Tf = 30;

For this model:

  1. The goal orientation is 0 rad (the robot faces east).

  2. The thrust range of each actuator is -1 to 1 N.

  3. The observations from the environment are the position, orientation (sine and cosine of the orientation), velocity, and angular velocity of the robot.

  4. The reward r_t provided at every time step is the sum r_t = r_1 + r_2 + r_3 (a sketch of this computation follows the list).

Where:

  - x_t is the position of the robot along the x-axis.
  - y_t is the position of the robot along the y-axis.
  - θ_t is the orientation of the robot.
  - L_{t-1} is the control effort from the left thruster.
  - R_{t-1} is the control effort from the right thruster.
  - r_1 is the reward the robot receives when it is close to the goal.
  - r_2 is the penalty the robot receives when it drives beyond 20 m in either the x or y direction. The simulation terminates when r_2 < 0.
  - r_3 is a QR penalty that penalizes distance from the goal and control effort.
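The exact reward expression is implemented inside the Simulink model. As a rough illustration of the structure described above, the following MATLAB sketch computes a reward of the same shape; the thresholds and weights here are assumptions for illustration only, not the values used by rlFlyingRobotEnv.

% Illustrative sketch of the per-step reward described above.
% The thresholds and weights below are assumptions, not the exact
% values used by the rlFlyingRobotEnv model.
function r = exampleFlyingRobotReward(x,y,theta,L,R)
    r1 = 10*((x^2 + y^2 + theta^2) < 0.5);          % bonus near the goal (assumed threshold)
    r2 = -100*((abs(x) >= 20) || (abs(y) >= 20));   % penalty for leaving the 20 m region
    r3 = -(0.2*(L + R)^2 + 0.3*(L - R)^2 ...        % QR-style penalty on control effort
          + 0.03*x^2 + 0.03*y^2 + 0.02*theta^2);    % ... and on distance from the goal
    r  = r1 + r2 + r3;
end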

Create an integrated model

To train an agent for the FlyingRobotEnv model, use the createIntegratedEnv function to automatically generate an integrated model with an RL Agent block that is ready for training.

integratedMdl = 'IntegratedFlyingRobot';
[~,agentBlk,observationInfo,actionInfo] = createIntegratedEnv(mdl,integratedMdl);
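If you want to confirm what createIntegratedEnv extracted from the model before continuing, you can display the returned specification objects (optional, not part of the original example).

% Optional: display the observation and action specifications generated
% from the Simulink model.
observationInfo
actionInfo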


Action and observation

Before creating the environment object, specify names for the observation and action specifications, and limit the thrust actions to between -1 and 1.

The observation signal for this environment is a vector containing the robot's position, velocity, orientation (as sine and cosine), and angular velocity.

numObs = prod(observationInfo.Dimension);
observationInfo.Name = 'observations';

The action signal for this environment is the vector of the two thruster forces.

numAct = prod(actionInfo.Dimension);
actionInfo.LowerLimit = -ones(numAct,1);
actionInfo.UpperLimit =  ones(numAct,1);
actionInfo.Name = 'thrusts';

Create environment interface

Use the integrated model to create an environment interface for the flying robot.

env = rlSimulinkEnv(integratedMdl,agentBlk,observationInfo,actionInfo);

Reset function

Create a custom reset function that randomizes the initial position of the robot along a ring of radius 15 m, together with its initial orientation. For more information, see flyingRobotResetFcn; a minimal sketch of such a function appears after the assignment below.

env.ResetFcn = @(in) flyingRobotResetFcn(in);
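The shipped flyingRobotResetFcn handles this randomization; the following is a minimal sketch of what such a reset function can look like. The function name and the exact distributions here are assumptions.

% Minimal sketch of a reset function (hypothetical name; the shipped
% flyingRobotResetFcn may differ). 'in' is a Simulink.SimulationInput.
function in = exampleResetFcn(in)
    phi = 2*pi*rand;                                % random angle on the 15 m ring
    in  = setVariable(in,'x0',-15*cos(phi));        % initial x position
    in  = setVariable(in,'y0',-15*sin(phi));        % initial y position
    in  = setVariable(in,'theta0',2*pi*rand - pi);  % random initial heading
end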

Fix the random generator seed for reproducibility.

rng(0)

Create DDPG agent

A DDPG agent approximates the long-term reward, given observations and actions, by using a critic value function representation. To create the critic, first create a deep neural network with two inputs (the observation and the action) and one output. For more information on creating a neural network value function representation, see Create Policies and Value Functions.

% Specify the number of outputs for the hidden layers.
hiddenLayerSize = 100; 

observationPath = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc2')
    additionLayer(2,'Name','add')
    reluLayer('Name','relu2')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(1,'Name','fc4')];
actionPath = [
    featureInputLayer(numAct,'Normalization','none','Name','action')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc5')];

% Create the layer graph.
criticNetwork = layerGraph(observationPath);
criticNetwork = addLayers(criticNetwork,actionPath);

% Connect actionPath to observationPath.
criticNetwork = connectLayers(criticNetwork,'fc5','add/in2');
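Optionally, you can plot the layer graph to confirm that the action path feeds the second input of the addition layer (a quick check, not part of the original example).

% Optional check: visualize the critic layer graph and verify the
% connection from 'fc5' into 'add/in2'.
figure
plot(criticNetwork)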

Use rlRepresentationOptions to specify options for the critic representation.

criticOptions = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);

Create the critic representation using the specified neural network and options. You must also specify the action and observation specifications for the critic. For more information, see rlQValueRepresentation.

critic = rlQValueRepresentation(criticNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'action'},criticOptions);

A DDPG agent uses an actor to decide which action to take. To create the actor, first create a deep neural network with one input (the observation) and one output (the action).

Construct the actor in a manner similar to the critic. For more information, see rlDeterministicActorRepresentation.

actorNetwork = [
    featureInputLayer(numObs,'Normalization','none','Name','observation')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc1')
    reluLayer('Name','relu1')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc2')
    reluLayer('Name','relu2')
    fullyConnectedLayer(hiddenLayerSize,'Name','fc3')
    reluLayer('Name','relu3')
    fullyConnectedLayer(numAct,'Name','fc4')
    tanhLayer('Name','tanh1')];

actorOptions = rlRepresentationOptions('LearnRate',1e-04,'GradientThreshold',1);

actor = rlDeterministicActorRepresentation(actorNetwork,observationInfo,actionInfo,...
    'Observation',{'observation'},'Action',{'tanh1'},actorOptions);

To create a DDPG agent, first use rlDDPGAgentOptions to specify the DDPG agent options.

agentOptions = rlDDPGAgentOptions(...
    'SampleTime',Ts,...
    'TargetSmoothFactor',1e-3,...
    'ExperienceBufferLength',1e6,...
    'DiscountFactor',0.99,...
    'MiniBatchSize',256);
agentOptions.NoiseOptions.Variance = 1e-1;
agentOptions.NoiseOptions.VarianceDecayRate = 1e-6;

Then, create the agent using the specified actor representation, critic representation, and agent options. For more information, see rlDDPGAgent.

agent = rlDDPGAgent(actor,critic,agentOptions);
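Before training, you can optionally query the untrained agent with a random observation to confirm that the actor and critic wiring is consistent with the specifications (this check is not part of the original example).

% Optional sanity check (not in the original example): ask the untrained
% agent for an action given a random observation. Recent releases return
% the action inside a cell array.
act = getAction(agent,{rand(numObs,1)});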

Training agent

To train the agent, first specify the training options. For this example, use the following options:

  1. Run the training for at most 20000 episodes, with each episode lasting at most ceil(Tf/Ts) time steps.

  2. Display the training progress in the Episode Manager dialog box (set the Plots option) and disable the command-line display (set the Verbose option to false).

  3. Stop training when the average cumulative reward that the agent receives over 10 consecutive episodes is greater than 415. At that point, the agent can drive the flying robot to the goal position.

  4. Save a copy of the agent for each episode where the cumulative reward is greater than 415.

maxepisodes = 20000;
maxsteps = ceil(Tf/Ts);
trainingOptions = rlTrainingOptions(...
    'MaxEpisodes',maxepisodes,...
    'MaxStepsPerEpisode',maxsteps,...
    'StopOnError',"on",...
    'Verbose',false,...
    'Plots',"training-progress",...
    'StopTrainingCriteria',"AverageReward",...
    'StopTrainingValue',415,...
    'ScoreAveragingWindowLength',10,...
    'SaveAgentCriteria',"EpisodeReward",...
    'SaveAgentValue',415); 

Train the agent using the train function. Training is a computationally intensive process that takes several hours to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.

doTraining = false;
if doTraining    
    % Train the agent.
    trainingStats = train(agent,env,trainingOptions);
else
    % Load the pretrained agent for the example.
    load('FlyingRobotDDPG.mat','agent')       
end


DDPG agent simulation

To verify the performance of the trained agent, simulate the agent in the environment. For more information about agent simulation, see rlSimulationOptions and sim .

simOptions = rlSimulationOptions('MaxSteps',maxsteps);
experience = sim(env,agent,simOptions);
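To look at the resulting trajectory, you can extract the logged observations from the experience structure returned by sim. The sketch below assumes that the first two observation channels are the robot's x and y positions; check the observation ordering in the model before relying on it.

% Sketch: plot the simulated path, assuming observation channels 1 and 2
% hold the x and y positions of the robot.
obsData = squeeze(experience.Observation.observations.Data);  % numObs-by-numSteps
figure
plot(obsData(1,:),obsData(2,:))
xlabel('x (m)')
ylabel('y (m)')
title('Simulated flying robot trajectory')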


Origin blog.csdn.net/wangyifan123456zz/article/details/109520799