MATLAB Reinforcement Learning in Practice (6): Create an Agent Using Deep Network Designer and Train Using Image Observations

This example shows how to create a deep Q-network (DQN) agent that can swing up and balance a pendulum modeled in MATLAB®. In this example, you use Deep Network Designer to build the DQN agent. For more information about DQN agents, see Deep Q-Network Agents.

Pendulum environment with image observation

The reinforcement learning environment for this example is a simple frictionless pendulum that initially hangs in a downward position. The training goal is to make the pendulum stand upright without falling over, using minimal control effort.
For this environment:

  1. The upward balanced pendulum position is 0 radians, and the downward hanging position is pi radians.
  2. The torque action signal from the agent to the environment ranges from -2 to 2 N·m.
  3. The observations from the environment are a simplified grayscale image of the pendulum and the pendulum angle derivative.
  4. The reward r_t provided at every time step is (see the sketch after the definitions below)

     r_t = -(\theta_t^2 + 0.1\dot{\theta}_t^2 + 0.001 u_{t-1}^2)

where:

  1. \theta_t is the angle of displacement from the upright position.
  2. \dot{\theta}_t is the derivative of the displacement angle.
  3. u_{t-1} is the control effort from the previous time step.
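As a quick numerical illustration of this reward (a minimal sketch for this article; pendulumReward is just a local helper name, not part of the environment API):

% Per-step reward described above: penalize angle error, angular velocity, and control effort.
pendulumReward = @(theta,dtheta,uPrev) -(theta.^2 + 0.1*dtheta.^2 + 0.001*uPrev.^2);
rDown = pendulumReward(pi,0,0)   % hanging straight down, at rest: about -9.87
rUp   = pendulumReward(0,0,0)    % balanced upright with no control effort: 0, the best possible value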

Create environment interface

Create a predefined environment interface for the pendulum.

env = rlPredefinedEnv('SimplePendulumWithImage-Discrete');

The interface has two observations. The first observation, named "pendImage", is a 50-by-50 grayscale image.

obsInfo = getObservationInfo(env);
obsInfo(1)

The second observation is called "angularRate" and is the angular velocity of the pendulum.

obsInfo(2)

The interface has a discrete action space in which the agent can apply one of five possible torque values to the pendulum: -2, -1, 0, 1, or 2 N·m.

actInfo = getActionInfo(env)

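The predefined environment already supplies this action specification. For reference, a discrete torque set like this one could also be declared manually with rlFiniteSetSpec, for example when building a custom environment (a minimal sketch; actInfoManual is a hypothetical variable name):

actInfoManual = rlFiniteSetSpec([-2 -1 0 1 2]);   % five possible torque values in N·m
actInfoManual.Name = 'torque';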
Fix the random generator seed for reproducibility.

rng(0)

Build the critic network using Deep Network Designer

The DQN agent approximates the long-term reward, given observations and actions, using a critic value function representation. For this environment, the critic is a deep neural network with three inputs (the two observations and the action) and one output. For more information about creating deep neural network value function representations, see Create Policy and Value Function Representations.

You can build the critic network interactively by using the Deep Network Designer app. To do so, first create a separate input path for each observation and for the action. These paths learn lower-level features from their respective inputs. Then create a common output path that combines the outputs of the input paths.

Create image observation path

To create the image observation path, first drag an ImageInputLayer from the Layer Library pane to the canvas. Set the layer InputSize to 50,50,1 for the image observation, and set Normalization to none.
Second, drag a Convolution2DLayer onto the canvas and connect the input of this layer to the output of the ImageInputLayer. Create a convolution layer with two filters (NumFilters property) that have a height and width of 10 (FilterSize property), and use a stride of 5 in the horizontal and vertical directions (Stride property).
Finally, complete the image path network with two sets of ReLULayer and FullyConnectedLayer layers. The output sizes of the first and second FullyConnectedLayer layers are 400 and 300, respectively.

Create the remaining input paths and the output path

Construct the other input paths and the output path in a similar manner. For this example, use the following options.

Angular velocity path (scalar input):

  1. ImageInputLayer — Set InputSize to 1,1 and Normalization to none.

  2. FullyConnectedLayer — Set the OutputSize to 400.

  3. ReLULayer

  4. FullyConnectedLayer — Set the OutputSize to 300.

Action path (scalar input):

  1. ImageInputLayer — Set InputSize to 1,1 and Normalization to none.

  2. FullyConnectedLayer — Set the OutputSize to 300.

Output path:

  1. AdditionLayer — Connect the output of all input paths to the input of this layer.

  2. ReLULayer

  3. FullyConnectedLayer — Set OutputSize to 1 for the scalar value function.


Export network from Deep Network Designer

To export the network to the MATLAB workspace, click Export in Deep Network Designer. Deep Network Designer exports the network as a new variable containing the network layers. You can create the critic representation using this layer network variable.

Or, to generate equivalent MATLAB code for the network, click Export > Generate Code .
The generated code is as follows.

lgraph = layerGraph();
layers = [
    imageInputLayer([1 1 1],"Name","torque","Normalization","none")
    fullyConnectedLayer(300,"Name","torque_fc1")];
lgraph = addLayers(lgraph,layers);
layers = [
    imageInputLayer([1 1 1],"Name","angularRate","Normalization","none")
    fullyConnectedLayer(400,"Name","dtheta_fc1")
    reluLayer("Name","dtheta_relu1")
    fullyConnectedLayer(300,"Name","dtheta_fc2")];
lgraph = addLayers(lgraph,layers);
layers = [
    imageInputLayer([50 50 1],"Name","pendImage","Normalization","none")
    convolution2dLayer([10 10],2,"Name","img_conv1","Stride",[5 5])
    reluLayer("Name","img_relu")
    fullyConnectedLayer(400,"Name","theta_fc1")
    reluLayer("Name","theta_relu1")
    fullyConnectedLayer(300,"Name","theta_fc2")];
lgraph = addLayers(lgraph,layers);
layers = [
    additionLayer(3,"Name","addition")
    reluLayer("Name","relu")
    fullyConnectedLayer(1,"Name","stateValue")];
lgraph = addLayers(lgraph,layers);
lgraph = connectLayers(lgraph,"torque_fc1","addition/in3");
lgraph = connectLayers(lgraph,"theta_fc2","addition/in1");
lgraph = connectLayers(lgraph,"dtheta_fc2","addition/in2");


View the critic network configuration.

figure
plot(lgraph)

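Optionally, if you want a programmatic check of layer sizes and connections in addition to the plot, you can pass the layer graph to analyzeNetwork from Deep Learning Toolbox (not required by this example):

analyzeNetwork(lgraph)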
Specify options for the critic representation using rlRepresentationOptions.

criticOpts = rlRepresentationOptions('LearnRate',1e-03,'GradientThreshold',1);

Create the critic representation using the specified deep neural network layer graph and options. You must also specify the action and observation information for the critic, which you obtain from the environment interface. For more information, see rlQValueRepresentation.

critic = rlQValueRepresentation(lgraph,obsInfo,actInfo,...
    'Observation',{'pendImage','angularRate'},'Action',{'torque'},criticOpts);

To create a DQN agent, first use rlDQNAgentOptions to specify DQN agent options.

agentOpts = rlDQNAgentOptions(...
    'UseDoubleDQN',false,...    
    'TargetUpdateMethod',"smoothing",...
    'TargetSmoothFactor',1e-3,... 
    'ExperienceBufferLength',1e6,... 
    'DiscountFactor',0.99,...
    'SampleTime',env.Ts,...
    'MiniBatchSize',64);
agentOpts.EpsilonGreedyExploration.EpsilonDecay = 1e-5;

Then, create the DQN agent using the specified critic representation and agent options. For more information, see rlDQNAgent.


agent = rlDQNAgent(critic,agentOpts);
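As a quick sanity check before training, you can ask the untrained agent for an action given an arbitrary observation (a minimal sketch; the observation values below are random placeholders, ordered as the image first and the angular rate second, to match obsInfo):

% One cell entry per observation channel: 50-by-50 image, then scalar angular rate.
obs = {rand(50,50,1),rand(1,1)};
act = getAction(agent,obs)   % one of the five torque values (the output format can vary by release)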

Train the agent

To train the agent, first specify the training options. For this example, use the following options.

  1. Run the training for at most 5000 episodes, with each episode lasting at most 500 time steps.

  2. Display the training progress in the Episode Manager dialog box (set the Plots option) and disable the command-line display (set the Verbose option to false).

  3. Stop training when the agent receives an average cumulative reward greater than -1000 over the default window length of five consecutive episodes. At that point, the agent can quickly balance the pendulum in the upright position using minimal control effort.

trainOpts = rlTrainingOptions(...
    'MaxEpisodes',5000,...
    'MaxStepsPerEpisode',500,...
    'Verbose',false,...
    'Plots','training-progress',...
    'StopTrainingCriteria','AverageReward',...
    'StopTrainingValue',-1000);

You can visualize the pendulum system during training or simulation using the plot function.

plot(env)

Train the agent using the train function. This is a computationally intensive process that takes a long time to complete. To save time while running this example, load a pretrained agent by setting doTraining to false. To train the agent yourself, set doTraining to true.

doTraining = false;

if doTraining
    % Train the agent.
    trainingStats = train(agent,env,trainOpts);
else
    % Load pretrained agent for the example.
    load('MATLABPendImageDQN.mat','agent');
end

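If you do train the agent yourself, you may also want to save the result so it can be reloaded later in the same way as the pretrained agent (a minimal sketch; the file name below is only an example, not the file shipped with this example):

if doTraining
    save('myPendImageDQNAgent.mat','agent');   % example file name
end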

Agent simulation

To verify the performance of the trained agent, simulate it in a pendulum environment. For more information about agent simulation, see rlSimulationOptions and sim .

simOptions = rlSimulationOptions('MaxSteps',500);
experience = sim(env,agent,simOptions);


totalReward = sum(experience.Reward)

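To see how the reward evolves over the simulated episode rather than only its total, you can plot the reward signal returned by sim (a minimal sketch; experience.Reward is returned as a timeseries):

figure
plot(experience.Reward)
title('Reward per simulation step')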


Origin blog.csdn.net/wangyifan123456zz/article/details/109579918