1. Code

1.1 Code reading

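tf.compat.v1.reset_default_graph()  # Reset TensorFlow's default computation graph

# Q network and target network
q_net = QNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)  # Create the Q network
target_net = QNetwork(scope="target_q", VALID_ACTIONS=VALID_ACTIONS)  # Create the target network

# State processor
state_processor = ImageProcess()  # Create the state processor, which preprocesses state data

# TensorFlow model saver
saver = tf.train.Saver()  # Create a saver object for saving and restoring the model (under TF 2.x this is tf.compat.v1.train.Saver)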

1.2 Code decomposition

1.2.1 scope="q"，scope="target_q"

The scope parameter specifies the namespace for TensorFlow variables. It sets the name prefix used when the Q network and the target network are created, so that the variables in the two networks have unique names.

For example, scope="q" gives the variables of the Q network the prefix "q", and scope="target_q" gives the variables of the target network the prefix "target_q". Because both networks are built from the same QNetwork class, their variables would otherwise have identical names; the different prefixes keep them distinguishable and avoid naming conflicts.
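The prefixing idea can be illustrated without TensorFlow. The following is a minimal sketch: VariableStore and the layer names are hypothetical stand-ins invented for this illustration, not part of TensorFlow or of the QNetwork class. In real TF 1.x-style code the same effect typically comes from building the network inside tf.compat.v1.variable_scope(scope).

```python
class VariableStore:
    """Toy stand-in for a graph's variable collection (not TensorFlow's implementation)."""

    def __init__(self):
        self.variables = {}

    def create(self, scope, name, value):
        # The scope becomes a prefix of the full variable name.
        full_name = f"{scope}/{name}"
        if full_name in self.variables:
            raise ValueError(f"Variable {full_name} already exists")
        self.variables[full_name] = value
        return full_name


store = VariableStore()

# Both networks have the same architecture, so the unprefixed names are identical.
layer_names = ["conv1/weights", "fc/weights"]  # hypothetical layer names

q_names = [store.create("q", name, 0.0) for name in layer_names]
target_names = [store.create("target_q", name, 0.0) for name in layer_names]

print(q_names)
print(target_names)
```

Without the scope argument, the second batch of create() calls would reuse the names of the first and raise an error, which is the naming conflict the prefixes avoid.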

1.2.2 q_net = QNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)

q_net = QNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)

This line creates an instance of the QNetwork class, q_net, and passes in two parameters:

scope="q": A string parameter that specifies the namespace for TensorFlow variables. Inside the QNetwork class, all TensorFlow variables are named with this prefix, which keeps the variable names unique.
VALID_ACTIONS=VALID_ACTIONS: This parameter specifies the valid actions. VALID_ACTIONS is a list or array containing all valid actions, and its length determines the size of the Q network's output layer. Inside the QNetwork class, the output layer is sized according to this parameter so that it matches the number of valid actions in the environment.

With these parameters, q_net is initialized as a Q network with its own namespace and output layer size. Later code can call the network's methods through this object, such as q_net.predict() and q_net.update(), to run predictions and update the network.
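The relationship between VALID_ACTIONS and the output layer can be sketched in plain Python. TinyQNetwork below is a hypothetical stand-in for QNetwork (a single random linear layer, no training), and the four-element VALID_ACTIONS and the state size of 8 are assumed values chosen only for illustration.

```python
import random

VALID_ACTIONS = [0, 1, 2, 3]  # assumed example; the real list depends on the environment


class TinyQNetwork:
    """Hypothetical stand-in for QNetwork: one linear layer with random weights."""

    def __init__(self, scope, VALID_ACTIONS, state_size=8, seed=0):
        self.scope = scope
        # The output layer has one unit per valid action.
        self.num_actions = len(VALID_ACTIONS)
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-1.0, 1.0) for _ in range(state_size)]
            for _ in range(self.num_actions)
        ]

    def predict(self, state):
        # Return one Q-value per valid action.
        return [sum(w * s for w, s in zip(row, state)) for row in self.weights]


q_net = TinyQNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)

state = [0.5] * 8
q_values = q_net.predict(state)

# Greedy action: index of the largest Q-value, mapped back to the action list.
best_index = max(range(len(q_values)), key=q_values.__getitem__)
action = VALID_ACTIONS[best_index]
```

Because predict() returns exactly len(VALID_ACTIONS) values, the index of the largest Q-value always maps back to a valid action.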

1.2.3 target_net = QNetwork(scope="target_q", VALID_ACTIONS=VALID_ACTIONS)

target_net = QNetwork(scope="target_q", VALID_ACTIONS=VALID_ACTIONS)

This line creates an instance of the QNetwork class, target_net, and passes in two parameters:

scope="target_q": A string parameter that specifies the namespace for TensorFlow variables. Inside the QNetwork class, all TensorFlow variables are named with this prefix, which keeps the variable names unique. The namespace here is "target_q", which marks these variables as belonging to the target network.
VALID_ACTIONS=VALID_ACTIONS: This parameter specifies the valid actions. VALID_ACTIONS is a list or array containing all valid actions, and its length determines the size of the network's output layer. Inside the QNetwork class, the output layer is sized according to this parameter so that it matches the number of valid actions in the environment.

With these parameters, target_net is initialized as a Q network with its own namespace and output layer size, and it serves as the target network. Because it is an instance of the same class, it exposes the same methods as q_net, such as target_net.predict() and target_net.update(). In standard DQN, however, the target network is used mainly through predict() to compute target values, and its parameters are usually refreshed by periodically copying them from the Q network rather than by gradient updates.
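One common way to keep a target network in step with the Q network in DQN is to copy the Q network's parameters into it at intervals. The sketch below shows that idea with plain dictionaries; copy_parameters and the parameter names are hypothetical and are not taken from the QNetwork class in this series.

```python
import copy


def copy_parameters(source_params, target_params, target_scope="target_q"):
    """Overwrite the target parameters with copies of the source parameters.

    Variables are matched by the part of the name after the scope prefix.
    """
    for name, value in source_params.items():
        suffix = name.split("/", 1)[1]  # "q/fc/weights" -> "fc/weights"
        target_params[f"{target_scope}/{suffix}"] = copy.deepcopy(value)


# Hypothetical parameters of the two networks, keyed by scoped variable name.
q_params = {"q/fc/weights": [0.1, 0.2], "q/fc/bias": [0.0]}
target_params = {"target_q/fc/weights": [0.0, 0.0], "target_q/fc/bias": [0.0]}

# Pretend a training step changed the Q network; the target network is untouched.
q_params["q/fc/weights"][0] = 0.9
before_sync = list(target_params["target_q/fc/weights"])

# Periodic synchronization: the target network now mirrors the Q network.
copy_parameters(q_params, target_params)
after_sync = target_params["target_q/fc/weights"]
```

The scope prefixes are what make this matching straightforward: stripping "q/" and "target_q/" leaves identical names for corresponding variables.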

Python-DQN code reading (11)