1. Code
1.1 Code reading
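import tensorflow as tf
# QNetwork, ImageProcess and VALID_ACTIONS are defined elsewhere in the project
tf.compat.v1.reset_default_graph()  # Reset TensorFlow's default computation graph
# Q network and target network
q_net = QNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)  # Create the Q network
target_net = QNetwork(scope="target_q", VALID_ACTIONS=VALID_ACTIONS)  # Create the target network
# State processor
state_processor = ImageProcess()  # Create the state processor used to preprocess state data
# TensorFlow model saver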
saver = tf.compat.v1.train.Saver()  # Create the saver object used to save and restore the TensorFlow model (tf.train.Saver is not available in TensorFlow 2.x)
1.2 Code decomposition
1.2.1 scope="q", scope="target_q"
The scope parameter specifies the namespace of the TensorFlow variables. When the Q network and the target network are created, it sets the name prefix of their variables so that variables in the two networks have unique names.
For example, scope="q" prefixes the variable names of the Q network with "q", and scope="target_q" prefixes the variable names of the target network with "target_q". Both networks are built by the same class and therefore define identically named variables, so the different prefixes keep them distinguishable and avoid naming conflicts.
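The effect of the prefix can be checked directly. The following is a minimal sketch, not the original QNetwork code: the helper build_weights and the variable name "w" are hypothetical, and it assumes TensorFlow with the tf.compat.v1 API available.

```python
import tensorflow as tf


def build_weights(scope):
    """Create one weight matrix inside the given variable scope (hypothetical helper)."""
    with tf.compat.v1.variable_scope(scope):
        # The variable is called "w" in both networks; the scope supplies the prefix.
        return tf.compat.v1.get_variable("w", shape=[4, 2])


# Build inside a fresh graph so the sketch does not touch the default graph.
with tf.Graph().as_default():
    w_q = build_weights("q")
    w_target = build_weights("target_q")
    names = [w_q.name, w_target.name]

print(names)  # ['q/w:0', 'target_q/w:0']
```

Creating "w" twice under the same scope would raise an error, which is why the two networks need different scope values.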
1.2.2 q_net = QNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)
q_net = QNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)
This line creates q_net, an instance of the QNetwork class, and passes in two parameters:
- scope="q": A string that specifies the namespace for TensorFlow variables. Inside the QNetwork class, all TensorFlow variables are named with this prefix so that variable names stay unique.
- VALID_ACTIONS=VALID_ACTIONS: The set of valid actions. VALID_ACTIONS is a list or array containing all valid actions, and its length defines the size of the Q network's output layer. Inside the QNetwork class, the output layer is sized from this parameter to match the number of valid actions in the environment.
With these parameters, q_net is initialized as a Q network with its own namespace and output layer size. Later code calls the network's methods through this object, such as q_net.predict() and q_net.update(), to run predictions and update the network.
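To make the role of the two parameters concrete, here is a minimal sketch of how such a class could use them. TinyQNetwork is a hypothetical stand-in, not the article's QNetwork: it assumes a flat state vector of size state_dim and a single linear layer, whereas the real class may use convolutional layers over processed images. The action list is also assumed for illustration.

```python
import numpy as np
import tensorflow as tf

VALID_ACTIONS = [0, 1, 2, 3]  # assumed action set; the real list comes from the environment


class TinyQNetwork:
    """Hypothetical stand-in for QNetwork: one linear layer from state features to Q-values."""

    def __init__(self, scope, VALID_ACTIONS, state_dim=8):
        self.scope = scope
        n_actions = len(VALID_ACTIONS)  # output layer size = number of valid actions
        with tf.compat.v1.variable_scope(scope):
            self.states = tf.compat.v1.placeholder(
                tf.float32, [None, state_dim], name="states")
            w = tf.compat.v1.get_variable("w", shape=[state_dim, n_actions])
            b = tf.compat.v1.get_variable(
                "b", shape=[n_actions],
                initializer=tf.compat.v1.zeros_initializer())
            # One Q-value per valid action for every state in the batch.
            self.q_values = tf.matmul(self.states, w) + b

    def predict(self, sess, states):
        """Return Q-values with shape [batch_size, len(VALID_ACTIONS)]."""
        return sess.run(self.q_values, {self.states: states})


with tf.Graph().as_default():
    q_net = TinyQNetwork(scope="q", VALID_ACTIONS=VALID_ACTIONS)
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        q_values = q_net.predict(sess, np.zeros((5, 8), dtype=np.float32))

print(q_values.shape)  # (5, 4)
```

A batch of 5 states yields a 5 x 4 array, one column per entry in VALID_ACTIONS.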
1.2.3 target_net = QNetwork(scope="target_q", VALID_ACTIONS=VALID_ACTIONS)
target_net = QNetwork(scope="target_q", VALID_ACTIONS=VALID_ACTIONS)
This line creates target_net, a second instance of the QNetwork class, and passes in two parameters:
- scope="target_q": A string that specifies the namespace for TensorFlow variables. Inside the QNetwork class, all TensorFlow variables are named with this prefix so that variable names stay unique. The namespace "target_q" identifies this instance as the target network.
- VALID_ACTIONS=VALID_ACTIONS: The set of valid actions. VALID_ACTIONS is a list or array containing all valid actions, and its length defines the size of the network's output layer. Inside the QNetwork class, the output layer is sized from this parameter to match the number of valid actions in the environment.
With these parameters, target_net is initialized as a Q network with its own namespace and output layer size, serving as the target network. Because it is an instance of the same class, later code can call the same methods through this object, such as target_net.predict() and target_net.update(). In standard DQN, the target network is mainly used to predict target Q-values, and its weights are usually refreshed by copying them from q_net rather than by gradient training.
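One common use of the two separate namespaces is to copy the Q network's weights into the target network by matching variables across the "q" and "target_q" prefixes. The sketch below shows that idea under stated assumptions: make_vars and copy_params_ops are hypothetical helpers, the two small variables stand in for the real network weights, and the original code may synchronize the networks differently.

```python
import tensorflow as tf


def make_vars(scope):
    """Create a tiny set of variables under the given scope (stand-in for a network)."""
    with tf.compat.v1.variable_scope(scope):
        tf.compat.v1.get_variable("w", shape=[3, 2])
        tf.compat.v1.get_variable(
            "b", shape=[2], initializer=tf.compat.v1.zeros_initializer())


def copy_params_ops(src_scope, dst_scope):
    """Build assign ops that copy each variable under src_scope to its namesake under dst_scope."""
    def scoped_vars(scope):
        found = tf.compat.v1.get_collection(
            tf.compat.v1.GraphKeys.GLOBAL_VARIABLES, scope=scope + "/")
        return sorted(found, key=lambda v: v.name)  # same order in both scopes

    pairs = zip(scoped_vars(src_scope), scoped_vars(dst_scope))
    return [tf.compat.v1.assign(dst, src) for src, dst in pairs]


with tf.Graph().as_default():
    make_vars("q")
    make_vars("target_q")
    copy_ops = copy_params_ops("q", "target_q")
    variables = tf.compat.v1.global_variables()
    with tf.compat.v1.Session() as sess:
        sess.run(tf.compat.v1.global_variables_initializer())
        sess.run(copy_ops)  # target_q/* now holds the values of q/*
        values = sess.run(variables)

params = {v.name: val for v, val in zip(variables, values)}
print((params["q/w:0"] == params["target_q/w:0"]).all())  # True
```

The scope filter ends with "/" so that "q/" matches only variables whose names start with that prefix and never those under "target_q/".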