1 Scopes

1.1 Name scopes: tf.name_scope()

TensorFlow does not know which nodes belong together unless you tell it explicitly, by wrapping the relevant definitions in tf.name_scope(name):

with tf.name_scope('data'):
	...

with tf.name_scope('loss'):
	...

with tf.name_scope('optimizer'):
	...
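To make the effect of a name scope visible, here is a minimal sketch. It assumes TensorFlow 1.14+ or TensorFlow 2.x used through the `tensorflow.compat.v1` layer; the placeholder and the `mse` op are illustrative and not part of the original example. Ops created inside a scope get the scope name as a prefix, which is also how TensorBoard groups them.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    with tf.name_scope('data'):
        # hypothetical input, only here to have something to name
        x = tf.placeholder(tf.float32, shape=[None, 3], name='x')
    with tf.name_scope('loss'):
        loss = tf.reduce_mean(tf.square(x), name='mse')

# Each op name carries the prefix of the scope it was created in.
print(x.name)     # data/x:0
print(loss.name)  # loss/mse:0
```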

1.2 Variable scopes: tf.variable_scope()

Let's start with an example function definition:

def two_hidden_layers(x):
	w1 = tf.Variable(tf.random_normal([100,50]), name='h1_weight')
	b1 = tf.Variable(tf.zeros([50]), name='h1_biases')
	h1 = tf.matmul(x, w1) + b1
	
	w2 = tf.Variable(tf.random_normal([50,10]), name='h2_weight')
	b2 = tf.Variable(tf.zeros([10]), name='h2_biases')
	logits = tf.matmul(h1, w2) + b2
	return logits

With the definition above, what graph do the following two lines build?

logits1 = two_hidden_layers(x1)
logits2 = two_hidden_layers(x2)

We end up with two separate sets of variables, whereas the goal is for every input to use the same weights w and biases b.
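The duplication can be checked by listing the variables in the graph. The sketch below assumes TensorFlow 1.14+ or TensorFlow 2.x through `tensorflow.compat.v1`, and assumes `x1` and `x2` are placeholders of shape [200, 100] (the original does not show how they are defined). Because tf.Variable always creates a new variable, the second call gets uniquified names such as `h1_weight_1`.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

def two_hidden_layers(x):
    w1 = tf.Variable(tf.random_normal([100, 50]), name='h1_weight')
    b1 = tf.Variable(tf.zeros([50]), name='h1_biases')
    h1 = tf.matmul(x, w1) + b1
    w2 = tf.Variable(tf.random_normal([50, 10]), name='h2_weight')
    b2 = tf.Variable(tf.zeros([10]), name='h2_biases')
    return tf.matmul(h1, w2) + b2

graph = tf.Graph()
with graph.as_default():
    # assumed input shapes; not given in the original
    x1 = tf.placeholder(tf.float32, [200, 100], name='x1')
    x2 = tf.placeholder(tf.float32, [200, 100], name='x2')
    logits1 = two_hidden_layers(x1)
    logits2 = two_hidden_layers(x2)
    var_names = [v.name for v in tf.global_variables()]

# Two calls create 2 x 4 = 8 independent variables.
print(len(var_names))
print(var_names)
```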
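So how should the function be defined? Use tf.get_variable(): inside a variable scope with reuse enabled it returns the existing variable of that name, and otherwise it creates the variable with the given initializer. The modified code looks like this: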

def two_hidden_layers(x):
	assert x.shape.as_list() == [200, 100]
	w1 = tf.get_variable("h1_weight", [100,50], initializer=tf.random_normal_initializer())
	b1 = tf.get_variable("h1_biases", [50], initializer=tf.constant_initializer(0.0))
	h1 = tf.matmul(x, w1) + b1
	assert h1.shape.as_list() == [200,50]
	w2 = tf.get_variable("h2_weights", [50,10], initializer=tf.random_normal_initializer())
	b2 = tf.get_variable("h2_biases", [10], initializer=tf.constant_initializer(0.0))
	logits = tf.matmul(h1, w2) + b2
	return logits
with tf.variable_scope('two_layer') as scope:
	logits1 = two_hidden_layers(x1)
	scope.reuse_variables()
	logits2 = two_hidden_layers(x2)

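Three things are worth noting in the code above. First, tf.get_variable() fetches an already-created variable by name (or creates it). Second, tf.variable_scope() places the variables inside a scope. Third, scope.reuse_variables() marks all variables in that scope for reuse.
In the resulting graph, both calls now share the same variables. The code is still a bit redundant, though, so let's simplify it once more: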

def fully_connected(x, output_dim, scope):
	with tf.variable_scope(scope, reuse=tf.AUTO_REUSE) as scope:
		w = tf.get_variable("weights", [x.shape[1], output_dim], initializer=tf.random_normal_initializer())
		b = tf.get_variable("biases", [output_dim], initializer=tf.constant_initializer(0.0))
		return tf.matmul(x, w) +b
def two_hidden_layers(x):
	h1 = fully_connected(x, 50, 'h1')
	h2 = fully_connected(h1, 10, 'h2')
	return h2
with tf.variable_scope('two_layers') as scope:
	logits1 = two_hidden_layers(x1)
	logits2 = two_hidden_layers(x2)

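The fully_connected function defines one fully connected layer and removes the duplicated code. Setting reuse=tf.AUTO_REUSE in tf.variable_scope() creates the variables on first use and reuses them on every later call.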
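One way to confirm that the variables are shared is to list them after building both branches. This is a sketch under stated assumptions: TensorFlow 1.14+ or TensorFlow 2.x through `tensorflow.compat.v1`, and `x1`/`x2` as placeholders of shape [200, 100], which the original does not define.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

def fully_connected(x, output_dim, scope):
    with tf.variable_scope(scope, reuse=tf.AUTO_REUSE):
        w = tf.get_variable('weights', [int(x.shape[1]), output_dim],
                            initializer=tf.random_normal_initializer())
        b = tf.get_variable('biases', [output_dim],
                            initializer=tf.constant_initializer(0.0))
        return tf.matmul(x, w) + b

def two_hidden_layers(x):
    h1 = fully_connected(x, 50, 'h1')
    return fully_connected(h1, 10, 'h2')

graph = tf.Graph()
with graph.as_default():
    # assumed input shapes; not given in the original
    x1 = tf.placeholder(tf.float32, [200, 100], name='x1')
    x2 = tf.placeholder(tf.float32, [200, 100], name='x2')
    with tf.variable_scope('two_layers'):
        logits1 = two_hidden_layers(x1)
        logits2 = two_hidden_layers(x2)
    var_names = sorted(v.name for v in tf.global_variables())

# Only one set of four variables exists, even though the layers were built twice.
print(var_names)
```

The variable names combine the outer scope, the layer scope, and the name passed to tf.get_variable(), for example `two_layers/h1/weights:0`.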

2 tf.train.Saver

tf.train.Saver saves the variable values held in the current session. Saving and restoring use the following calls:

saver = tf.train.Saver()
# save
saver.save(sess, save_path, global_step=None, ...)
# restore
saver.restore(sess, save_path)
# restore the latest checkpoint found in a directory
saver.restore(sess, tf.train.latest_checkpoint(checkpoint_dir))

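Note that restoring with tf.train.Saver only loads variable values into a graph that already exists; it does not rebuild the graph. By default all variables are handled, but you can also name the specific variables to save: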

v1 = tf.Variable(..., name='v1')
v2 = tf.Variable(..., name='v2')
saver = tf.train.Saver({'v1': v1, 'v2': v2})
# or
saver = tf.train.Saver([v1,v2])
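A small round trip illustrates what saving a subset of variables means in practice. The sketch assumes TensorFlow 1.14+ or TensorFlow 2.x through `tensorflow.compat.v1`; the temporary directory and the initial values are made up for the illustration.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

ckpt_dir = tempfile.mkdtemp()                 # hypothetical checkpoint directory
save_path = os.path.join(ckpt_dir, 'model')

graph = tf.Graph()
with graph.as_default():
    v1 = tf.Variable(1.0, name='v1')
    v2 = tf.Variable(2.0, name='v2')
    saver = tf.train.Saver({'v1': v1})        # only v1 goes into the checkpoint

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        saver.save(sess, save_path, global_step=0)

        # Overwrite both variables, then restore from the checkpoint.
        sess.run([v1.assign(10.0), v2.assign(20.0)])
        saver.restore(sess, tf.train.latest_checkpoint(ckpt_dir))
        restored = sess.run([v1, v2])

# v1 is back at its saved value, v2 keeps the overwritten value.
print(restored)
```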

2.1 global step

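Let's see how the global step is defined and read during training. First define a global_step variable and mark it as non-trainable (trainable=False), so the optimizer does not treat it as a model parameter. Then pass it to the optimizer's minimize() call; every time the training op applies the gradients, the counter is incremented by one: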

global_step = tf.Variable(0, dtype=tf.int32, trainable=False, name='global_step')
optimizer = tf.train.AdamOptimizer(lr).minimize(loss, global_step=global_step)
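The counting behaviour can be observed on a toy problem. This sketch assumes TensorFlow 1.14+ or TensorFlow 2.x through `tensorflow.compat.v1`; the variable `w`, the quadratic loss, and the learning rate are placeholders chosen for illustration only.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    w = tf.Variable(5.0, name='w')            # hypothetical parameter
    loss = tf.square(w)                       # hypothetical loss
    global_step = tf.Variable(0, dtype=tf.int32, trainable=False,
                              name='global_step')
    train_op = tf.train.AdamOptimizer(0.1).minimize(loss,
                                                    global_step=global_step)

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        for _ in range(3):
            sess.run(train_op)                # each run increments global_step
        steps_done = int(sess.run(global_step))

print(steps_done)  # 3
```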

3 tf.summary

While training a model, you can visualize how parameters and results change over time, mainly through the following three functions:

tf.summary.scalar
tf.summary.histogram
tf.summary.image
The main steps are as follows:
step 1: create the summaries

with tf.name_scope("summaries"):
	tf.summary.scalar("loss", self.loss)
	tf.summary.scalar("accuracy", self.accuracy)
	tf.summary.histogram("histogram_loss", self.loss)
	summary_op = tf.summary.merge_all()

step 2: run
Summaries are ops as well, so once they are created they have to be executed with run inside a session

# use a separate name for the fetched value so the loss tensor is not overwritten
loss_val, _, summary = sess.run([loss, optimizer, summary_op])

step 3: write the summaries to a file

writer = tf.summary.FileWriter('./graphs', sess.graph)
writer.add_summary(summary, global_step=step)

Putting it together, the overall flow for training and saving summaries looks like this:

import os

# define the summaries and the saver
tf.summary.scalar("loss", self.loss)
tf.summary.histogram("histogram_loss", self.loss)
summary_op = tf.summary.merge_all()
saver = tf.train.Saver()
with tf.Session() as sess:
	sess.run(tf.global_variables_initializer())
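	# look for an existing checkpoint in the 'checkpoints' directory
	ckpt = tf.train.get_checkpoint_state(os.path.dirname('checkpoints/checkpoint'))
	# if one exists, restore the variables from it
	if ckpt and ckpt.model_checkpoint_path:
		saver.restore(sess, ckpt.model_checkpoint_path)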
	writer = tf.summary.FileWriter('./graphs', sess.graph)
	for step in range(1000):
		...
		# fetch into loss_val so the loss tensor stays usable in the next iteration
		loss_val, _, summary = sess.run([loss, optimizer, summary_op], feed_dict)
		writer.add_summary(summary, global_step=step)
		if (step+1) % 1000 == 0:
			saver.save(sess, 'checkpoints/model', step)
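The flow above depends on a model class and a feed_dict that are not shown. A stripped-down, runnable sketch of the summary part could look like the following, assuming TensorFlow 1.14+ or TensorFlow 2.x through `tensorflow.compat.v1`; the toy variable `w`, the quadratic loss, and the temporary log directory are stand-ins, not the original model.

```python
import os
import tempfile

import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

log_dir = tempfile.mkdtemp()                  # hypothetical log directory

graph = tf.Graph()
with graph.as_default():
    w = tf.Variable(5.0, name='w')            # hypothetical parameter
    loss = tf.square(w)                       # hypothetical loss
    train_op = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

    tf.summary.scalar('loss', loss)
    summary_op = tf.summary.merge_all()

    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        writer = tf.summary.FileWriter(log_dir, sess.graph)
        for step in range(5):
            # summary is a serialized protobuf that the writer appends to the event file
            loss_val, _, summary = sess.run([loss, train_op, summary_op])
            writer.add_summary(summary, global_step=step)
        writer.close()

event_files = [f for f in os.listdir(log_dir)
               if f.startswith('events.out.tfevents')]
print(event_files)
```

Pointing TensorBoard at the log directory (`tensorboard --logdir <log_dir>`) then shows the loss curve and the graph.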

4 gradients

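TensorFlow computes gradients automatically using the chain rule, as illustrated by the computation graph in the figure.
Differentiating backward through the graph with the chain rule, what are the derivatives at $f, q, z, y, x$? The results are given first, as the red numbers in the figure: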

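How are the red numbers at each node obtained? Start from the output: the derivative of $f$ with respect to itself is 1. Then apply the chain rule backward: the derivative with respect to $q$ is the derivative with respect to $f$ times $\partial f / \partial q = z$, and since $z = -4$ this gives $-4$. The remaining nodes follow in the same way.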
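The backward pass can be traced by hand in a few lines. The text only states that $\partial f / \partial q = z$ and $z = -4$; the sketch below additionally assumes $q = x + y$, $f = q \cdot z$, $x = -2$ and $y = 5$, which are hypothetical values chosen to be consistent with that description.

```python
# Hypothetical graph: q = x + y, f = q * z. Only z = -4 is stated in the text.
x, y, z = -2.0, 5.0, -4.0

# Forward pass
q = x + y          # 3.0
f = q * z          # -12.0

# Backward pass, starting from df/df = 1 and multiplying local derivatives
df_df = 1.0
df_dq = df_df * z  # d(q*z)/dq = z
df_dz = df_df * q  # d(q*z)/dz = q
df_dx = df_dq * 1.0  # d(x+y)/dx = 1
df_dy = df_dq * 1.0  # d(x+y)/dy = 1

print(df_df, df_dq, df_dz, df_dx, df_dy)  # 1.0 -4.0 3.0 -4.0 -4.0
```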

4.1 tf.gradients

Let's implement a simple differentiation example:

# -*-coding:utf8 -*-
import tensorflow as tf

def main():
	x = tf.Variable(2.0)
	y = 2.0 * (x ** 3)
	z = 3.0 + y**2
	grad_z = tf.gradients(z, [x,y])
	with tf.Session() as sess:
		sess.run(x.initializer)
		print(sess.run(grad_z))

if __name__=='__main__':
	main()

Running it prints the two gradients: dz/dx = 2y * 6x^2 = 768 and dz/dy = 2y = 32 at x = 2.
TensorFlow provides the following gradient-related functions:

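tf.gradients(ys, xs, grad_ys=None, ...) : computes the gradients of ys with respect to xs
tf.stop_gradient(input, name=None) : treats input as a constant during backpropagation, so no gradient flows to the parameters that produced it
tf.clip_by_value(t, clip_value_min, clip_value_max, name=None) : clips the values of t (for example, gradients) to the given range
tf.clip_by_norm(t, clip_norm, axes=None, name=None) : rescales t so that its L2 norm does not exceed clip_norm (gradient clipping by norm, not normalization)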
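The following sketch exercises three of these functions on small tensors. It assumes TensorFlow 1.14+ or TensorFlow 2.x through `tensorflow.compat.v1`; the vector `g` is a made-up stand-in for a gradient.

```python
import tensorflow.compat.v1 as tf  # assumption: TF 1.14+ or TF 2.x compat layer

tf.disable_v2_behavior()

graph = tf.Graph()
with graph.as_default():
    x = tf.constant(2.0)
    y = 2.0 * (x ** 3)
    z = 3.0 + y ** 2
    # Same expression, but y is treated as a constant in the backward pass.
    z_stopped = 3.0 + tf.stop_gradient(y) ** 2

    grad_full = tf.gradients(z, x)[0]
    grad_blocked = tf.gradients(z_stopped, x)   # no differentiable path to x

    g = tf.constant([3.0, 4.0])                 # hypothetical gradient, L2 norm 5
    by_value = tf.clip_by_value(g, -1.0, 3.5)   # element-wise clipping
    by_norm = tf.clip_by_norm(g, 1.0)           # rescale to L2 norm 1

    with tf.Session() as sess:
        full_val, value_val, norm_val = sess.run([grad_full, by_value, by_norm])

print(full_val)      # approximately 768
print(grad_blocked)  # [None]: stop_gradient cut the only path from z_stopped to x
print(value_val)     # 3.0 is inside the range, 4.0 is clipped to 3.5
print(norm_val)      # approximately [0.6, 0.8]
```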
