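Overview

Open-sourced by Google, TensorFlow supports many neural network models, including CNN, RNN, and LSTM.
A quick API reference is available here.
From mainland China, Google's .cn site is reachable; see here.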

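graph and session

A graph is the static computation graph of the neural network; a session is the dynamic computation in which data actually flows through that graph. The relationship resembles that between a program and a process.
Graph = (Node, Edge). Nodes are called operations and are responsible for producing and computing tensors; edges are the tensors flowing between nodes.

If no graph is created explicitly, the system creates a default graph automatically, which can be obtained with tf.get_default_graph().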

import tensorflow as tf

# Verify the default graph
c = tf.constant(4.0)
assert c.graph is tf.get_default_graph()

# A graph can also be created explicitly
g = tf.Graph()
with g.as_default():
  # Define operations and tensors in `g`.
  c = tf.constant(30.0)
  assert c.graph is g

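Variables

2.1 placeholder

tf.placeholder(dtype, shape=None, name=None)
A placeholder. Typically used for the inputs and target outputs fed into the graph, i.e. features and labels.
An example:
x = tf.placeholder(tf.float32, shape=(number_of_samples, INPUT_DIMENSION), name="x-input")
TIPS
The shape can be given as shape=(None, INPUT_DIMENSION), which makes the first dimension dynamic and determined by the data actually fed in. What is the benefit?
Training can run in batches of a chosen batch_size, and the test set can be evaluated with a batch_size different from the one used for training.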

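2.2 variable

tf.Variable
A class. Represents a trainable variable in TensorFlow, such as the connection weights between network layers.
__init__(self, initial_value=None, ... , name=None, ...)
An initial value must be specified; giving the variable a name makes it easier to find in TensorBoard.
tf.get_variable(name,shape,dtype,initializer=None,...)
Creates a variable, or reuses one created earlier.
Args:
initializer: a common choice is tf.zeros_initializer. The default is tf.glorot_uniform_initializer.
The initial value can also be given as a tensor, e.g. other_variable = tf.get_variable("other_variable", dtype=tf.int32, initializer=tf.constant([23, 42]))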

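Variable reuse

Sometimes we want several layers of a network to share one weight matrix, which requires reusing a variable. Calling tf.get_variable twice with the same name without enabling reuse raises an error:

import tensorflow as tf
a = tf.get_variable(name='a', shape=[2, 3], initializer=tf.truncated_normal_initializer)
a1 = tf.get_variable(name='a', shape=[2, 3], initializer=tf.truncated_normal_initializer)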
"""
ValueError: Variable a already exists, disallowed. Did you mean to set reuse=True or reuse=tf.AUTO_REUSE in VarScope? Originally defined at: xxx
"""

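The reuse rule behind that error can be sketched with a toy registry in plain Python. This is only an illustration of the semantics, not TensorFlow's implementation: `ToyVariableStore` and its `reuse` argument are hypothetical names, and the toy creates a missing variable even when `reuse` is set, roughly like the `reuse=tf.AUTO_REUSE` option named in the error message.

```python
class ToyVariableStore:
    """Toy model of get_variable's name-based sharing (hypothetical, not TensorFlow code)."""

    def __init__(self):
        self._variables = {}

    def get_variable(self, name, shape, reuse=False):
        if name in self._variables:
            if not reuse:
                # Mirrors the situation in which TensorFlow raises its ValueError.
                raise ValueError("Variable %s already exists, disallowed." % name)
            return self._variables[name]
        rows, cols = shape
        self._variables[name] = [[0.0] * cols for _ in range(rows)]
        return self._variables[name]


store = ToyVariableStore()
a = store.get_variable("a", [2, 3])

try:
    store.get_variable("a", [2, 3])          # same name, reuse not enabled
    second_call_failed = False
except ValueError:
    second_call_failed = True

a1 = store.get_variable("a", [2, 3], reuse=True)
shared = a1 is a                              # both names refer to one object
```

In TensorFlow 1.x itself, the usual way to get the sharing behaviour is to wrap the calls in a variable scope with reuse enabled, as the error message suggests.
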
2.3 scope

As the number of variables grows, names become easy to confuse; a scope acts as a namespace.
Usage:

with tf.variable_scope("scope_1") as scope:
    w=tf.get_variable(...)
with tf.variable_scope("scope_2"):
    w=tf.get_variable(...)

tensorflow.python.ops.variable_scope
A module.
get_variable_scope()
Returns the current variable scope.

2.4 Initializer

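To go with tf.get_variable(), TensorFlow provides the commonly used initializers.

tf.constant_initializer
tf.random_normal_initializer
tf.truncated_normal_initializer
Initializes with random values drawn from a normal distribution; any value more than two standard deviations from the mean is discarded and re-drawn.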
tf.random_uniform_initializer
tf.zeros_initializer
tf.ones_initializer

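2.5 Initialization

tf.global_variables_initializer()
Returns an op (operation) that initializes all variables in the tf.GraphKeys.GLOBAL_VARIABLES collection.
Note that this call cannot be placed just anywhere: it must come after the computation graph has been fully built, because the op only covers the variables that exist when it is created.

2.6 reset

tf.reset_default_graph() resets the current default graph, which amounts to clearing all tensors defined so far. It is handy in Jupyter: if the result of running some cells is unsatisfactory, their effect can be wiped out.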

Operations

3.1 Data generation

tensorflow.python.ops.random_ops.truncated_normal(shape, mean=0.0, stddev=1.0, dtype=dtypes.float32, seed=None, name=None)
Truncated normal distribution. Generates random values drawn from a normal distribution; any value more than two standard deviations from the mean is discarded and re-drawn.
Available as tf.truncated_normal().
tensorflow.python.ops.random_ops.random_uniform(shape, minval=0, maxval=None, dtype=dtypes.float32, seed=None, name=None)
Uniform distribution. Available as tf.random_uniform().

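3.2 Data conversion

tf.expand_dims(input, axis=None, name=None, dim=None)
This is tensorflow.python.ops.array_ops.expand_dims(...). It inserts a dimension of size 1 into the given tensor input.
axis: the index at which the new axis is inserted into input. -1 appends it at the end.
tf.squeeze(input, axis=None, ...)
Reduces dimensions by removing those of size 1. Example: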

# 't' is a tensor of shape [1, 2, 1, 3, 1, 1]
tf.shape(tf.squeeze(t))  # [2, 3]
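
The effect of expand_dims and squeeze on a shape can be traced with two small helper functions. This is a plain-Python sketch that works on shape lists only; `expand_dims_shape` and `squeeze_shape` are hypothetical names, not TensorFlow functions.

```python
def expand_dims_shape(shape, axis):
    """Shape after inserting a size-1 dimension at `axis` (negative counts from the end)."""
    shape = list(shape)
    if axis < 0:
        axis += len(shape) + 1
    return shape[:axis] + [1] + shape[axis:]


def squeeze_shape(shape):
    """Shape after dropping every dimension of size 1."""
    return [dim for dim in shape if dim != 1]


squeezed = squeeze_shape([1, 2, 1, 3, 1, 1])   # [2, 3], as in the example above
appended = expand_dims_shape([2, 3], -1)       # [2, 3, 1]
prepended = expand_dims_shape([2, 3], 0)       # [1, 2, 3]
```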

tf.cast(x, dtype, name=None)
Casts a tensor to a new type, for example converting the float output of tf.feature_column.input_layer(...) to an integer type.
tf.one_hot(indices, depth, ...)
Given an index and a depth, returns a one-hot tensor. Example:

sess.run(tf.one_hot(2, 3)) # [0,0,1]
sess.run(tf.one_hot(0, 3)) # [1,0,0]

tf.reshape(tensor, shape, name=None)
Changes the shape of a tensor, similar to np.reshape(). Example:

t = [1, 2, 3, 4, 5, 6, 7, 8, 9]
tf.reshape(t, [3, 3])  # ==> [[1, 2, 3],
                       #      [4, 5, 6],
                       #      [7, 8, 9]]

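3.3 Common operations

Q1: For the same layer, there is sometimes both a class and a corresponding function interface, such as tf.layers.Conv2D and tf.layers.conv2d. What is the difference?
A: These classes inherit from tensorflow.python.layers.base.Layer. The base class implements __call__(self, inputs, *args, **kwargs), so an instance of the class is a callable object that can be applied directly to inputs. This makes it convenient to branch the network, or to pass several inputs through the same layer.

Q2: When using TensorFlow, we find that tf.nn, tf.layers, and tf.contrib overlap in many functions, especially convolution. How do they differ and how are they related?
A: A brief description of the three modules:

tf.nn: support for neural network operations, including convolution (conv), pooling, normalization, losses, classification ops, embedding, RNN, and evaluation.
tf.layers: mainly higher-level neural network layers, mostly convolution-related. My impression is that it is a further wrapper around tf.nn, which is the lower-level of the two.
tf.contrib: tf.contrib.layers provides high-level operations for building the computation graph, such as network layers, regularization, and summaries. However, tf.contrib contains unstable and experimental code, and its API may change in the future.

The level of encapsulation increases from one module to the next.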

Single tensor

tf.split(value, num_or_size_splits, axis=0,...)
Splits a tensor.
For example:

# 'value' is a tensor with shape [5, 30]
# Split 'value' into 3 tensors with sizes [4, 15, 11] along dimension 1
split0, split1, split2 = tf.split(value, [4, 15, 11], 1)
tf.shape(split0)  # [5, 4]
tf.shape(split1)  # [5, 15]
tf.shape(split2)  # [5, 11]

# Split 'value' into 3 tensors along dimension 1
split0, split1, split2 = tf.split(value, num_or_size_splits=3, axis=1)
tf.shape(split0)  # [5, 10]
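
The two forms of num_or_size_splits (a list of sizes, or a single count) can be compared with a small shape calculator. This is a plain-Python sketch of the shape arithmetic only; `split_shapes` is a hypothetical helper, not part of TensorFlow.

```python
def split_shapes(shape, num_or_size_splits, axis=0):
    """Shapes of the pieces produced by splitting `shape` along `axis`."""
    dim = shape[axis]
    if isinstance(num_or_size_splits, int):
        # An integer asks for that many equal pieces.
        if dim % num_or_size_splits != 0:
            raise ValueError("dimension is not evenly divisible")
        sizes = [dim // num_or_size_splits] * num_or_size_splits
    else:
        # A list gives the size of each piece explicitly.
        sizes = list(num_or_size_splits)
        if sum(sizes) != dim:
            raise ValueError("sizes must sum to the dimension")
    pieces = []
    for size in sizes:
        piece = list(shape)
        piece[axis] = size
        pieces.append(piece)
    return pieces


by_sizes = split_shapes([5, 30], [4, 15, 11], axis=1)
by_count = split_shapes([5, 30], 3, axis=1)
```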

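tf.reshape(tensor, shape, name=None)
Similar to NumPy; -1 means the dimension is inferred automatically. The first dimension must account for batch_size, which is where this differs from Keras's reshape.
Args:
- shape
  If the tensor has shape [1], then shape=[] converts it into a scalar. I found this to differ from np.reshape(), which did not accept [] as an argument (this may depend on the NumPy version).
tf.layers.flatten(inputs, name=None)
Flattens each sample of the tensor into one dimension, giving the shape (BATCH_SIZE, flattened size).
tf.square(x, name=None)
Computes the element-wise square.
tf.reduce_mean(input_tensor,axis=None)
Computes the mean of input_tensor along the dimensions given in axis.
tf.reduce_sum(input_tensor,axis=None)
Computes the sum of elements across dimensions of a tensor.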

Convolution and pooling

tf.nn.embedding_lookup(params,ids, partition_strategy="mod", name=None, ...)
A function that quickly looks up the tensors corresponding to the given ids.
params: the embedding matrix.
ids: A Tensor with type int32 or int64 containing the ids to be looked up in params.
tf.layers.conv2d(inputs,filters,kernel_size,strides=(1, 1), padding='valid',...)
2-D convolution. inputs must be a four-dimensional tensor with a shape of the form (None, x, x, x).
tf.layers.max_pooling2d(inputs, pool_size, strides, ...)
Typically used together with conv2d.

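Operations within the same layer

~~tf.tensordot(a, b, axes, name=None)~~
Contracts two tensors along the given axes, which generalizes the dot product. I have not fully understood this method, so use it with caution. For a row-wise dot product, the familiar tf.reduce_sum(tf.multiply(a, b), axis=1) is the better choice.
tf.multiply(a,b)
Equivalent to NumPy array multiplication, i.e. element-wise multiplication.
tf.keras.layers.dot(inputs, axes, normalize=False)
The Keras dot-product operation; with normalize=True it is equivalent to cosine similarity.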
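
The "multiply element-wise, then sum along axis 1" recipe, and its relation to cosine similarity, can be checked on small lists. This is a plain-Python sketch of the arithmetic; the helper names are hypothetical and the inputs are made-up values.

```python
import math


def rowwise_dot(a, b):
    """Multiply element-wise, then sum along axis 1 (one dot product per row)."""
    return [sum(x * y for x, y in zip(row_a, row_b)) for row_a, row_b in zip(a, b)]


def l2_normalize(rows):
    """Scale every row to unit Euclidean length."""
    return [[x / math.sqrt(sum(v * v for v in row)) for x in row] for row in rows]


def rowwise_cosine(a, b):
    """Dot product of L2-normalized rows, i.e. cosine similarity per row."""
    return rowwise_dot(l2_normalize(a), l2_normalize(b))


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [-4.0, 3.0]]
dots = rowwise_dot(a, b)          # [17.0, 0.0]
cosines = rowwise_cosine(a, b)    # second row is orthogonal, so its cosine is about 0
```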

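Operations between consecutive layers

tf.layers.dense(inputs,units,activation=None,use_bias=True,...)
Adds a fully connected layer and returns the computed tensor. The required W and b, and the activation if one is specified, are created automatically.
tf.contrib.layers.fully_connected(inputs, num_outputs, activation_fn=nn.relu, ...)
Provides the same functionality as tf.layers.dense. Note from the signatures that the default activation here is ReLU, whereas tf.layers.dense defaults to no activation.
tf.matmul(a,b)
Multiplies matrix a by matrix b.
tf.nn.xw_plus_b(x, weights, biases, name=None)
A wrapper for a common operation: computes matmul(x, weights) + biases.
Args:
x: a 2D tensor. Dimensions typically: batch, in_units
weights: a 2D tensor. Dimensions typically: in_units, out_units
biases: a 1D tensor. Dimensions: out_units
tf.nn.softmax(logits, dim=-1, name=None)
This is tensorflow.python.ops.nn_ops.softmax(logits, dim=-1, name=None).
Computes the softmax activation.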
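
The shapes involved in matmul(x, weights) + biases, followed by softmax over the last dimension, can be traced on a tiny example. This is a plain-Python sketch with made-up numbers, not the TensorFlow kernels.

```python
import math


def xw_plus_b(x, weights, biases):
    """x: [batch, in_units], weights: [in_units, out_units], biases: [out_units]."""
    return [
        [sum(row[i] * weights[i][j] for i in range(len(row))) + biases[j]
         for j in range(len(biases))]
        for row in x
    ]


def softmax(logits):
    """Softmax over the last dimension of a 2-D list, shifted by the row max for stability."""
    result = []
    for row in logits:
        shift = max(row)
        exps = [math.exp(v - shift) for v in row]
        total = sum(exps)
        result.append([e / total for e in exps])
    return result


x = [[1.0, 2.0]]                        # batch = 1, in_units = 2
weights = [[1.0, 0.0, 1.0],
           [0.0, 1.0, 1.0]]             # in_units = 2, out_units = 3
biases = [0.5, 0.5, 0.5]                # out_units = 3
logits = xw_plus_b(x, weights, biases)  # [[1.5, 2.5, 3.5]]
probs = softmax(logits)                 # each row sums to 1
```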

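Loss functions

Mean squared error

tf.losses.mean_squared_error(labels, predictions, ...)
Uses the mean squared error as the loss function.

Cross entropy

Classification problems (binary or multi-class) usually use cross entropy as the loss function.

tf.nn.sparse_softmax_cross_entropy_with_logits(... ,labels=None, logits=None, name=None)
Typically used for hard classification, where the classes are mutually exclusive and each sample belongs to exactly one class. Arguments:
- logits
  shape is [batch_size, num_classes] and dtype float32 or float64.
  In multi-class classification the output layer has num_classes nodes; after the forward pass each node holds a float value to which softmax has not yet been applied.
- labels
  shape is [batch_size] and dtype int32 or int64; each value is the index of a category, counted from 0.

The function first normalizes with softmax and then computes the cross entropy. Its effect is shown below: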

# logits: [1, 2]
logits = tf.constant([1, 2], dtype=tf.float32)
# label: class index 1
labels = tf.constant(1)

# step 1. logits -> softmax, [1, 2] -> [0.269, 0.731]
# step 2. compute the cross entropy: 0.313262, i.e. $-\log(0.731)$
x = tf.nn.sparse_softmax_cross_entropy_with_logits(logits=logits, labels=labels)
with tf.Session() as sess:
    print(sess.run(x))
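
The two steps in the comments above can be reproduced by hand without TensorFlow. This is a plain-Python sketch of the arithmetic for a single sample; the function name is hypothetical.

```python
import math


def sparse_softmax_cross_entropy(logits, label):
    """Return (softmax probabilities, cross entropy) for one sample and an integer label."""
    shift = max(logits)                           # subtract the max for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, -math.log(probs[label])


probs, loss = sparse_softmax_cross_entropy([1.0, 2.0], 1)
# probs is approximately [0.269, 0.731]; loss is approximately 0.313262
```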

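tf.nn.softmax_cross_entropy_with_logits( labels=None,logits=None,...)
Used when each sample's label is a probability distribution over the classes rather than a single index, often called soft classification: a sample falls into different classes with different probabilities.
Args:
- labels
  shape is [batch_size, num_classes]
- logits
  shape is also [batch_size, num_classes]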
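
With labels given as a full row of probabilities, the loss for each sample is the label-weighted sum of negative log-probabilities. The following plain-Python sketch uses made-up inputs to show that a one-hot row gives the same value as the sparse version, while a soft row mixes the per-class terms.

```python
import math


def softmax_cross_entropy(labels, logits):
    """Per-row cross entropy; labels and logits are both [batch_size, num_classes]."""
    losses = []
    for label_row, logit_row in zip(labels, logits):
        shift = max(logit_row)
        log_total = math.log(sum(math.exp(v - shift) for v in logit_row))
        log_probs = [v - shift - log_total for v in logit_row]   # log-softmax
        losses.append(-sum(p * lp for p, lp in zip(label_row, log_probs)))
    return losses


logits = [[1.0, 2.0], [1.0, 2.0]]
labels = [[0.0, 1.0],     # hard label written as a one-hot row
          [0.5, 0.5]]     # soft label: half of the mass on each class
losses = softmax_cross_entropy(labels, logits)
# losses is approximately [0.313262, 0.813262]
```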

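tf.metrics

Metrics used in the evaluation phase.

tf.metrics.accuracy( labels, predictions, ...)
Computes how often predictions match labels. Internally it maintains two local variables, count and total, with $accuracy=\frac{count}{total}$, which makes it suitable for evaluating streaming data.
Note that it returns two values: the first is a tensor holding the current accuracy, and the second is an op that updates total and count. Running this op returns the latest accuracy after the update.
For a usage example, see the Stack Overflow discussion.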
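
The count/total bookkeeping can be mimicked with a small class. This is a plain-Python sketch of the idea, with `value()` playing the role of the accuracy tensor and `update()` the role of the update op; the class is hypothetical, not TensorFlow code.

```python
class StreamingAccuracy:
    """Running accuracy over batches, kept as count (correct) and total (seen)."""

    def __init__(self):
        self.count = 0
        self.total = 0

    def value(self):
        # Current accuracy without consuming any new data.
        return self.count / self.total if self.total else 0.0

    def update(self, labels, predictions):
        # Fold one batch into the running totals, then report the new accuracy.
        self.count += sum(1 for l, p in zip(labels, predictions) if l == p)
        self.total += len(labels)
        return self.value()


metric = StreamingAccuracy()
first = metric.update([1, 0, 1, 1], [1, 0, 0, 1])   # 3 of 4 correct
second = metric.update([0, 1], [1, 0])              # 0 of 2 correct, 3 of 6 overall
```

Reading the value alone never changes the state; only the update step does, which is why the update op has to be run for every batch.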

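Optimizers

tensorflow.python.training.optimizer.Optimizer
The base class for optimizers.
minimize(self, loss, ...)
Applies the optimization method to find a minimum of the loss function.
tensorflow.python.training.gradient_descent.GradientDescentOptimizer(optimizer.Optimizer)
A class. Implements gradient descent.
__init__(self, learning_rate)
The constructor takes the learning rate.
AdamOptimizer(optimizer.Optimizer)
A class. An optimizer implementing the Adam algorithm, a stochastic gradient-based method with adaptive step sizes.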
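
What minimize() does for plain gradient descent can be sketched on a one-dimensional loss. This is a plain-Python illustration with a hypothetical quadratic loss and a hand-written gradient; TensorFlow computes the gradients automatically.

```python
def gradient_descent(grad_fn, w0, learning_rate, steps):
    """Repeat the update w <- w - learning_rate * gradient for a fixed number of steps."""
    w = w0
    for _ in range(steps):
        w = w - learning_rate * grad_fn(w)
    return w


# Hypothetical loss(w) = (w - 3)^2 with gradient 2 * (w - 3); its minimum is at w = 3.
w_final = gradient_descent(lambda w: 2.0 * (w - 3.0), w0=0.0, learning_rate=0.1, steps=100)
```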

Estimator

In sklearn, training and prediction are the fit() and predict() methods, which is very convenient. TensorFlow provides wrappers that are compatible with sklearn in this respect.

Estimator
tensorflow.python.estimator.estimator.Estimator
A class. Trains and evaluates a TensorFlow model.
__init__(self, model_fn, model_dir=None, config=None, params=None)
model_fn refers to a function whose required signature is model_fn(features,labels,mode=None, params=None, config=None).
model_dir: Directory to save model parameters, graph and etc.
SKCompat
tensorflow.contrib.learn.python.learn.estimators.estimator.SKCompat(sklearn.BaseEstimator)
A class. Scikit learn wrapper for TensorFlow Learn Estimator.
fit(self, x, y, batch_size=128, steps=None, max_steps=None, monitors=None)
The training function.

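collection

tf.add_to_collection(name, value)
Provides a global storage mechanism that is not affected by variable namespaces: store in one place, retrieve anywhere.
tf.get_collection(key, scope=None)
The counterpart of the add operation; reads back what was stored.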

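Dynamic configuration

To make the hyperparameters of a program more flexible, they can be supplied from outside the code, much like a .properties file in Java. tf.app.flags reads such values from command-line flags.
In TensorFlow it can be written like this:

import tensorflow as tf

# Define a learning_rate flag; if it is not supplied, 0.01 is used as the default. The help text is "learning rate".
# Similarly, there are `DEFINE_string()`, `DEFINE_integer()`, `DEFINE_boolean()`, etc.
tf.app.flags.DEFINE_float("learning_rate", 0.01, "learning rate")

FLAGS = tf.app.flags.FLAGS
# Read the configured value
learning_rate = FLAGS.learning_rate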
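
For comparison, the same define-with-default-then-read pattern can be written with the standard library's argparse. This is a swapped-in technique shown only to illustrate the idea, not what tf.app.flags does internally; `build_parser` is a hypothetical helper.

```python
import argparse


def build_parser():
    """Declare a learning_rate option with a default value and help text."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--learning_rate", type=float, default=0.01, help="learning rate")
    return parser


# No flag on the command line: the default is used.
default_flags = build_parser().parse_args([])
# Flag supplied: the given value overrides the default.
custom_flags = build_parser().parse_args(["--learning_rate", "0.1"])
```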
