Building Your Own Neural Network with TensorFlow (Part 1)

Video tutorial: https://www.bilibili.com/video/av16001891

Web tutorial: https://morvanzhou.github.io/tutorials/machine-learning/tensorflow/

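Neural network inputs must be numeric. Backpropagation (BP) propagates the error backward by taking derivatives, which only works on continuous, differentiable functions, so the inputs must be numeric data (vectors or matrices).

Optimization methods: gradient descent; Newton's method; least squares

Most data in TensorFlow is float32.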

I. Basic TensorFlow Usage

A simple example (written against the TensorFlow 1.x API):

import tensorflow as tf
import numpy as np

# Create the data
x_data = np.random.rand(100).astype(np.float32)
y_data = x_data*0.1 + 0.3

# Build the model
Weights = tf.Variable(tf.random_uniform([1], -1.0, 1.0))
biases = tf.Variable(tf.zeros([1]))
y = Weights*x_data + biases

# Compute the loss (mean squared error)
loss = tf.reduce_mean(tf.square(y-y_data))

# Optimize with a learning rate of 0.5
optimizer = tf.train.GradientDescentOptimizer(0.5)
train = optimizer.minimize(loss)

# Initialize all Variables defined so far
init = tf.global_variables_initializer()

# Create a Session
sess = tf.Session()
sess.run(init)

for step in range(201):
    sess.run(train)
    if step % 20 == 0:
        print(step, sess.run(Weights), sess.run(biases))

sess.close()
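
To see what GradientDescentOptimizer does in the example above, the same fit can be worked by hand. This is a minimal sketch in plain Python rather than TensorFlow; the variable names and the fixed random seed are illustrative choices, not part of the original tutorial.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable (an illustrative choice)

# Same kind of synthetic data as above: y = 0.1 * x + 0.3 with x in [0, 1)
xs = [random.random() for _ in range(100)]
ys = [0.1 * x + 0.3 for x in xs]

w, b = random.uniform(-1.0, 1.0), 0.0  # initial weight and bias
lr = 0.5                               # learning rate

for step in range(201):
    n = len(xs)
    # residuals of the current prediction
    errs = [w * x + b - y for x, y in zip(xs, ys)]
    # gradients of mean((w*x + b - y)^2) with respect to w and b
    grad_w = 2.0 / n * sum(e * x for e, x in zip(errs, xs))
    grad_b = 2.0 / n * sum(errs)
    # move against the gradient
    w -= lr * grad_w
    b -= lr * grad_b
    if step % 20 == 0:
        print(step, w, b)
```

The printed weight and bias should approach 0.1 and 0.3, the values used to generate the data.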

Two ways to open a Session:

import tensorflow as tf

# create two matrices
matrix1 = tf.constant([[3,3]])
matrix2 = tf.constant([[2],
                       [2]])
product = tf.matmul(matrix1,matrix2)  # matrix multiplication, equivalent to np.dot() in numpy

# method 1
sess = tf.Session()
result = sess.run(product)
print(result)
sess.close()
# [[12]]

# method 2
with tf.Session() as sess:
    result2 = sess.run(product)
    print(result2)
# [[12]]
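
The result [[12]] follows from multiplying a 1 x 2 matrix by a 2 x 1 matrix. As a small sketch of that arithmetic in plain Python (the matmul helper here is a hypothetical stand-in written for illustration, not a TensorFlow function):

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(rows)]

matrix1 = [[3, 3]]      # shape 1 x 2
matrix2 = [[2], [2]]    # shape 2 x 1
product = matmul(matrix1, matrix2)  # 3*2 + 3*2
print(product)  # [[12]]
```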

Variables:

import tensorflow as tf

# Define a variable
state = tf.Variable(0, name='counter')

# Define a constant
one = tf.constant(1)

# Define the addition step (note: nothing is computed yet)
new_value = tf.add(state, one)

# Assign new_value to state
update = tf.assign(state, new_value)

# Any Variable that is defined must be initialized
init = tf.global_variables_initializer()

# Use a Session
with tf.Session() as sess:
    sess.run(init)
    for _ in range(3):
        sess.run(update)
        print(sess.run(state))
# print(state) alone does not show the value; evaluate it with sess.run(state) and print that result

If you define any Variable in TensorFlow, initializing it is the most important step. After defining variables, always create init = tf.global_variables_initializer() and run it with sess.run(init).

Feeding values with placeholder:

A placeholder reserves a slot in the graph for data that is supplied later. To pass data into TensorFlow from outside, define a tf.placeholder() and feed it at run time in this form: sess.run(***, feed_dict={input: **}).

import tensorflow as tf

# A placeholder needs a dtype in TensorFlow, usually float32
input1 = tf.placeholder(tf.float32)
input2 = tf.placeholder(tf.float32)

# Multiply input1 by input2 and store the result in output
output = tf.multiply(input1, input2)

# sess.run() does the feeding; a placeholder always goes together with feed_dict={}.
with tf.Session() as sess:
    print(sess.run(output, feed_dict={input1: [7.], input2: [2.]}))
# [ 14.]

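Activation functions:

With only a few layers, almost any activation function works. With many layers, the activation function must be chosen carefully to avoid vanishing and exploding gradients.

In the convolutional layers of a convolutional neural network, relu is recommended; in recurrent neural networks, relu or tanh is recommended.

At run time an activation function activates part of the neurons in the network and passes the activation on to the next layer. In essence, an activation function is a nonlinear function. TensorFlow networks that handle more complex problems need activation functions.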
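
As a rough illustration of why the choice matters in deep networks, the sketch below writes relu, sigmoid and tanh in plain Python together with their derivatives. It is a simplification under a stated assumption: backpropagation is reduced to multiplying one derivative factor per layer and the weights are ignored, so the numbers only indicate the trend.

```python
import math

def relu(x):
    return max(0.0, x)

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def d_relu(x):
    return 1.0 if x > 0 else 0.0

def d_sigmoid(x):
    s = sigmoid(x)
    return s * (1.0 - s)

def d_tanh(x):
    return 1.0 - math.tanh(x) ** 2

for x in (-5.0, -1.0, 0.5, 5.0):
    print(x, relu(x), sigmoid(x), math.tanh(x))

# Simplified view of backpropagation through `depth` layers:
# one derivative factor per layer, weights ignored (an assumption).
depth = 10
sigmoid_chain = d_sigmoid(0.0) ** depth  # sigmoid's derivative is at most 0.25
relu_chain = d_relu(1.0) ** depth        # relu's derivative is 1 for positive inputs
print(sigmoid_chain, relu_chain)
```

Even at its steepest point the sigmoid factor shrinks the gradient by a factor of four per layer, while relu passes it through unchanged for positive inputs.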

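II. Building a Neural Network

Adding a layer: def add_layer():

Defining a function that adds a layer makes it easy to add more layers later and saves a lot of time. The usual parameters of a layer are weights, biases, and an activation function.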

import tensorflow as tf

def add_layer(inputs, in_size, out_size, activation_function=None):    
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    # In machine learning a nonzero initial bias is recommended, so 0.1 is added to the zero vector
    Wx_plus_b = tf.matmul(inputs, Weights) + biases

    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)
    
    return outputs
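
To make the shapes in add_layer concrete, here is a minimal sketch of the same forward pass in plain Python. The dense_forward helper and the example numbers are hypothetical and chosen only for illustration; inputs has shape n x in_size, weights in_size x out_size, and there is one bias per output.

```python
def dense_forward(inputs, weights, biases, activation=None):
    """Compute activation(inputs @ weights + biases) row by row."""
    outputs = []
    for row in inputs:
        z = [sum(x * weights[i][j] for i, x in enumerate(row)) + biases[j]
             for j in range(len(biases))]
        outputs.append([activation(v) for v in z] if activation else z)
    return outputs

def relu(v):
    return max(0.0, v)

inputs = [[1.0, 2.0], [0.0, -1.0]]                # 2 samples, in_size = 2
weights = [[1.0, -1.0, 0.5], [0.5, 1.0, -2.0]]    # in_size x out_size = 2 x 3
biases = [0.1, 0.1, 0.1]                          # one bias per output

linear = dense_forward(inputs, weights, biases)                      # no activation
activated = dense_forward(inputs, weights, biases, activation=relu)  # relu applied
print(linear)
print(activated)
```

With activation=None the layer is purely linear, which matches the output layer of the regression network in the next example.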

Building the network:

import numpy as np

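# Build the data. x_data and y_data do not follow an exact quadratic relationship because noise is added, which makes the data look more realistic.
x_data = np.linspace(-1,1,300, dtype=np.float32)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape).astype(np.float32)
y_data = np.square(x_data) - 0.5 + noise

# Define the network inputs with placeholders. None means any number of samples is accepted; the 1 is there because each sample has a single feature.
xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

# Define the layers. A network usually has an input layer, hidden layers, and an output layer. The input has one feature, so the input layer has 1 neuron; the hidden layer size is our choice, here 10 neurons; the output has the same structure as the input, so the output layer also has 1 neuron. The network is therefore 1 input, 10 hidden, 1 output.

# Define the hidden layer with add_layer() from above, using TensorFlow's built-in activation function tf.nn.relu.
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# Define the output layer. Its input is the hidden layer's output l1, so it has 10 inputs and 1 output.
prediction = add_layer(l1, 10, 1, activation_function=None)

loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                     reduction_indices=[1]))  # reduction_indices=[1] is the same as axis=1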

train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

init = tf.global_variables_initializer()

# Create the Session and use it to run init (note: in TensorFlow, an operation only executes when it is passed to sess.run())
sess = tf.Session()
sess.run(init)

# Train
for i in range(1000):
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        print(sess.run(loss, feed_dict={xs: x_data, ys: y_data}))
# If the loss keeps decreasing, the network is learning.

sess.close()
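
The loss used above sums the squared error over each sample's outputs (axis 1) and then averages over samples. A small sketch of that arithmetic in plain Python, with a hypothetical mse_loss helper and made-up numbers:

```python
def mse_loss(targets, predictions):
    """Mean over samples of the per-sample sum of squared errors."""
    per_sample = [sum((t - p) ** 2 for t, p in zip(t_row, p_row))
                  for t_row, p_row in zip(targets, predictions)]
    return sum(per_sample) / len(per_sample)

targets = [[0.5], [0.0], [-0.5]]      # 3 samples, 1 output each
predictions = [[0.0], [0.0], [0.0]]   # a model that always predicts 0
loss_value = mse_loss(targets, predictions)  # (0.25 + 0.0 + 0.25) / 3
print(loss_value)
```

With a single output per sample, as in this network, the inner sum has one term and the loss is the ordinary mean squared error.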

Visualization:

import matplotlib.pyplot as plt

# plot the real data
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(x_data, y_data)
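plt.ion()  # interactive mode so the figure keeps updating; otherwise only the initial state is shown
plt.show()

# This loop takes the place of the training loop above; run it before sess.close()
for i in range(1000):
    # training
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        # to visualize the result and improvement
        try:
            lines[0].remove()  # remove the previously drawn prediction line
        except NameError:
            pass  # nothing has been drawn yet on the first pass
        prediction_value = sess.run(prediction, feed_dict={xs: x_data})
        # plot the prediction
        lines = ax.plot(x_data, prediction_value, 'r-', lw=5)
        plt.pause(0.1)
# Every 50 training steps the figure is redrawn: a red line of width 5 shows the predictions against the inputs, with a 0.1 s pause each time.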

Speeding up neural network training:

Stochastic Gradient Descent (SGD)
Momentum
AdaGrad
RMSProp
Adam (the most commonly used; works well in practice)
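
The update rules behind these names can be sketched in a few lines each. The example below is an illustration in plain Python under stated assumptions, not TensorFlow's implementation: it minimizes the toy loss f(w) = (w - 3)^2 with an exact gradient (no mini-batch sampling, so "sgd" here is just the plain gradient step), and the hyperparameters are common textbook defaults chosen for illustration.

```python
import math

def grad(w):
    """Gradient of the toy loss f(w) = (w - 3)^2."""
    return 2.0 * (w - 3.0)

def run(update, steps=500):
    """Apply one update rule repeatedly, starting from w = 0."""
    w, state = 0.0, {}
    for t in range(1, steps + 1):
        w = update(w, grad(w), state, t)
    return w

def sgd(w, g, state, t, lr=0.1):
    return w - lr * g

def momentum(w, g, state, t, lr=0.1, beta=0.9):
    # accumulate a velocity from past gradients
    state["v"] = beta * state.get("v", 0.0) + g
    return w - lr * state["v"]

def adagrad(w, g, state, t, lr=0.5, eps=1e-8):
    # divide by the root of the sum of all past squared gradients
    state["cache"] = state.get("cache", 0.0) + g * g
    return w - lr * g / (math.sqrt(state["cache"]) + eps)

def rmsprop(w, g, state, t, lr=0.05, decay=0.9, eps=1e-8):
    # like AdaGrad, but with a decaying average of squared gradients
    state["cache"] = decay * state.get("cache", 0.0) + (1.0 - decay) * g * g
    return w - lr * g / (math.sqrt(state["cache"]) + eps)

def adam(w, g, state, t, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    # momentum (m) plus RMSProp-style scaling (v), both bias-corrected
    state["m"] = b1 * state.get("m", 0.0) + (1.0 - b1) * g
    state["v"] = b2 * state.get("v", 0.0) + (1.0 - b2) * g * g
    m_hat = state["m"] / (1.0 - b1 ** t)
    v_hat = state["v"] / (1.0 - b2 ** t)
    return w - lr * m_hat / (math.sqrt(v_hat) + eps)

results = {f.__name__: run(f) for f in (sgd, momentum, adagrad, rmsprop, adam)}
for name, w in results.items():
    print(name, w)
```

Each rule should end near the minimum at w = 3; how fast they get there depends heavily on the learning rates, so this toy run says nothing about which optimizer is better in general.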

Optimizers:

The tutorial lists 7 optimizers provided by TensorFlow.

Visualization tool: TensorBoard

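III. Advanced Topics

Classification (MNIST handwritten digit recognition):

This is a multi-class classification problem. Binary classification uses logistic regression (with sigmoid as the activation function); multi-class classification uses a variant of logistic regression (with softmax as the activation function).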
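
The relationship between the two can be checked numerically. The sketch below implements sigmoid and softmax in plain Python (the max subtraction is a standard numerical-stability trick added here, and the example logits are made up) and shows that a two-class softmax with logits [z, 0] gives the same probability as sigmoid(z).

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, 1.0, 0.1])   # three classes: probabilities sum to 1
two_class = softmax([1.5, 0.0])    # two classes with logits [z, 0]
print(probs, sum(probs))
print(two_class[0], sigmoid(1.5))  # the same value
```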

import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data
mnist = input_data.read_data_sets('MNIST_data', one_hot=True)

# add_layer is the function defined in section II

def compute_accuracy(v_xs,v_ys):
    y_pre = sess.run(prediction,feed_dict={xs:v_xs})
    correct_prediction = tf.equal(tf.argmax(y_pre,1),tf.argmax(v_ys,1))
    accuracy = tf.reduce_mean(tf.cast(correct_prediction,tf.float32))
    # tf.cast(x,dtype,name=None) converts x to dtype
    result = sess.run(accuracy,feed_dict={xs:v_xs,ys:v_ys})
    return result

xs = tf.placeholder(tf.float32,[None,784]) # 28*28
ys = tf.placeholder(tf.float32,[None,10])

prediction = add_layer(xs,784,10,activation_function=tf.nn.softmax)

cross_entropy = tf.reduce_mean(-tf.reduce_sum(
    ys * tf.log(prediction),reduction_indices=[1]))

train_step = tf.train.GradientDescentOptimizer(0.5).minimize(cross_entropy)

sess = tf.Session()
sess.run(tf.global_variables_initializer())
for i in range(1000):
    # Train on only 100 images per step, so that each step stays fast
    batch_xs,batch_ys = mnist.train.next_batch(100)
    sess.run(train_step,feed_dict={xs:batch_xs,ys:batch_ys})
    if i % 50 == 0:
        print(compute_accuracy(mnist.test.images,mnist.test.labels))
sess.close()
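
The cross_entropy and compute_accuracy steps above can be traced by hand on a tiny example. This is a sketch in plain Python with made-up probabilities for three samples and three classes; the eps term that guards against log(0) is an addition for safety and is not in the code above.

```python
import math

def cross_entropy(labels, probs, eps=1e-12):
    """Mean over samples of -sum(y * log(p)); eps (an addition) avoids log(0)."""
    per_sample = [-sum(y * math.log(p + eps) for y, p in zip(y_row, p_row))
                  for y_row, p_row in zip(labels, probs)]
    return sum(per_sample) / len(per_sample)

def accuracy(labels, probs):
    """Fraction of samples whose most probable class matches the one-hot label."""
    def argmax(row):
        return max(range(len(row)), key=row.__getitem__)
    hits = sum(argmax(y) == argmax(p) for y, p in zip(labels, probs))
    return hits / len(labels)

labels = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]                       # one-hot labels
probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1], [0.5, 0.3, 0.2]]      # softmax outputs
loss = cross_entropy(labels, probs)
acc = accuracy(labels, probs)
print(loss, acc)
```

Because the labels are one-hot, only the log-probability of the true class contributes to each sample's loss; the third sample is misclassified, so it dominates the loss and the accuracy is two out of three.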

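Dropout regularization:

Dropout is a regularization method designed specifically for neural networks. During training, we randomly ignore some neurons and their connections, which makes the network "incomplete", and train once with this incomplete network.

The next time, we randomly ignore a different set and get another incomplete network. With these random drops, no prediction can depend on any particular subset of neurons. Compare L1 and L2 regularization: weights W that the network relies on too heavily grow large, and L1/L2 penalize those large values. Dropout instead removes the opportunity for over-reliance in the first place.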

keep_prob = tf.placeholder(tf.float32)

Wx_plus_b = tf.nn.dropout(Wx_plus_b, keep_prob)
...
...

sess.run(train_step, feed_dict={xs: X_train, ys: y_train, keep_prob: 0.5})

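keep_prob is the keep probability, i.e. the fraction of outputs to retain. It is a placeholder whose value is fed at run time. When keep_prob=1, 100% is kept and dropout has no effect, so feed keep_prob: 1 when evaluating.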
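
What keep_prob does to a layer's outputs can be sketched in plain Python. This is an illustration rather than TensorFlow's implementation: the dropout helper below is hypothetical, and it follows the documented behavior of tf.nn.dropout, which keeps each element with probability keep_prob, scales kept elements by 1 / keep_prob, and sets the rest to 0.

```python
import random

def dropout(values, keep_prob, rng):
    """Keep each value with probability keep_prob and scale kept values
    by 1 / keep_prob, so the expected sum stays the same."""
    if not 0.0 < keep_prob <= 1.0:
        raise ValueError("keep_prob must be in (0, 1]")
    return [v / keep_prob if rng.random() < keep_prob else 0.0 for v in values]

rng = random.Random(0)          # fixed seed, an illustrative choice
activations = [1.0] * 10000

dropped = dropout(activations, 0.5, rng)     # training: about half are zeroed
kept_fraction = sum(1 for v in dropped if v != 0.0) / len(dropped)
unchanged = dropout(activations, 1.0, rng)   # keep_prob = 1: nothing is dropped

print(kept_fraction)
print(unchanged == activations)
```

The rescaling is why the same network can be evaluated with keep_prob set to 1 without any further adjustment to the weights.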