Chapter 4 Building a Neural Network

4.1 Basic Concepts of Neural Networks

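In the previous chapter, the author walked through building a linear model and implementing it in TensorFlow, covering only TensorFlow's basic concepts. This chapter introduces the basic concepts of neural networks. A neural network is a mathematical model, a kind of nervous system that lives in the computer. It consists of a large number of connected neurons that perform computation, and it changes its internal structure based on external information. It is commonly used to model complex relationships between inputs and outputs. A neural network is made up of many nodes and the connections between them, which transmit and process information, and neurons can be strengthened through training.
The figure below shows a neural network, which is made up of many layers. The input layer receives information, for example a picture of a cat. The output layer is the computer's judgment about that input: whether or not it is a cat. The hidden layers process the input information.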

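Figure 1
How is a neural network trained? First, it needs a lot of data. For example, to judge whether a picture shows a cat, it has to be fed tens of millions of labeled pictures of cats and dogs and then trained tens of millions of times.
Training produces both right and wrong results. A wrong result is treated as valuable experience. How does the network learn from it? It compares its answer with the correct answer, then passes the difference backwards through the network, changing each relevant neuron a little. In the next round of training, the slightly improved neurons produce a slightly more accurate result.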
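Passing the difference backwards can be sketched for a tiny network with one hidden neuron. This is a minimal illustration in plain Python, not the chapter's TensorFlow code: the names `forward`, `backward`, `w1`, `w2` and all numeric values are hypothetical, and the squared error is assumed as the measure of the difference.

```python
def relu(z):
    return z if z > 0.0 else 0.0

def forward(x, w1, w2):
    """Input -> one hidden neuron (ReLU) -> one output."""
    h = relu(w1 * x)
    return h, w2 * h

def backward(x, target, w1, w2):
    """Pass the output error back to each weight with the chain rule."""
    h, pred = forward(x, w1, w2)
    d_pred = 2.0 * (pred - target)               # d(error)/d(pred) for squared error
    d_w2 = d_pred * h                            # share of the error owed to w2
    d_h = d_pred * w2                            # error passed back to the hidden neuron
    d_w1 = d_h * (x if w1 * x > 0.0 else 0.0)    # ReLU passes gradient only when active
    return d_w1, d_w2

x, target = 1.5, 2.0
w1, w2 = 0.8, 0.5
learning_rate = 0.05

_, before = forward(x, w1, w2)
d_w1, d_w2 = backward(x, target, w1, w2)
w1 -= learning_rate * d_w1                       # change each weight a little
w2 -= learning_rate * d_w2
_, after = forward(x, w1, w2)

error_before = (before - target) ** 2
error_after = (after - target) ** 2
print(error_before, error_after)                 # the error shrinks after one update
```

One small update is enough to see the error drop, which is the "slightly more accurate result" described above.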
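What happens inside the neurons during training? Each neuron has its own activation function, which determines how the neuron responds to a stimulus.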

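Figure 2
The first time the computer is shown a picture of a cat, only some of the neurons are activated, and the information passed on by the activated neurons is the information most valuable for the output. If the output is judged to be a dog, that is, wrong, the neurons are modified: some easily activated neurons become less sensitive, and others become more sensitive. As training is repeated, the parameters of all neurons keep changing, and they become more sensitive to the information that really matters.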

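Figure 3
To summarize: no matter how many layers a network has, at a high level there are just three kinds: the input layer, the hidden layers, and the output layer. Data enters at the input layer and flows through the hidden layers to the output layer. Gradient descent then updates the parameters, the updated parameters are used for the next pass through the hidden layers, and this loop repeats until the result converges.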

Figure 4 TensorFlow model structure
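The loop summarized above (forward pass, gradient descent update, repeat until the result converges) can be sketched with a single parameter. This is a minimal plain-Python sketch under stated assumptions: the toy data, the one-weight model `y = w * x`, the learning rate, and the stopping tolerance are all hypothetical and are not taken from the chapter's code.

```python
# Hypothetical toy data: the true relationship is y = 2 * x
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]

def mean_squared_error(w):
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def gradient(w):
    # derivative of the mean squared error with respect to w
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

w = 0.0
learning_rate = 0.05
previous_loss = mean_squared_error(w)
steps = 0
for steps in range(1, 1001):
    w -= learning_rate * gradient(w)         # gradient descent update
    loss = mean_squared_error(w)             # forward pass with the updated parameter
    if abs(previous_loss - loss) < 1e-12:    # stop once the loss no longer changes
        break
    previous_loss = loss

print(steps, w, loss)                        # w ends up very close to 2
```

The TensorFlow listings later in the chapter follow the same pattern, except that the optimizer computes the gradients and the loop runs for a fixed number of steps.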

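4.2 Basic Workflow for Building a Neural Network

The basic workflow for building a neural network:
 Prepare the training data;
 Define the nodes that receive the data;
 Define the layers: the hidden layer and the prediction layer;
 Define the loss expression;
 Choose an optimizer to minimize the loss.
Building on the previous chapter, this chapter adds neural layers. Before implementing the workflow, let us clarify a few basic concepts.
 Activation function
An activation function is generally applied between the layers of a neural network: the output of one layer is transformed by the activation function before being fed into the next layer. Neural network models are non-linear. Without an activation function, each layer is in effect just a matrix multiplication. Non-linear activation functions give the network more expressive power.
Why must the activation function be non-linear? A composition of linear functions is still a linear function. This means that no matter how deep or complex the network is, if its activation functions are linear, the layers combine into a single linear function: the whole network stays linear, can be replaced by one matrix, and is no different from a single-layer network. Linear activation functions therefore have limited expressive power and cannot describe most real-world problems, which is why we use non-linear ones.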
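The collapse of stacked linear layers into one matrix can be checked numerically. The following NumPy sketch uses made-up values for `x`, `W1`, and `W2` and leaves out biases for brevity; it is an illustration of the argument, not part of the chapter's code.

```python
import numpy as np

x = np.array([[1.0, -2.0],
              [3.0, 1.0]])       # two samples, two features (made-up values)
W1 = np.array([[1.0, -1.0],
               [2.0, 0.0]])      # first layer weights
W2 = np.array([[1.0],
               [3.0]])           # second layer weights

# Two layers with a linear (identity) activation ...
two_linear_layers = (x @ W1) @ W2
# ... give the same result as one layer whose weight matrix is W1 @ W2.
one_layer = x @ (W1 @ W2)
print(two_linear_layers.ravel().tolist())   # [-6.0, -4.0]
print(one_layer.ravel().tolist())           # [-6.0, -4.0]

# With a non-linear activation (ReLU) between the layers, the result differs.
with_relu = np.maximum(x @ W1, 0.0) @ W2
print(with_relu.ravel().tolist())           # [0.0, 5.0]
```

Because ReLU zeroes the negative intermediate values, no single matrix reproduces the two-layer result for all inputs.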
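For example, suppose a neuron is sensitive to a cat's eyes. When it sees a cat's eyes it is activated, the corresponding parameters are tuned, and its contribution grows. Below are several common activation functions; the x-axis is the value passed in and the y-axis is the value passed out.

Figure 5 Activation functions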
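A few widely used activation functions can be written in a handful of lines. ReLU, sigmoid, and tanh are assumed here as typical examples; the figure may show a different selection. The argument is the value passed in and the return value is the value passed out.

```python
import math

def relu(x):
    """Pass positive values through unchanged, output 0 otherwise."""
    return max(0.0, x)

def sigmoid(x):
    """Squash any value into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Squash any value into the range (-1, 1)."""
    return math.tanh(x)

for value in (-2.0, 0.0, 2.0):
    print(value, relu(value), round(sigmoid(value), 4), round(tanh(value), 4))
```

In the TensorFlow code of this chapter, the corresponding ready-made function `tf.nn.relu` is passed to the layer instead of a hand-written one.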
In the prediction layer, the activation function decides which values are passed on to the prediction result:

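Figure 6
For more on understanding activation functions: https://blog.csdn.net/hyman_yx/article/details/51789186

 Adding a layer
A neural network has many layers, so how is a layer added? Please consult other references for the detailed theory; the code follows directly.
The input parameters are inputs, in_size, out_size, and activation_function. The implementation is as follows.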

# Add a layer
def add_layer(inputs, in_size, out_size, activation_function=None):

    # Linear model: inputs * Weights + biases
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases

    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)

    return outputs
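To make the shapes in add_layer concrete, here is a NumPy stand-in that performs the same computation without TensorFlow. It is only a sketch: `add_layer_numpy` and its `seed` parameter are hypothetical names introduced for illustration, and NumPy's random generator replaces `tf.random_normal`.

```python
import numpy as np

def add_layer_numpy(inputs, in_size, out_size, activation_function=None, seed=0):
    """NumPy stand-in for add_layer: outputs = activation(inputs @ W + b)."""
    rng = np.random.default_rng(seed)
    weights = rng.normal(size=(in_size, out_size))   # plays the role of tf.random_normal
    biases = np.zeros((1, out_size)) + 0.1           # one row, broadcast over the batch
    wx_plus_b = inputs @ weights + biases
    if activation_function is None:
        return wx_plus_b                             # linear output
    return activation_function(wx_plus_b)

batch = np.linspace(-1, 1, 300)[:, np.newaxis]       # shape (300, 1)
hidden = add_layer_numpy(batch, 1, 10,
                         activation_function=lambda z: np.maximum(z, 0.0))
output = add_layer_numpy(hidden, 10, 1, seed=1)
print(hidden.shape, output.shape)                    # (300, 10) (300, 1)
```

The in_size of each layer must match the out_size of the layer that feeds it, which is why the second call uses 10 as its input size.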

Now let us look at the complete code.
[See the attached test1_build_network.py for the code]

# This listing uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf
import numpy as np

# Add a layer
def add_layer(inputs, in_size, out_size, activation_function=None):

    # Linear model: inputs * Weights + biases
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases

    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)

    return outputs

# [1] Create the raw data, i.e. the data to train on
x_data = np.linspace(-1,1,300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape)
y_data = np.square(x_data) - 0.5 + noise

# [2] Define the nodes that receive the data
xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

# [3] Define the layers: hidden layer and prediction layer
# Hidden layer: the input is xs, and the layer has 10 neurons
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# Output layer: the input is the hidden layer l1, and the layer outputs 1 value
prediction = add_layer(l1, 10, 1, activation_function=None)

# [4] Define the loss function: mean squared error
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                     axis=[1]))
# [5] Choose an optimizer to minimize the loss; here, gradient descent
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# [6] Initialize the variables; any declared variable must be initialized before use
# (tf.initialize_all_variables() is deprecated)
init = tf.global_variables_initializer()

# [7] Create a Session and launch the graph
sess = tf.Session()
# Nothing defined above is computed until sess.run is called
sess.run(init)

# [8] Train the model: repeatedly run the loss-minimizing step to fit the data
for i in range(1000):
    # train_step and loss both depend on placeholders, so values are passed in with feed_dict
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        # print the loss to watch it improve
        print(sess.run(loss, feed_dict={xs: x_data, ys: y_data}))

Output:

0.7016641
0.020532325
0.012887432
0.020628596
0.016825143
0.0070344894
0.0058545996
0.0054254485
0.0051840018
0.005022936
0.0049506323
0.005070546
0.006097332
0.011496162
0.010251511
0.0056009768
0.004545993
0.0043214685
0.004266932
0.004239731
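Each number printed above is the loss from step 4 of the listing: the squared errors are summed over the output dimension of each sample and then averaged over the batch. The same computation in NumPy, with made-up targets and predictions, looks roughly like this (a sketch, not the chapter's code):

```python
import numpy as np

ys = np.array([[1.0], [2.0], [3.0]])            # hypothetical targets, shape (3, 1)
prediction = np.array([[1.5], [2.0], [2.0]])    # hypothetical predictions, shape (3, 1)

squared_error = np.square(ys - prediction)      # element-wise squared difference
per_sample = np.sum(squared_error, axis=1)      # sum over the output dimension -> shape (3,)
loss = np.mean(per_sample)                      # average over the batch
print(loss)                                     # (0.25 + 0.0 + 1.0) / 3
```

With a single output per sample, the inner sum changes nothing numerically; it matters once the prediction layer has more than one output.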

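We have now built a working model, but its results are just numbers, which is rather abstract. Is there a tool for visualizing them? Of course: first install the matplotlib package. The author uses the Anaconda distribution, where you only need to install matplotlib into the TensorFlow environment in Anaconda Navigator.

Figure 7 Installing matplotlib
matplotlib builds the plots; a scatter plot shows the relationship in the real data. The code for plotting the raw data as a scatter plot is as follows.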

# plot the real data
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(x_data, y_data)
plt.show()

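The scatter plot looks like this:

Figure 8 Scatter plot
Next, we display the predictions. The plot is refreshed every 50 training steps: the relationship between the predictions and the inputs is drawn as a red line of width 5, followed by a pause of 0.1 s.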

for i in range(1000):
    # training
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        # to visualize the result and improvement
        try:
            # remove the previously drawn prediction line, if any
            lines[0].remove()
        except NameError:
            # first pass: no line has been drawn yet
            pass
        prediction_value = sess.run(prediction, feed_dict={xs: x_data})
        # plot the prediction
        lines = ax.plot(x_data, prediction_value, 'r-', lw=5)
        plt.pause(0.1)

The complete code is as follows:
[See the attached test2_plut_result.py for the code]

# This listing uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt  # must be installed first

# Add a layer
def add_layer(inputs, in_size, out_size, activation_function=None):

    # Linear model: inputs * Weights + biases
    Weights = tf.Variable(tf.random_normal([in_size, out_size]))
    biases = tf.Variable(tf.zeros([1, out_size]) + 0.1)
    Wx_plus_b = tf.matmul(inputs, Weights) + biases

    if activation_function is None:
        outputs = Wx_plus_b
    else:
        outputs = activation_function(Wx_plus_b)

    return outputs

# [1] Create the raw data, i.e. the data to train on
x_data = np.linspace(-1,1,300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape)
y_data = np.square(x_data) - 0.5 + noise

# [2] Define the nodes that receive the data
xs = tf.placeholder(tf.float32, [None, 1])
ys = tf.placeholder(tf.float32, [None, 1])

# [3] Define the layers: hidden layer and prediction layer
# Hidden layer: the input is xs, and the layer has 10 neurons
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# Output layer: the input is the hidden layer l1, and the layer outputs 1 value
prediction = add_layer(l1, 10, 1, activation_function=None)

# [4] Define the loss function: mean squared error
loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                     axis=[1]))
# [5] Choose an optimizer to minimize the loss; here, gradient descent
train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# [6] Initialize the variables; any declared variable must be initialized before use
# (tf.initialize_all_variables() is deprecated)
init = tf.global_variables_initializer()

# [7] Create a Session and launch the graph
sess = tf.Session()
# Nothing defined above is computed until sess.run is called
sess.run(init)

# Plot the real data
fig = plt.figure()
ax = fig.add_subplot(1,1,1)
ax.scatter(x_data, y_data)
plt.ion()   # interactive mode, so that plt.show() does not block
plt.show()  # comment this out if the prediction curve does not appear

# [8] Train the model: repeatedly run the loss-minimizing step to fit the data
for i in range(1000):
    # train_step and loss both depend on placeholders, so values are passed in with feed_dict
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        # visualize the result and its improvement
        try:
            # remove the previously drawn prediction line, if any
            lines[0].remove()
        except NameError:
            # first pass: no line has been drawn yet
            pass
        prediction_value = sess.run(prediction, feed_dict={xs: x_data})
        # plot the prediction
        lines = ax.plot(x_data, prediction_value, 'r-', lw=5)
        plt.pause(0.1)

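The result is shown in the figure below.

Figure 9
Note that the result above is one of the better ones from several runs by the author; in other runs the fitted curve deviated considerably, as in the figure below.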

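Figure 10
As the figure shows, the fitted curve differs greatly from the actual data. Strictly speaking, a curve that misses its own training data like this is a poor fit, probably the result of an unlucky random initialization, rather than overfitting. Overfitting means fitting the training data, noise included, so closely that the model generalizes badly to new data. To combat overfitting, TensorFlow offers a handy tool called dropout: you only need to give it the percentage of units that should not be dropped, and it can reduce overfitting considerably. The use of dropout is covered in a later chapter.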
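The idea behind dropout and its "percentage not dropped" can be sketched in NumPy. This is an illustration under stated assumptions, not TensorFlow's implementation: the function name `dropout`, the parameter name `keep_prob`, and the rescaling of surviving units by 1 / keep_prob (so that the expected activation stays the same) are choices made for this sketch.

```python
import numpy as np

def dropout(activations, keep_prob, rng):
    """Keep each unit with probability keep_prob and rescale the survivors."""
    mask = rng.random(activations.shape) < keep_prob   # True for units that are kept
    return activations * mask / keep_prob

rng = np.random.default_rng(0)
hidden = np.ones((1000, 10))                 # hypothetical hidden-layer activations
dropped = dropout(hidden, keep_prob=0.5, rng=rng)

kept_fraction = (dropped != 0).mean()
print(kept_fraction)                         # close to 0.5
print(dropped.mean())                        # close to 1.0 thanks to the rescaling
```

Dropout is applied only during training; at prediction time all units are kept.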

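4.3 Visualization with TensorBoard

TensorFlow ships with TensorBoard, which can automatically display the graph of the neural network we have built and helps you spot problems in your code.

Figure 11
We can also expand each layer to see its internal structure: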

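Figure 12
From the code and the figures above, we can see that there is an input layer (inputs), a hidden layer (layer), and an output layer (output). Now let us see how to visualize them.
Let us look at the complete code first.
[See the attached test3_tensorboard.py for the code]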

# This listing uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt  # must be installed first

# Add a layer
def add_layer(inputs, in_size, out_size, activation_function=None):

    # Linear model; each name_scope becomes a node group in the TensorBoard graph
    with tf.name_scope('layer'):
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.random_normal([in_size, out_size]), name='W')
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.zeros([1, out_size]) + 0.1, name='b')
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.add(tf.matmul(inputs, Weights), biases)
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        return outputs

# [1] Create the raw data, i.e. the data to train on
x_data = np.linspace(-1,1,300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape)
y_data = np.square(x_data) - 0.5 + noise

# [2] Define the input nodes of the network
with tf.name_scope('inputs'):
    xs = tf.placeholder(tf.float32, [None, 1],name='x_input')
    ys = tf.placeholder(tf.float32, [None, 1],name='y_input')

# [3] Define the layers: hidden layer and prediction layer
# Hidden layer: the input is xs, and the layer has 10 neurons
l1 = add_layer(xs, 1, 10, activation_function=tf.nn.relu)
# Output layer: the input is the hidden layer l1, and the layer outputs 1 value
prediction = add_layer(l1, 10, 1, activation_function=None)

# [4] Define the loss function: mean squared error
with tf.name_scope('loss'):
    loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                                        axis=[1]))

# [5] Choose an optimizer to minimize the loss; here, gradient descent
with tf.name_scope('train'):
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# [6] Initialize the variables; any declared variable must be initialized before use
# (tf.initialize_all_variables() is deprecated)
init = tf.global_variables_initializer()

# [7] Create a Session and launch the graph
sess = tf.Session()
# writer = tf.train.SummaryWriter("logs/", sess.graph)  # removed in newer TensorFlow versions
writer = tf.summary.FileWriter("logs/",sess.graph)  # write the graph to the logs/ directory

# Nothing defined above is computed until sess.run is called
sess.run(init)

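After a successful run, the following files are generated in the folder where you store the logs.

Figure 13 Generated graph files
Open a terminal and change to the parent directory of the log folder. The author's log folder is logs, so change to the directory that contains logs and run the command tensorboard --logdir='logs/', which prints an address.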

The address is http://(hostname):6006/.
Then open this address in a browser.

Figure 14
Click the graph tab to see the flow graph:

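Figure 15
[Note 1] On Windows, run the command without quotes: tensorboard --logdir=logs/
[Note 2] The code above is already commented in detail, so it is not explained again here.

The previous part covered the graph of the neural network. Next, we continue to use TensorBoard on that basis.
Let us go straight to the code.
[See the attached test4_tensorboard2 for the code]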

# This listing uses the TensorFlow 1.x API (tf.placeholder, tf.Session)
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt  # must be installed first

# Add a layer
def add_layer(inputs, in_size, out_size, n_layer, activation_function=None):
    # add one more layer and return the output of this layer
    layer_name = 'layer%s' % n_layer
    # Linear model
    with tf.name_scope('layer'):
        with tf.name_scope('weights'):
            Weights = tf.Variable(tf.random_normal([in_size, out_size]), name='W')
            # record the distribution of the weights for the histogram view
            tf.summary.histogram(layer_name + '/weights', Weights)
            # tf.histogram_summary(layer_name + '/weights', Weights)  # removed in newer versions
        with tf.name_scope('biases'):
            biases = tf.Variable(tf.zeros([1, out_size]) + 0.1, name='b')
            tf.summary.histogram(layer_name + '/biases', biases)
        with tf.name_scope('Wx_plus_b'):
            Wx_plus_b = tf.add(tf.matmul(inputs, Weights), biases)
        if activation_function is None:
            outputs = Wx_plus_b
        else:
            outputs = activation_function(Wx_plus_b)
        tf.summary.histogram(layer_name + '/outputs', outputs)
        return outputs

# [1] Create the raw data, i.e. the data to train on
x_data = np.linspace(-1,1,300)[:, np.newaxis]
noise = np.random.normal(0, 0.05, x_data.shape)
y_data = np.square(x_data) - 0.5 + noise

# [2] Define the input nodes of the network
with tf.name_scope('inputs'):
    xs = tf.placeholder(tf.float32, [None, 1],name='x_input')
    ys = tf.placeholder(tf.float32, [None, 1],name='y_input')

# [3] Define the layers: hidden layer and prediction layer
# Hidden layer: the input is xs, and the layer has 10 neurons
l1 = add_layer(xs, 1, 10, n_layer=1, activation_function=tf.nn.relu)
# Output layer: the input is the hidden layer l1, and the layer outputs 1 value
prediction = add_layer(l1, 10, 1, n_layer=2, activation_function=None)

# [4] Define the loss function: mean squared error
with tf.name_scope('loss'):
    loss = tf.reduce_mean(tf.reduce_sum(tf.square(ys - prediction),
                                        axis=[1]))
    # record the loss as a scalar curve
    tf.summary.scalar('loss', loss)

# [5] Choose an optimizer to minimize the loss; here, gradient descent
with tf.name_scope('train'):
    train_step = tf.train.GradientDescentOptimizer(0.1).minimize(loss)

# [6] Initialize the variables; any declared variable must be initialized before use
# (tf.initialize_all_variables() is deprecated)
init = tf.global_variables_initializer()

# [7] Create a Session and launch the graph
sess = tf.Session()

# merged = tf.merge_all_summaries()  # removed in newer versions
merged = tf.summary.merge_all()
# writer = tf.train.SummaryWriter("logs/", sess.graph)  # removed in newer TensorFlow versions
writer = tf.summary.FileWriter("logs/",sess.graph)  # write the graph to the logs/ directory

# Nothing defined above is computed until sess.run is called
sess.run(init)

# [8] Train, writing the merged summaries every 50 steps
for i in range(1000):
    sess.run(train_step, feed_dict={xs: x_data, ys: y_data})
    if i % 50 == 0:
        result = sess.run(merged,
                          feed_dict={xs: x_data, ys: y_data})
        writer.add_summary(result, i)

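After a successful run, the results can be viewed in the browser.

Figure 16

[Note] Compare the listings above to see how they differ. The code is commented in detail, so please read it carefully.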

4.4 Saving and Loading

After a neural network has been trained, it can be saved and loaded again for later use.
[See the attached test5_save.py for the code]

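# This listing uses the TensorFlow 1.x API (tf.Session, tf.train.Saver)
import os
import tensorflow as tf
import numpy as np

## [1] Save to file
# remember to define the same dtype and shape when restoring
W = tf.Variable([[1,2,3],[3,4,5]], dtype=tf.float32, name='weights')
b = tf.Variable([[1,2,3]], dtype=tf.float32, name='biases')

# tf.initialize_all_variables() is deprecated
init = tf.global_variables_initializer()

saver = tf.train.Saver()

# make sure the target directory exists (harmless if it already does)
os.makedirs("my_net", exist_ok=True)

# use the saver to write all variables to the given path
with tf.Session() as sess:
   sess.run(init)
   save_path = saver.save(sess, "my_net/save_net.ckpt")
   print("Save to path: ", save_path)

# [2] Restore the variables
# When both parts run in one script, start from an empty graph; otherwise the
# redefined variables are renamed (weights_1, biases_1) and cannot be found in the checkpoint
tf.reset_default_graph()
# redefine the same shape and same type for your variables
W = tf.Variable(np.arange(6).reshape((2, 3)), dtype=tf.float32, name="weights")
b = tf.Variable(np.arange(3).reshape((1, 3)), dtype=tf.float32, name="biases")

# no init step is needed when restoring
saver = tf.train.Saver()
# use the saver to restore the W and b stored in save_net.ckpt
with tf.Session() as sess:
    saver.restore(sess, "my_net/save_net.ckpt")
    print("weights:", sess.run(W))
    print("biases:", sess.run(b))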

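After a successful run, the following files are generated.

Figure 17
With the Saver workflow used here, only the variables are restored, not the structure of the whole network. Before reusing a saved network, you therefore need to define the same structure again and then load the saved variables into it.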
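The requirement to redefine matching shapes and dtypes can be illustrated without TensorFlow. The sketch below swaps in NumPy's `np.savez` and `np.load` for `tf.train.Saver`, so it is an analogy rather than the Saver API; the temporary path and the names `W_restored` and `b_restored` are hypothetical.

```python
import os
import tempfile
import numpy as np

# Values to save, mirroring W and b from the listing above
W = np.array([[1, 2, 3], [3, 4, 5]], dtype=np.float32)
b = np.array([[1, 2, 3]], dtype=np.float32)

path = os.path.join(tempfile.mkdtemp(), "save_net.npz")
np.savez(path, weights=W, biases=b)          # only the values are stored

# "Redefine the structure": containers with the same shape and dtype
W_restored = np.zeros((2, 3), dtype=np.float32)
b_restored = np.zeros((1, 3), dtype=np.float32)

with np.load(path) as checkpoint:
    # a mismatching shape would make the restore meaningless, so check it first
    assert checkpoint["weights"].shape == W_restored.shape
    assert checkpoint["biases"].shape == b_restored.shape
    W_restored[...] = checkpoint["weights"]
    b_restored[...] = checkpoint["biases"]

print(W_restored)
print(b_restored)
```

As with the Saver, the file holds values keyed by name, and the code that loads them is responsible for providing a structure that fits.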

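TensorFlow API version compatibility notes for this chapter

# writer = tf.train.SummaryWriter("logs/", sess.graph)  # removed in newer TensorFlow versions
writer = tf.summary.FileWriter("logs/",sess.graph)  # write the graph to the logs/ directory

# tf.histogram_summary(layer_name + '/weights', Weights)  # removed in newer versions
tf.summary.histogram(layer_name + '/weights', Weights)

# tf.scalar_summary('loss', loss)  # removed in newer versions
tf.summary.scalar('loss', loss)

# merged = tf.merge_all_summaries()  # removed in newer versions
merged = tf.summary.merge_all()

# init = tf.initialize_all_variables()  # deprecated
init = tf.global_variables_initializer()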

Reference code for this chapter

Click to enter