Neural Networks: Learning and Building Your Own Neural Network, Step by Step (Packed with Practical Content)

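I recently spent about a week reading Make Your Own Neural Network closely, and I also skimmed its Chinese edition, 《Python神经网络编程》. The book contains a handful of errors, but a careful reader can spot them, and they do not get in the way of learning neural networks. The book is genuinely detailed and well suited to beginners starting out with neural networks (NN for short). I had studied neural networks before in a machine learning course, but my understanding then was nowhere near as deep as it is after finishing this book. I learned a great deal from it, so here I carefully summarize the core ideas to help others understand neural networks better.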

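Note: the body of the article starts below. Writing it took real effort, and I hope it helps; if you repost it, please include a link to the original.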

I. Neurons in Nature

Let us first look at the basic unit of the biological brain, the neuron, shown in Figure 1.

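A neuron carries an electrical signal from its dendrites along the axon to the terminals, and from there on to the next neuron. There are a few things to know about neurons:

First, a neuron's output cannot be treated as a simple linear function of its input.

Second, a neuron does not necessarily react at once to every incoming electrical signal; it produces an output only when the input exceeds a threshold, enough to complete the circuit.

Third, a neuron can have multiple input signals and multiple output signals, as shown in Figure 2.

Fourth, the dendrites collect all incoming electrical signals and combine them into one stronger signal. If that signal is strong enough to exceed the threshold, the neuron produces an output, which travels along the axon to the terminals and on to the dendrites of the next neuron.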

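II. Building an Artificial Neural Network

By imitating biological neurons, we can build the neural network shown in Figure 3.

Layer 1 is called the input layer, layer 2 the hidden layer, and layer 3 the output layer. Each layer has 3 nodes, and each node plays the role of a neuron. The lowercase w denotes a weight: $w_{i,j}$ is the weight on the link from node i in one layer to node j in the next layer, and it shrinks or amplifies the signal passed along that link. You may wonder why every neuron is connected to all neurons in the preceding and following layers. There are two reasons:

First, full connectivity is easy to express as program instructions for a computer.

Second, if some connections really are redundant, the network weakens them as it learns, that is, the learned weights tend toward 0.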

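To model the threshold response of a biological neuron, it seems we could simply use a step function, as shown in Figure 4.

A step function is not smooth, however, so we replace it with the S-shaped sigmoid function shown in Figure 5. It is much smoother than the step function and closer to reality; nature rarely has cold, sharp edges.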

The sigmoid function, sometimes called the logistic function, is defined as $y=\frac{1}{1+e^{-x}}$. As the plot shows, it maps inputs from $(-\infty ,+\infty )$ to outputs in $(0,1)$.
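To make the contrast between the two activation functions concrete, here is a minimal sketch that evaluates both at a few points. The function names `step` and `sigmoid`, the threshold of 0, and the sample inputs are my own choices for illustration, not taken from the book.

```python
import math

def step(x, threshold=0.0):
    """Hard threshold: output jumps from 0 to 1 at the threshold (assumed to be 0 here)."""
    return 1.0 if x > threshold else 0.0

def sigmoid(x):
    """Logistic function y = 1 / (1 + e^(-x)); smooth, with outputs strictly inside (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

# Compare the two functions on a few sample inputs.
for x in (-10.0, -1.0, 0.0, 1.0, 10.0):
    print(f"x={x:+5.1f}  step={step(x):.1f}  sigmoid={sigmoid(x):.5f}")
```

The step function only ever returns 0 or 1, while the sigmoid moves gradually between the two and never quite reaches either.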

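A neuron simply adds all of its inputs together and feeds the total to the sigmoid function, which produces the output. This mirrors how a real neuron works. Figure 6 illustrates the idea of combining the inputs and then applying a threshold to the combined sum.

If the combined signal is not strong enough, the sigmoid threshold function suppresses the output. If the sum x is large enough, the sigmoid fires the neuron. Interestingly, a single large input is enough to fire the neuron even when all the other inputs are small. More importantly, the neuron can also fire when several inputs are each only moderately large, because their combined signal is strong enough to exceed the threshold.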
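The three cases described above can be tried out with a small sketch. To make "exceeding the threshold" visible I shift the sigmoid by a threshold value; that shift, the value 3.0, the function name `neuron_output`, and the input numbers are all assumptions for illustration only.

```python
import math

def neuron_output(inputs, threshold):
    """Sum the inputs, then apply a sigmoid centred on the (hypothetical) threshold."""
    total = sum(inputs)
    return 1.0 / (1.0 + math.exp(-(total - threshold)))

THRESHOLD = 3.0  # hypothetical value chosen for this illustration

weak = neuron_output([0.1, 0.1, 0.1], THRESHOLD)              # all inputs small
one_large = neuron_output([4.0, 0.1, 0.1], THRESHOLD)         # a single large input
several_moderate = neuron_output([1.5, 1.5, 1.5], THRESHOLD)  # moderate inputs that add up

print(f"weak: {weak:.3f}, one large: {one_large:.3f}, several moderate: {several_moderate:.3f}")
```

Only the first case stays suppressed (output well below 0.5); in the other two the combined signal pushes the output above 0.5.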

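III. Forward Propagation of Signals

Figure 7 shows an example neural network with 3 layers of 3 nodes each (some weight values are not labeled).

The input matrix I holds the input signals (note that it is a column, not a row):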

$\textbf{I}=\begin{bmatrix} 0.9\\0.1 \\ 0.8 \end{bmatrix}$

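The weight matrix between the input and hidden layers is given below (note its dimensions: the number of rows is the number of hidden-layer neurons and the number of columns is the number of input-layer neurons):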

$\textbf{w}_{input_{-}hidden}=\begin{bmatrix} 0.9 &0.3 &0.4 \\ 0.2 &0.8 &0.2 \\ 0.1 &0.5 & 0.6 \end{bmatrix}$

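The weight matrix between the hidden and output layers is given below (note its dimensions: the number of rows is the number of output-layer neurons and the number of columns is the number of hidden-layer neurons):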

$\textbf{w}_{hidden_{-}output}=\begin{bmatrix} 0.3 &0.7 &0.5 \\ 0.6 &0.5 &0.2 \\ 0.8 &0.1 & 0.9 \end{bmatrix}$

The input to the hidden layer is:

$\textbf{x}_{hidden}=\textbf{w}_{input_{-}hidden}\cdot \textbf{I}=\begin{bmatrix} 0.9 &0.3 &0.4 \\ 0.2 &0.8 &0.2 \\ 0.1 &0.5 & 0.6 \end{bmatrix}\cdot\begin{bmatrix} 0.9\\0.1 \\ 0.8 \end{bmatrix}=\begin{bmatrix} 1.16\\ 0.42 \\ 0.62 \end{bmatrix}$
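This matrix product is easy to check with NumPy. The sketch below uses the input column and the input-to-hidden weights from the example; only the variable names are my own.

```python
import numpy as np

# Input signals as a column vector (3 rows, 1 column).
I = np.array([[0.9], [0.1], [0.8]])

# Rows correspond to hidden-layer nodes, columns to input-layer nodes.
w_input_hidden = np.array([[0.9, 0.3, 0.4],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.5, 0.6]])

# Combined input arriving at each hidden node.
x_hidden = w_input_hidden @ I
print(x_hidden)
```

Because the inputs form a column, the result is again a 3-by-1 column, one entry per hidden node.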

The output of the hidden layer is:

$\textbf{o}_{hidden}=sigmoid(\textbf{x}_{hidden})=sigmoid(\begin{bmatrix} 1.16\\ 0.42 \\ 0.62 \end{bmatrix})=\begin{bmatrix} 0.761\\ 0.603 \\ 0.650 \end{bmatrix}$

The input to the output layer is:

$\textbf{x}_{output}=\textbf{w}_{hidden_{-}output}\cdot \textbf{o}_{hidden}=\begin{bmatrix} 0.3 &0.7 &0.5 \\ 0.6 &0.5 &0.2 \\ 0.8 &0.1 & 0.9 \end{bmatrix}\cdot\begin{bmatrix} 0.761\\0.603 \\ 0.650\end{bmatrix}$

$\textbf{x}_{output}=\begin{bmatrix} 0.975\\ 0.888 \\ 1.254 \end{bmatrix}$

The output of the output layer is:

$\textbf{o}_{output}=sigmoid(\textbf{x}_{output})=sigmoid(\begin{bmatrix} 0.975\\ 0.888 \\ 1.254 \end{bmatrix})=\begin{bmatrix} 0.726\\ 0.708 \\ 0.778 \end{bmatrix}$
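The whole forward pass can be reproduced in a few lines of NumPy. This is a sketch using the example's matrices; the helper `sigmoid` and the variable names are my own. Since the worked example rounds the hidden outputs to three decimals before continuing, the full-precision results may differ from the figures above by about 0.001.

```python
import numpy as np

def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))

I = np.array([[0.9], [0.1], [0.8]])
w_input_hidden = np.array([[0.9, 0.3, 0.4],
                           [0.2, 0.8, 0.2],
                           [0.1, 0.5, 0.6]])
w_hidden_output = np.array([[0.3, 0.7, 0.5],
                            [0.6, 0.5, 0.2],
                            [0.8, 0.1, 0.9]])

# Input layer -> hidden layer.
x_hidden = w_input_hidden @ I
o_hidden = sigmoid(x_hidden)

# Hidden layer -> output layer.
x_output = w_hidden_output @ o_hidden
o_output = sigmoid(x_output)

print(np.round(o_hidden.ravel(), 3))
print(np.round(o_output.ravel(), 3))
```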

The signals throughout the whole calculation are shown in Figure 8.

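IV. Backpropagation of Errors

A simple network with two input nodes and two output nodes is shown in Figure 9.

We split the error in proportion to the link weights to decide how much of it each weight is adjusted by. In the figure above, for example, the fraction of $e_{1}$ used to update $w_{1,1}$ is: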

$\frac{w_{1,1}}{w_{1,1}+w_{2,1}}$

and the fraction of $e_{1}$ used to update $w_{2,1}$ is:

$\frac{w_{2,1}}{w_{1,1}+w_{2,1}}$
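A quick numeric sketch of this proportional split, with made-up numbers: the weights 6.0 and 3.0 and the error 0.8 are hypothetical values chosen only for illustration.

```python
# Hypothetical weights on the two links feeding output node 1, and that node's error.
w11, w21 = 6.0, 3.0
e1 = 0.8

# Each link receives a share of the error proportional to its weight.
share_w11 = e1 * w11 / (w11 + w21)
share_w21 = e1 * w21 / (w11 + w21)

print(f"share for w11: {share_w11:.4f}, share for w21: {share_w21:.4f}")
```

The link with the larger weight contributed more to the output, so it takes the larger share of the error, and the two shares add back up to $e_{1}$.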

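A network with two input nodes, two hidden nodes, and two output nodes is shown in Figure 10.

The error of the first hidden node is the sum of the split errors on the links $w_{1,1}$ and $w_{1,2}$, that is: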

$e_{hidden,1}=e_{output,1}*\frac{w_{1,1}}{w_{1,1}+w_{2,1}}+e_{output,2}*\frac{w_{1,2}}{w_{1,2}+w_{2,2}}$

Likewise, the error of the second hidden node is:

$e_{hidden,2}=e_{output,1}*\frac{w_{2,1}}{w_{1,1}+w_{2,1}}+e_{output,2}*\frac{w_{2,2}}{w_{1,2}+w_{2,2}}$
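Plugging numbers into the two formulas above makes the recombination easier to follow. All values in this sketch (the four weights and the two output errors) are hypothetical.

```python
# Hypothetical weights: w_jk links hidden node j to output node k.
w11, w21 = 2.0, 3.0   # links into output node 1
w12, w22 = 1.0, 4.0   # links into output node 2

# Hypothetical output-layer errors.
e_output1, e_output2 = 0.8, 0.5

# Each hidden node collects its share of every output error it is linked to.
e_hidden1 = e_output1 * w11 / (w11 + w21) + e_output2 * w12 / (w12 + w22)
e_hidden2 = e_output1 * w21 / (w11 + w21) + e_output2 * w22 / (w12 + w22)

print(f"e_hidden1 = {e_hidden1:.2f}, e_hidden2 = {e_hidden2:.2f}")
```

With these numbers the hidden errors come out as 0.42 and 0.88, which together equal the total output error of 1.3: the split redistributes the error without creating or losing any.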

In matrix form this is:

$\textup{error}_{hidden}=\begin{bmatrix} \frac{w_{1,1}}{w_{1,1}+w_{2,1}}&\frac{w_{1,2}}{w_{1,2}+w_{2,2}} \\ \frac{w_{2,1}}{w_{1,1}+w_{2,1}}&\frac{w_{2,2}}{w_{1,2}+w_{2,2}} \end{bmatrix}\cdot \begin{bmatrix} e_{output,1}\\ e_{output,2}\end{bmatrix}$

The denominators of these fractions are just normalizing factors. Ignoring them simplifies the expression to:

$\textup{error}_{hidden}=\begin{bmatrix} w_{1,1} &w_{1,2} \\ w_{2,1} & w_{2,2} \end{bmatrix}\cdot \begin{bmatrix} e_{output,1}\\ e_{output,2}\end{bmatrix}$

So errors are propagated backward with a matrix operation:

$\textup{error}_{hidden}=\textup{w}^{T}_{hidden_{-}output}\cdot \textup{error}_{output}$

Note the transpose here: when computing the backpropagated error, the weight matrix must be transposed.
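The transpose is easiest to see in code. The sketch below assumes the same layout as the forward pass (rows of the hidden-to-output matrix are output nodes, columns are hidden nodes) and uses hypothetical weights and errors; it computes both the simplified rule and, for comparison, the normalized version.

```python
import numpy as np

# Hypothetical values: entry [k, j] holds w_{j,k}, the weight on the link from
# hidden node j to output node k (rows = output nodes, columns = hidden nodes).
w_hidden_output = np.array([[2.0, 3.0],   # w_{1,1}, w_{2,1}
                            [1.0, 4.0]])  # w_{1,2}, w_{2,2}
error_output = np.array([[0.8], [0.5]])

# Simplified rule: transpose the weight matrix and multiply by the output errors.
error_hidden = w_hidden_output.T @ error_output

# Normalized rule, for comparison: divide each row by the total weight into that output node.
fractions = w_hidden_output / w_hidden_output.sum(axis=1, keepdims=True)
error_hidden_normalized = fractions.T @ error_output

print(error_hidden.ravel())
print(error_hidden_normalized.ravel())
```

Dropping the normalization changes the size of the hidden errors but not which hidden node is blamed more, which is why the simpler form is used in practice here.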

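Note also that in a 3-layer network only the hidden and output layers have errors, because the input layer merely represents the incoming data and performs no real computation.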

V. Updating Weights with Gradient Descent

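Before the derivation, a note on notation: i indexes input-layer nodes, j indexes hidden-layer nodes, and k indexes output-layer nodes.

First, expand the error function. It is the sum, over all n output nodes, of the squared difference between the target value and the actual value:

$\frac{\partial E}{\partial w_{j,k}}=\frac{\partial }{\partial w_{j,k}}\sum_{n}(t_{n}-o_{n})^{2}$

The weight $w_{j,k}$ affects only the output $o_{k}$ of the node it links to, and $t_{n}$ is the expected label value, a constant, so the expression simplifies to:

$\frac{\partial E}{\partial w_{j,k}}=\frac{\partial }{\partial w_{j,k}}(t_{k}-o_{k})^{2}$

Applying the chain rule gives:

$\frac{\partial E}{\partial w_{j,k}}=\frac{\partial E}{\partial o_{k}}\cdot \frac{\partial o_{k}}{\partial w_{j,k}}=-2(t_{k}-o_{k})\cdot \frac{\partial o_{k}}{\partial w_{j,k}}$

Expanding $o_{k}$ gives:

$\frac{\partial E}{\partial w_{j,k}}=-2(t_{k}-o_{k})\cdot \frac{\partial }{\partial w_{j,k}}sigmoid(\sum_{j}w_{j,k}\cdot o_{j})$

We know that:

$\frac{\partial }{\partial x}sigmoid(x)=sigmoid(x)(1-sigmoid(x))$

so:

$\frac{\partial E}{\partial w_{j,k}}=-2(t_{k}-o_{k})\cdot sigmoid(\sum_{j}w_{j,k}\cdot o_{j})(1-sigmoid(\sum_{j}w_{j,k}\cdot o_{j}))\cdot o_{j}$

Note that the argument of the sigmoid must be differentiated as well, because the sigmoid is applied to a composite function; that is where the final factor $o_{j}$ comes from.

The new weight is the old weight adjusted by the negative of the error slope just derived. If the slope is positive we want to decrease the weight, and if it is negative we want to increase it, which is why the slope is negated.

The final result is:

$\Delta \textup{w}_{j,k}=\alpha *E_{k}*O_{k}*(1-O_{k})\cdot O_{j}^{T}$

$new\textbf{w}_{j,k}=old\textbf{w}_{j,k}+\Delta \textbf{w}_{j,k}$

where $\alpha$ is the learning rate (the constant factor 2 is absorbed into it), $E_{k}$ is the output-layer error $t_{k}-o_{k}$, and $O_{k}$ and $O_{j}$ are the outputs of the output and hidden layers. Note that there is a transpose here too. The update is added rather than subtracted because two minus signs cancel: one comes from differentiating the error and one from negating the slope.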
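One way to gain confidence in the slope formula is to compare it against a finite-difference estimate. The sketch below does that for the hidden-to-output weights, using the slope $-2(t_{k}-o_{k})\cdot o_{k}(1-o_{k})\cdot o_{j}$, and then applies one update of the form $\alpha *E_{k}*O_{k}*(1-O_{k})\cdot O_{j}^{T}$. The hidden outputs, targets, weights, and learning rate are hypothetical values, and the helper names are my own.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def total_error(w, o_j, t):
    """Sum of squared differences between targets and outputs."""
    o_k = sigmoid(w @ o_j)
    return float(np.sum((t - o_k) ** 2))

# Hypothetical values: 3 hidden outputs, 2 output nodes.
o_j = np.array([[0.4], [0.5], [0.9]])
t = np.array([[0.99], [0.01]])
w = np.array([[0.3, -0.2, 0.5],
              [0.1, 0.4, -0.6]])   # rows = output nodes k, columns = hidden nodes j

# Analytic slope from the derivation.
o_k = sigmoid(w @ o_j)
analytic = (-2.0 * (t - o_k) * o_k * (1.0 - o_k)) @ o_j.T

# Numerical slope by central differences, one weight at a time.
eps = 1e-6
numeric = np.zeros_like(w)
for k in range(w.shape[0]):
    for j in range(w.shape[1]):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[k, j] += eps
        w_minus[k, j] -= eps
        numeric[k, j] = (total_error(w_plus, o_j, t) - total_error(w_minus, o_j, t)) / (2 * eps)

print("max difference:", np.max(np.abs(analytic - numeric)))

# One update step with a hypothetical learning rate; the factor 2 is folded into alpha.
alpha = 0.1
delta_w = (alpha * (t - o_k) * o_k * (1.0 - o_k)) @ o_j.T
w_new = w + delta_w
print("error before:", total_error(w, o_j, t), "after:", total_error(w_new, o_j, t))
```

The two slopes should agree closely, and a small step against the slope should lower the error for this example.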

By the same reasoning, we obtain the weight update for the links between the input and hidden layers:

$\Delta \textup{w}_{i,j}=\alpha *E_{j}*O_{j}*(1-O_{j})\cdot O_{i}^{T}$

$new\textbf{w}_{i,j}=old\textbf{w}_{i,j}+\Delta \textbf{w}_{i,j}$
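Putting both update rules together gives one complete training step. The sketch below applies it repeatedly to the 3-layer example network from the forward-propagation section; the target vector, the learning rate of 0.3, the number of steps, and the function name `train_step` are assumptions for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def train_step(w_ih, w_ho, inputs, targets, alpha):
    """One forward pass and one weight update; returns new weights and the output error."""
    o_i = inputs
    o_j = sigmoid(w_ih @ o_i)           # hidden-layer outputs
    o_k = sigmoid(w_ho @ o_j)           # output-layer outputs
    e_k = targets - o_k                 # output-layer error
    e_j = w_ho.T @ e_k                  # hidden-layer error (transposed weights)
    w_ho_new = w_ho + (alpha * e_k * o_k * (1.0 - o_k)) @ o_j.T
    w_ih_new = w_ih + (alpha * e_j * o_j * (1.0 - o_j)) @ o_i.T
    return w_ih_new, w_ho_new, e_k

inputs = np.array([[0.9], [0.1], [0.8]])
targets = np.array([[0.99], [0.01], [0.01]])   # hypothetical targets
w_ih = np.array([[0.9, 0.3, 0.4], [0.2, 0.8, 0.2], [0.1, 0.5, 0.6]])
w_ho = np.array([[0.3, 0.7, 0.5], [0.6, 0.5, 0.2], [0.8, 0.1, 0.9]])

squared_errors = []
for _ in range(100):
    w_ih, w_ho, e_k = train_step(w_ih, w_ho, inputs, targets, alpha=0.3)
    squared_errors.append(float(np.sum(e_k ** 2)))

print("first step error:", squared_errors[0], "last step error:", squared_errors[-1])
```

Training on a single example like this only shows that the updates push the outputs toward the targets; a real run cycles through many different examples.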

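VI. Preparing the Data

1. Inputs

A close look at the sigmoid function shows that its output becomes very flat when the input is large. Where the curve is flat the gradient is tiny, which makes weight updates ineffective. Inputs should not be exactly 0 either: the weight update is proportional to the incoming signal, so a zero input leaves its weights unchanged, and computers also lose precision when handling very small or very large numbers. We therefore usually restrict input signals to the range [0.01, 1.00], which means the inputs must be preprocessed before training the network.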
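For 8-bit image data such as MNIST, where raw pixel values run from 0 to 255, the rescaling into [0.01, 1.00] can be sketched as follows. The sample pixel values are made up for illustration.

```python
import numpy as np

# Hypothetical raw pixel values in the range 0..255.
pixels = np.array([0, 51, 128, 255], dtype=float)

# Scale to 0..0.99, then shift by 0.01 so that no input is exactly zero.
scaled = pixels / 255.0 * 0.99 + 0.01

print(scaled)
```

A raw value of 0 maps to 0.01 and a raw value of 255 maps to 1.00.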

2. Outputs

The sigmoid function's output range is (0, 1); it can reach neither 0 nor 1. Before training, we therefore preprocess the label signals so that they fall within [0.01, 0.99].
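For a 10-class problem such as digit recognition, one way to build a target vector in that range is to set every entry to 0.01 and the entry for the correct label to 0.99. The helper name `make_targets` and the label 7 below are my own choices for the sketch.

```python
import numpy as np

def make_targets(label, n_outputs=10):
    """Target vector: 0.01 everywhere except 0.99 at the index of the correct label."""
    targets = np.zeros(n_outputs) + 0.01
    targets[label] = 0.99
    return targets

targets = make_targets(7)
print(targets)
```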

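3. Random Initial Weights

We generally choose the initial weights at random from the range -1.0 to +1.0. They must not all be set to 0, or to any other single constant: with identical weights the backpropagated error is split equally and every weight receives the same update, and all-zero weights additionally pass no signal forward and no error backward, so the network cannot learn properly. A better rule is to sample the weights at random from a range of roughly the reciprocal of the square root of the number of links coming into a node. So if each node has 3 incoming links, the initial weights should lie between $-\frac{1}{\sqrt{3}}$ and $+\frac{1}{\sqrt{3}}$, that is, within ±0.577. In practice, we initialize the weights from a normal distribution with mean 0 and a standard deviation equal to the reciprocal of the square root of the number of incoming links.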
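A minimal sketch of this initialization rule, assuming NumPy's random generator: the helper name `init_weights`, the fixed seed, and the layer sizes 784 and 200 are choices made for the example.

```python
import numpy as np

def init_weights(n_incoming, n_receiving, rng):
    """Sample weights from a normal distribution with mean 0 and
    standard deviation 1 / sqrt(number of incoming links)."""
    std = n_incoming ** -0.5
    return rng.normal(0.0, std, (n_receiving, n_incoming))

rng = np.random.default_rng(seed=0)   # seeded only to make the sketch repeatable

# Each of the 200 receiving nodes has 784 incoming links.
w_input_hidden = init_weights(784, 200, rng)

print("shape:", w_input_hidden.shape)
print("sample std:", w_input_hidden.std(), "target std:", 784 ** -0.5)
print("range for 3 incoming links: +/-", round(3 ** -0.5, 3))
```

The second argument of `rng.normal` is the standard deviation, not the variance, so the spread of the sampled weights matches the reciprocal square root directly.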

VII. Programming and Simulation

I ran the simulation on the MNIST dataset, building the neural network in Python in IPython.

1. Code

# python notebook for Make Your Own Neural Network
# code for a 3-layer neural network, and code for learning the MNIST dataset
import numpy
# scipy.special for the sigmoid function expit()
import scipy.special
# library for plotting arrays
import matplotlib.pyplot
# in an IPython/Jupyter notebook, run the magic `%matplotlib inline` to keep plots inside the notebook, not an external window
# neural network class definition
class neuralNetwork:
   
    # initialise the neural network
    def __init__(self, inputnodes, hiddennodes, outputnodes,learningrate):
        # set number of nodes in each input, hidden, output layer
        self.inodes = inputnodes
        self.hnodes = hiddennodes
        self.onodes = outputnodes
       
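        # link weight matrices, wih and who
        # weights inside the arrays are w_i_j, where link is from node i to node j in the next layer
        # sampled from a normal distribution with mean 0.0 and standard deviation pow(n, -0.5);
        # note: n here is the size of the receiving layer, whereas the rule described in the text
        # uses the number of incoming links (inodes for wih, hnodes for who)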
        self.wih = numpy.random.normal(0.0, pow(self.hnodes, -0.5), (self.hnodes, self.inodes))
        self.who = numpy.random.normal(0.0, pow(self.onodes, -0.5), (self.onodes, self.hnodes))
        # learning rate
        self.lr = learningrate
       
        # activation function is the sigmoid function
        self.activation_function = lambda x: scipy.special.expit(x)
       
        pass
   
    # train the neural network
    def train(self, inputs_list, targets_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
        targets = numpy.array(targets_list, ndmin=2).T
       
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
       
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
       
        # output layer error is the (target - actual)
        output_errors = targets - final_outputs
        # hidden layer error is the output_errors, split by weights, recombined at hidden nodes
        hidden_errors = numpy.dot(self.who.T, output_errors)
       
        # update the weights for the links between the hidden and output layers
        self.who += self.lr * numpy.dot((output_errors * final_outputs * (1.0 - final_outputs)), numpy.transpose(hidden_outputs))
       
        # update the weights for the links between the input and hidden layers
        self.wih += self.lr * numpy.dot((hidden_errors * hidden_outputs * (1.0 - hidden_outputs)), numpy.transpose(inputs))
       
        pass
   
    # query the neural network
    def query(self, inputs_list):
        # convert inputs list to 2d array
        inputs = numpy.array(inputs_list, ndmin=2).T
       
        # calculate signals into hidden layer
        hidden_inputs = numpy.dot(self.wih, inputs)
        # calculate the signals emerging from hidden layer
        hidden_outputs = self.activation_function(hidden_inputs)
       
        # calculate signals into final output layer
        final_inputs = numpy.dot(self.who, hidden_outputs)
        # calculate the signals emerging from final output layer
        final_outputs = self.activation_function(final_inputs)
       
        return final_outputs
# number of input, hidden and output nodes
input_nodes = 784
hidden_nodes = 200
output_nodes = 10
# learning rate is 0.1
learning_rate = 0.1
# create instance of neural network
n = neuralNetwork(input_nodes,hidden_nodes,output_nodes,learning_rate)
# load the mnist training data CSV file into a list
with open("mnist_train.csv", 'r') as training_data_file:
    training_data_list = training_data_file.readlines()
# epochs is the number of times the training data set is used for training
epochs = 5
for e in range(epochs):
    # go through all records in the training data set
    for record in training_data_list:
        # split the record by the ',' commas
        all_values = record.split(',')
        # scale and shift the inputs
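        # (numpy.asfarray was removed in NumPy 2.0; numpy.asarray with dtype=float is equivalent)
        inputs = (numpy.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01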
        # create the target output values (all 0.01, except the desired label which is 0.99)
        targets = numpy.zeros(output_nodes) + 0.01
        # all_values[0] is the target label for this record
        targets[int(all_values[0])] = 0.99
        n.train(inputs, targets)
        pass
    pass

# load the mnist test data CSV file into a list
with open("mnist_test.csv", 'r') as test_data_file:
    test_data_list = test_data_file.readlines()

# test the neural network
# scorecard for how well the network performs, initially empty
scorecard = []
# go through all the records in the test data set
for record in test_data_list:
    # split the record by the ',' commas
    all_values = record.split(',')
    # correct answer is first value
    correct_label = int(all_values[0])
    #print(correct_label, "correct label")
    # scale and shift the inputs
    inputs = (numpy.asarray(all_values[1:], dtype=float) / 255.0 * 0.99) + 0.01
    # query the network
    outputs = n.query(inputs)
    # the index of the highest value corresponds to the label
    label = numpy.argmax(outputs)
    #print(label, "network's answer")
    # append correct or incorrect to list
    if (label == correct_label):
        # network's answer matches correct answer, add 1 to scorecard
        scorecard.append(1)
    else:
        # network's answer doesn't match correct answer, add 0 to scorecard
        scorecard.append(0)
        pass
   
    pass

# calculate the performance score, the fraction of correct answers
scorecard_array = numpy.asarray(scorecard)
print ("performance = ", scorecard_array.sum() / scorecard_array.size)

2. Simulation Results

The network was trained on the 60,000 MNIST training samples and tested on the 10,000 test samples; the recognition accuracy was 97.39%.

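Summary

That is everything I wanted to share this time. This article has covered the basic principles of neural networks, along with training and simulation, in detail. I hope it is helpful.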