Neural Network Notes 1: Three-Layer BP Neural Network

Introduction to the Properties of Neural Networks

A neural network connects several layers of units and behaves like a black box. Imagine a machine into which you feed unpeeled apples and out of which come peeled apples, and into which unpeeled bananas go and peeled bananas come out: that black box is a "peeling" network. We build a similar network in code. It is loosely modeled on the neurons in our brain: the network contains many layers, each layer has many units, and the input of a unit may depend on the outputs of many units in the previous layer. How strongly two units are connected is described by a weight value. All the weights between two adjacent layers form a matrix, called the transfer matrix, which establishes the connection between those two layers of units.
A neural network consists of multiple layers: an input layer, hidden layers, and an output layer. Generally speaking, data is passed through them in sequence.
For example, suppose we want a neural network that takes a picture as input and outputs what is written on it. That network is the recognition network we are looking for. We can convert the picture into a vector and use each element of the vector as the input of a different unit in the input layer. Under the action of the transfer matrix between the input layer and the hidden layer, the picture's data is transformed into the hidden layer. In the same way, the hidden layer passes its data to the units of the output layer through the transfer matrix between the hidden layer and the output layer, so that the final output is what is written on the picture.
The following will elaborate on the data transmission process of the neural network and how to construct the expected neural network.

The number of units in the input layer should match the dimension of the input, and the number of units in the output layer should match the number of expected outputs. In theory, several hidden layers can be used, with no limit on the number of units. This article, however, only discusses the three-layer network: a three-layer BP neural network can approximate any continuous function, is simple to implement, and is reasonably reliable, so a single hidden layer is chosen.
Let the number of hidden-layer units be $p$, the number of input-layer units be $m$, and the number of output-layer units be $n$. More hidden units is not always better; to obtain good accuracy when using the network to approximate a function, there is an empirical formula:
$$p = 5 + \sqrt{m+n} \pm 5$$
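
For instance, with the handwritten-digit example used later in this post ($m=784$ inputs, $n=10$ outputs) and taking the middle of the $\pm 5$ range, a quick check in Python gives:

from math import sqrt

m, n = 28 * 28, 10            # 784 input units, 10 output units
p = int(sqrt(m + n) + 5)      # middle of the empirical range
print(p)                      # 33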

In order to facilitate the unified expression, we use 1, 2, and 3 to represent the input layer, hidden layer, and output layer respectively.
Let the input of layer $k$ be the vector $\mathbf{r}^{k}$, whose $i$-th entry is $r_{i}^{k}$.
The output of layer $k$ is the vector $\mathbf{a}^{k}$, whose $i$-th entry is $a_{i}^{k}$.
The transfer weight from the $i$-th output unit of layer $k$ to the $j$-th input unit of layer $k+1$ is $w_{i,j}^{k}$; these weights form the matrix $\mathbf{W}^{k}=[\,w_{i,j}^{k}\,]^{T}$. The index $i$ starts from $0$, corresponding to a constant input term, so the first column of the matrix is the constant bias column.
Note that $\mathbf{W}^{k}$ is arranged as the transpose with respect to the subscripts.

Forward Transmission of Information

The transmission process can be written as:
$$r_{j}^{k+1} = \sum_{i=0} w_{i,j}^{k}\, a_{i}^{k}$$

This formula means that each unit of a layer depends on all units of the previous layer, with the corresponding weights $w_{i,j}^{k}$.
In matrix form:
$$\mathbf{r}^{k+1}=\mathbf{W}^{k}\begin{bmatrix} 1\\ \vdots\\ \mathbf{a}^{k}\end{bmatrix}$$

where $a_{0}^{k}=1$; this term indicates that the transfer matrix has a column acting as a constant bias input to each unit.
Between the input and output of each layer there is a transfer relationship, given by the activation function
$$g(x)=\frac{1}{1+e^{-x}}$$

This is the logistic (sigmoid) function, whose derivative satisfies:
$$g'(x)=g(x)\bigl(1-g(x)\bigr)$$
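
For completeness, this derivative property follows directly from differentiating $g$:

$$g'(x)=\frac{\mathrm{d}}{\mathrm{d}x}\,(1+e^{-x})^{-1}=\frac{e^{-x}}{(1+e^{-x})^{2}}=\frac{1}{1+e^{-x}}\cdot\frac{e^{-x}}{1+e^{-x}}=g(x)\bigl(1-g(x)\bigr).$$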

With this function, each unit squashes its input into the interval $(0,1)$, so the outputs of each layer can be read as probability-like numbers. Hence
$$\mathbf{a}^{k}=g(\mathbf{r}^{k}),$$
where $g$ is applied element-wise.

Let the input vector be $\mathbf{x}=[x_{1},x_{2},\dots,x_{m}]^{T}$ and the desired output vector be $\mathbf{y}=[y_{1},y_{2},\dots,y_{n}]^{T}$.
The output vector has the same dimension as the output of the last layer, and the input vector is fed directly into the first layer:
$$\mathbf{r}^{1}=\mathbf{x}$$

Our goal is for the network output $\mathbf{a}^{3}$ to approximate the desired output $\mathbf{y}$.
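
As a concrete illustration, here is a minimal NumPy sketch of this forward pass. The layer sizes, the random initialization, and all variable names here are illustrative assumptions, not part of the original code:

import numpy as np

def g(x):
    # logistic activation, applied element-wise
    return 1.0 / (1.0 + np.exp(-x))

m, p, n = 4, 6, 3                                    # illustrative layer sizes
W1 = np.random.normal(0, 1/np.sqrt(m), (p, m + 1))   # layer 1 -> 2, first column is the bias column
W2 = np.random.normal(0, 1/np.sqrt(p), (n, p + 1))   # layer 2 -> 3, first column is the bias column

x = np.random.rand(m, 1)                             # input vector, r^1 = x
a1 = g(x)                                            # a^1 = g(r^1)
a2 = g(W1 @ np.vstack(([[1.0]], a1)))                # a^2 = g(W^1 [1; a^1])
a3 = g(W2 @ np.vstack(([[1.0]], a2)))                # a^3 = g(W^2 [1; a^2])
print(a3.shape)                                      # (n, 1)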

Obtaining the expected neural network

Backward Error Update (Output Layer → Hidden Layer)

Define the loss function
$$L=\left\| \mathbf{y}-\mathbf{a}^{3}\right\|^{2}=\sum_{i}(y_{i}-a_{i}^{3})^{2}$$
This function measures how far the network's output for a given input deviates from the desired output. We want this loss to be as small as possible, so in theory we want to approach its minimum (namely 0). The loss depends on many parameters; the ones we need to update are the weights $w_{i,j}^{k}$.
This calls for gradient descent to update these weights repeatedly.
For gradient descent, see my earlier article on simulating linear regression with gradient descent: https://blog.csdn.net/m0_53253879/article/details/123811100?spm=1001.2014.3001.5501

Let the iteration index be $l$. By the gradient-descent update rule,
$$w_{i,j}^{k} \leftarrow w_{i,j}^{k} - \alpha \frac{\partial L}{\partial w_{i,j}^{k}}$$

First update the weights $w_{i,j}^{2}$ from layer 2 to layer 3:
$$\frac{\partial L}{\partial w_{\alpha,\beta}^{2}}
= \frac{\partial \sum_{i}(y_{i}-a_{i}^{3})^{2}}{\partial w_{\alpha,\beta}^{2}}
= \frac{\partial \sum_{i}\bigl(y_{i}-g(r_{i}^{3})\bigr)^{2}}{\partial w_{\alpha,\beta}^{2}}$$
$$= \frac{\partial \sum_{i}\bigl(y_{i}-g(\sum_{j=0} w_{j,i}^{2}\, a_{j}^{2})\bigr)^{2}}{\partial w_{\alpha,\beta}^{2}}
= \frac{\partial \bigl(y_{\beta}-g(\sum_{j=0} w_{j,\beta}^{2}\, a_{j}^{2})\bigr)^{2}}{\partial w_{\alpha,\beta}^{2}}$$
$$= -2\bigl(y_{\beta}-g(\textstyle\sum_{j=0} w_{j,\beta}^{2}\, a_{j}^{2})\bigr)\, g'(\textstyle\sum_{j=0} w_{j,\beta}^{2}\, a_{j}^{2})\, a_{\alpha}^{2}$$
$$= -2\bigl(y_{\beta}-g(\textstyle\sum_{j=0} w_{j,\beta}^{2}\, a_{j}^{2})\bigr)\, g(\textstyle\sum_{j=0} w_{j,\beta}^{2}\, a_{j}^{2})\,\bigl(1-g(\textstyle\sum_{j=0} w_{j,\beta}^{2}\, a_{j}^{2})\bigr)\, a_{\alpha}^{2}$$
Only the $i=\beta$ term of the sum depends on $w_{\alpha,\beta}^{2}$, which is why the sum collapses to a single term.

Therefore:
$$-\frac{1}{2}\frac{\partial L}{\partial w_{i,j}^{2}} = (y_{j}-a_{j}^{3})\, a_{j}^{3}(1-a_{j}^{3})\, a_{i}^{2}$$

Define:
$$\Delta_{i}^{2}=(y_{i}-a_{i}^{3})\,a_{i}^{3}(1-a_{i}^{3}), \qquad \mathbf{\Delta}^{k}=[\Delta_{1}^{k},\Delta_{2}^{k},\dots,\Delta_{n}^{k}]^{T},$$
$$\mathbf{A}^{3}=\mathrm{diag}(a_{1}^{3},a_{2}^{3},\dots,a_{n}^{3}), \qquad \tilde{\mathbf{a}}^{2}=\begin{bmatrix}1\\ \vdots\\ \mathbf{a}^{2}\end{bmatrix}$$
Hence:
$$-\frac{1}{2}\frac{\partial L}{\partial w_{i,j}^{2}} = \Delta_{j}^{2}\, a_{i}^{2}$$

$$\mathbf{\Delta}^{2}=\mathbf{A}^{3}(\mathbf{I}-\mathbf{A}^{3})(\mathbf{y}-\mathbf{a}^{3}), \qquad w_{i,j}^{2} \leftarrow w_{i,j}^{2} + \alpha\, \Delta_{j}^{2}\, a_{i}^{2}$$

Written in matrix form:
$$\mathbf{W}^{2} \leftarrow \mathbf{W}^{2} + \alpha\, \mathbf{\Delta}^{2}(\tilde{\mathbf{a}}^{2})^{T}$$

It should be noted that the constant factor in $-\frac{1}{2}\frac{\partial L}{\partial w_{i,j}^{2}}$ has been absorbed into the learning rate $\alpha$, since the learning rate is a freely chosen parameter; the two minus signs cancel, which is why the update above adds $\alpha\,\Delta_{j}^{2}a_{i}^{2}$ (and why the code below uses + alpha * Del_W2).
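
A minimal NumPy sketch of this output-layer step, continuing the forward-pass example above (names are illustrative; the update itself is applied a little later, once $\mathbf{\Delta}^{1}$ has also been computed, which is also how the code below orders things):

y = np.zeros((n, 1)); y[0, 0] = 1.0           # illustrative target vector
alpha = 0.2                                   # illustrative learning rate
A3 = np.diagflat(a3)                          # A^3 = diag(a^3)
Delta2 = A3 @ (np.eye(n) - A3) @ (y - a3)     # Delta^2 = A^3 (I - A^3)(y - a^3)
a2_aug = np.vstack(([[1.0]], a2))             # augmented vector [1; a^2]
dW2 = Delta2 @ a2_aug.T                       # Delta^2 (a~^2)^T, the W^2 update direction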

Backward Error Update (Hidden Layer → Input Layer)

Proceeding in the same way for the layer-1 → layer-2 weights,
$$-\frac{1}{2}\frac{\partial L}{\partial w_{\alpha,\beta}^{1}} = \sum_{j}(y_{j}-a_{j}^{3})\,a_{j}^{3}(1-a_{j}^{3})\sum_{i} w_{i,j}^{2}\,\frac{\partial a_{i}^{2}}{\partial w_{\alpha,\beta}^{1}}$$

The full derivation is rather involved; applying the chain rule, we state only the result:
$$-\frac{1}{2}\frac{\partial L}{\partial w_{i,j}^{1}} = a_{j}^{2}(1-a_{j}^{2})\, a_{i}^{1}\sum_{k} w_{j,k}^{2}\,\Delta_{k}^{2}$$
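
For readers who want the intermediate steps, a sketch of that chain-rule calculation (using the definitions above) is:

$$-\frac{1}{2}\frac{\partial L}{\partial w_{i,j}^{1}} = \sum_{k}(y_{k}-a_{k}^{3})\,a_{k}^{3}(1-a_{k}^{3})\,\frac{\partial r_{k}^{3}}{\partial w_{i,j}^{1}} = \sum_{k}\Delta_{k}^{2}\, w_{j,k}^{2}\,\frac{\partial a_{j}^{2}}{\partial w_{i,j}^{1}} = \sum_{k}\Delta_{k}^{2}\, w_{j,k}^{2}\; a_{j}^{2}(1-a_{j}^{2})\,a_{i}^{1},$$

since among the hidden outputs only $a_{j}^{2}$ depends on $w_{i,j}^{1}$, with $\frac{\partial a_{j}^{2}}{\partial w_{i,j}^{1}} = a_{j}^{2}(1-a_{j}^{2})\,a_{i}^{1}$.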


Define:
$$\Delta_{j}^{1} = a_{j}^{2}(1-a_{j}^{2})\sum_{k} w_{j,k}^{2}\,\Delta_{k}^{2}, \qquad \mathbf{A}^{2}=\mathrm{diag}(a_{1}^{2},a_{2}^{2},\dots,a_{p}^{2}), \qquad \tilde{\mathbf{a}}^{1}=\begin{bmatrix}1\\ \vdots\\ \mathbf{a}^{1}\end{bmatrix}$$

In the formula above, the constant bias column of $\mathbf{W}^{2}$ does not appear in the derivative. We therefore define the matrix $\tilde{\mathbf{W}}^{2}$ as $\mathbf{W}^{2}$ with the bias column (the first column) deleted and then transposed.

Hence:
$$\mathbf{\Delta}^{1}=\mathbf{A}^{2}(\mathbf{I}-\mathbf{A}^{2})\,\tilde{\mathbf{W}}^{2}\,\mathbf{\Delta}^{2}$$

$$\mathbf{W}^{1} \leftarrow \mathbf{W}^{1} + \alpha\,\mathbf{\Delta}^{1}(\tilde{\mathbf{a}}^{1})^{T}$$
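
Continuing the same sketch, the hidden-layer quantities and both weight updates could then be computed as follows (again only an illustration of the formulas above):

W2_nobias = W2[:, 1:]                                    # W^2 without its bias column
A2 = np.diagflat(a2)                                     # A^2 = diag(a^2)
Delta1 = A2 @ (np.eye(p) - A2) @ W2_nobias.T @ Delta2    # Delta^1 = A^2 (I - A^2) W~^2 Delta^2
a1_aug = np.vstack(([[1.0]], a1))                        # augmented vector [1; a^1]
W2 = W2 + alpha * dW2                                    # output-layer update
W1 = W1 + alpha * (Delta1 @ a1_aug.T)                    # hidden-layer update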

After repeated updates, the weights $w_{i,j}^{k}$ gradually yield the network we expect, so that for each input we obtain the output we expect.
It should be emphasized, however, that the network is not trained by feeding it a single sample over and over until the loss converges. Instead, a large number of samples pass through the network one after another, and each sample input trains the network once. After the network has been trained on many samples, it can make a rough judgment about the output for any given sample. All samples can also be passed through the network several times, but the loss function will not converge to 0; it fluctuates within a certain range.

Pseudocode Implementation

Training function

Read $\mathbf{x}=[x_{1},x_{2},\dots,x_{m}]^{T}$ from file
Read $\mathbf{y}=[y_{1},y_{2},\dots,y_{n}]^{T}$ from file
Number of hidden-layer units $p=5+\sqrt{m+n}$
First-layer input $\mathbf{r}^{1}=\mathbf{x}\in R^{m\times 1}$
First-layer output $\mathbf{a}^{1}\in R^{m\times 1}$
Second-layer input $\mathbf{r}^{2}\in R^{p\times 1}$
Second-layer output $\mathbf{a}^{2}\in R^{p\times 1}$
Third-layer input $\mathbf{r}^{3}\in R^{n\times 1}$
Third-layer output $\mathbf{a}^{3}\in R^{n\times 1}$
Transfer matrix from layer 1 to layer 2: $\mathbf{W}^{1}\in R^{p\times (m+1)}$
Transfer matrix from layer 2 to layer 3: $\mathbf{W}^{2}\in R^{n\times (p+1)}$
Augmented vector $\tilde{\mathbf{a}}^{1}=[1;\ \mathbf{a}^{1}]\in R^{(m+1)\times 1}$
Augmented vector $\tilde{\mathbf{a}}^{2}=[1;\ \mathbf{a}^{2}]\in R^{(p+1)\times 1}$


Define the function $g(x)=\frac{1}{1+e^{-x}}$
Define the learning rate $\alpha$
Define the iteration counter $k=0$
Initialize the loss function $L^{(0)}=\left\|\mathbf{y}-\mathbf{a}^{3}\right\|^{2}$
Assign initial values to the transfer matrix from layer 1 to layer 2, $\mathbf{W}^{1}$
Assign initial values to the transfer matrix from layer 2 to layer 3, $\mathbf{W}^{2}$


(Forward initialization)
$\mathbf{a}^{1}=g(\mathbf{r}^{1})$
$(\mathbf{r}^{2})^{(0)}=(\mathbf{W}^{1})^{(0)}\tilde{\mathbf{a}}^{1}$
$(\mathbf{a}^{2})^{(0)}=g((\mathbf{r}^{2})^{(0)})$
$(\tilde{\mathbf{a}}^{2})^{(0)}=[1;\ (\mathbf{a}^{2})^{(0)}]$
$(\mathbf{r}^{3})^{(0)}=(\mathbf{W}^{2})^{(0)}(\tilde{\mathbf{a}}^{2})^{(0)}$
$(\mathbf{a}^{3})^{(0)}=g((\mathbf{r}^{3})^{(0)})$
$(\tilde{\mathbf{a}}^{3})^{(0)}=[1;\ (\mathbf{a}^{3})^{(0)}]$
$(\mathbf{A}^{2})^{(0)}=\mathrm{diag}((\mathbf{a}^{2})^{(0)})$
$(\mathbf{A}^{3})^{(0)}=\mathrm{diag}((\mathbf{a}^{3})^{(0)})$
$(\mathbf{\Delta}^{2})^{(0)}=(\mathbf{A}^{3})^{(0)}(\mathbf{I}-(\mathbf{A}^{3})^{(0)})(\mathbf{y}-(\mathbf{a}^{3})^{(0)})$
$(\mathbf{\Delta}^{1})^{(0)}=(\mathbf{A}^{2})^{(0)}(\mathbf{I}-(\mathbf{A}^{2})^{(0)})\tilde{\mathbf{W}}^{2}(\mathbf{\Delta}^{2})^{(0)}$
Initial value of the loss function $L^{(0)}=\left\|\mathbf{y}-(\mathbf{a}^{3})^{(0)}\right\|^{2}$



Loop: for each sample, perform the following operations
(Backward error update)
$\mathbf{W}^{2} \leftarrow \mathbf{W}^{2} + \alpha\,\mathbf{\Delta}^{2}(\tilde{\mathbf{a}}^{2})^{T}$
$\mathbf{W}^{1} \leftarrow \mathbf{W}^{1} + \alpha\,\mathbf{\Delta}^{1}(\tilde{\mathbf{a}}^{1})^{T}$
(Forward pass)
$\mathbf{r}^{2}=\mathbf{W}^{1}\tilde{\mathbf{a}}^{1}$
$\mathbf{a}^{2}=g(\mathbf{r}^{2})$
$\tilde{\mathbf{a}}^{2}=[1;\ \mathbf{a}^{2}]$
$\mathbf{r}^{3}=\mathbf{W}^{2}\tilde{\mathbf{a}}^{2}$
$\mathbf{a}^{3}=g(\mathbf{r}^{3})$
$\tilde{\mathbf{a}}^{3}=[1;\ \mathbf{a}^{3}]$
$\mathbf{A}^{2}=\mathrm{diag}(\mathbf{a}^{2})$
$\mathbf{A}^{3}=\mathrm{diag}(\mathbf{a}^{3})$
$\mathbf{\Delta}^{2}=\mathbf{A}^{3}(\mathbf{I}-\mathbf{A}^{3})(\mathbf{y}-\mathbf{a}^{3})$
$\mathbf{\Delta}^{1}=\mathbf{A}^{2}(\mathbf{I}-\mathbf{A}^{2})\tilde{\mathbf{W}}^{2}\mathbf{\Delta}^{2}$
$L=\left\|\mathbf{y}-\mathbf{a}^{3}\right\|^{2}$


Test function: prediction with the trained neural network

$\mathbf{a}^{1}=g(\mathbf{x})$
$\tilde{\mathbf{a}}^{1}=[1;\ \mathbf{a}^{1}]$
$\mathbf{a}^{2}=g(\mathbf{W}^{1}\tilde{\mathbf{a}}^{1})$
$\tilde{\mathbf{a}}^{2}=[1;\ \mathbf{a}^{2}]$
$\mathbf{y}=g(\mathbf{W}^{2}\tilde{\mathbf{a}}^{2})$
(Equivalently: $\mathbf{y}=g\bigl(\mathbf{W}^{2}\,[\,1;\ g(\mathbf{W}^{1}[\,1;\ g(\mathbf{x})\,])\,]\bigr)$)
Output $\mathbf{y}$

Example of handwritten digit recognition

The download link for the training set of handwritten digits is as follows:
Handwritten digit recognition training set: Baidu network disk sharing https://pan.baidu.com/s/1A61uqPB_TTfyJ8sT4OqKIw?pwd=8f7a
Each picture is a $28\times 28$-pixel image.

To use such a picture as input, we need to load it into Python and convert it into a normalized grayscale matrix.
Create a new file NeuralClass.py and write the functions we need into it.
Put the dataset into the project folder; the function that reads the data is as follows:

from PIL import Image
import numpy as np

# Take an image path and produce a normalized grayscale matrix (black = 1)
def get_gray_mt(pic_root):
    img = Image.open(pic_root)                  # read the image
    img = img.convert('L')                      # convert to grayscale
    cols, rows = img.size                       # image size
    img_array = np.array(img)                   # pixel values as an array (rows x cols)

    Value = [[0] * cols for i in range(rows)]   # 2-D list with the same shape as the image

    for x in range(0, rows):
        for y in range(0, cols):
            v = img_array[x, y]                 # pixel value at this point
            Value[x][y] = 255 - v               # invert the color and store it
    Value = np.array(Value) / 255
    return Value
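
For example, assuming the dataset has been unpacked into the 手写 folder used later in main.py, the function could be called like this:

mat = get_gray_mt('手写\\0\\0_1.bmp')   # a 28 x 28 normalized grayscale matrix
print(mat.shape)                        # (28, 28)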

This function converts an image (given by its path) into a matrix. However, the input of the neural network must be a column vector, so the following function is needed to convert the matrix into a vector. It reads row by row, connecting the rows of the input matrix end to end to form a single vector.

from math import *
import numpy as np

# Convert a matrix into a (flattened) vector, reading row by row
def vector_c(A):
    rows, cols = A.shape
    vec = A.reshape(1, rows * cols)
    return np.array(vec[0])

The matrix utilities provided by Python's numpy library do not cover every specific need here, so we write a few small helper functions by hand. To carry out the multiplication with $\mathbf{W}^{1}$ and $\mathbf{W}^{2}$, which contain bias terms, the output vector of each layer must first be turned into an augmented vector by prepending a 1 at its head:

# Build the augmented vector by prepending a 1 at the head
def enlarge_vec(x):
    a = np.ones((1, len(x) + 1))
    for i in range(len(x)):
        a[0][i+1] = x[i]
    return a
# Convert a vector into a diagonal matrix
def vec_diag(x):
    A = np.eye(len(x))
    for i in range(len(x)):
        A[i][i] = x[i]
    return A
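
As an aside, roughly the same results could be obtained with NumPy built-ins (shown only as an alternative, assuming x is a column vector):

a = np.insert(x.reshape(-1), 0, 1.0).reshape(1, -1)   # augmented row vector [1, x1, ..., xk]
A = np.diagflat(x)                                    # diagonal matrix with the entries of x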

The matrix $\tilde{\mathbf{W}}^{2}$ is $\mathbf{W}^{2}$ with the bias column (the first column) deleted and then transposed, so we write the following function to delete one column of a matrix.

# Delete one column of a matrix (inputs: matrix, index of the column to delete)
def dlt_c(A, c0):
    r, c = A.shape
    B = np.zeros((r, c - 1))
    for i in range(r):
        for j in range(c):
            if j < c0:
                B[i][j] = A[i][j]
            elif j > c0:
                B[i][j - 1] = A[i][j]
    return B
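
As another aside, NumPy's own np.delete could do the same job in one line (shown only as an alternative):

B = np.delete(A, c0, axis=1)   # copy of A with column c0 removed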

The most important helper is the logistic activation function linking the input and output of each layer; written with NumPy, it can be applied element-wise to an entire vector or matrix at once.

# Activation function (logistic), applied element-wise
def sigmoid_vec(x):
    return 1/(1+np.exp(-x))

Example: recognizing a single digit

With the above foundation, we can define the following neural network class:

# Neural network class
class Neural_NetWork3:

(The following methods are written in order inside the class.)
The initialization of the network covers the training set, the numbers of units in the input, hidden, and output layers, the initialization of the transfer matrices, and a list that records the loss values. The transfer matrices are then updated whenever the network is trained on a sample.

    # Initialization
    def __init__(self, input_x, Input_lyout):

        self.x = input_x                                        # all of the input data
        self.ly_in = int(len(input_x[0][0]))                    # number of input-layer units (784)
        self.ly_out = Input_lyout                               # number of output-layer units
        self.ly_hid = int(sqrt(self.ly_in + self.ly_out) + 5)   # number of hidden-layer units
        self.samplenum = int(len(input_x[0]))                   # number of samples taken per digit
        self.Loss = []                                          # list of loss values (filled in during training)

        # Transfer matrix from the input layer to the hidden layer; initial values, bias column first
        self.W1 = np.random.normal(0.0, 1 / sqrt(self.ly_in), (self.ly_hid, self.ly_in + 1))
        # Transfer matrix from the hidden layer to the output layer; initial values, bias column first
        self.W2 = np.random.normal(0.0, 1 / sqrt(self.ly_hid), (self.ly_out, self.ly_hid + 1))
        pass

To keep the iteration stable, the initial values of $\mathbf{W}^{1}$ and $\mathbf{W}^{2}$ are drawn from normal distributions with mean $0$ and standard deviations $\frac{1}{\sqrt{m}}$ and $\frac{1}{\sqrt{p}}$ respectively. The initial values should not be too large, nor all zero, otherwise the training may diverge.
Here $\mathbf{x}$ is a third-order tensor whose first two indexes are, in order, [digit represented by the sample][index of the sample].
When training the network that judges a single digit, the output vector is a scalar. When a sample of the target digit enters the network, the target output is 1; when any other sample passes through, the target output is 0. After training on a large number of samples, the network outputs a value close to 1 for samples of the target digit and close to 0 for other samples, so the scalar output can be read as the probability that the input shows the trained digit.

  • Judging a single digit:
  • If the output is greater than 0.5, the prediction is that the picture shows the digit.
  • If the output is less than 0.5, the prediction is that the picture does not show the digit.

The samples are passed through the neural network one after another. Referring to the pseudocode above, the training function can be written as follows:
    # Training function of the network: trains it to recognize one particular digit
    def Neural_Train_num(self, pre_num, alpha=0.2):

        for j in range(self.samplenum):
            for i in range(10):
                y = np.zeros((self.ly_out, 1))
                y[0][0] = (i == pre_num)

                W2_conv = dlt_c(self.W2, 0)         # W2 with its first (bias) column deleted
                a1 = self.x[i][j]
                a1_en = enlarge_vec(a1)             # augmented vector of a1, with a 1 prepended
                r2 = np.dot(self.W1, a1_en.T)       # hidden-layer input vector
                a2 = sigmoid_vec(r2)                # hidden-layer output vector
                a2_en = enlarge_vec(a2)             # augmented vector of a2, with a 1 prepended
                r3 = np.dot(self.W2, a2_en.T)       # output-layer input vector
                a3 = sigmoid_vec(r3)                # output-layer output vector
                A2 = vec_diag(a2)                   # diagonal matrix built from a2
                A3 = vec_diag(a3)                   # diagonal matrix built from a3
                self.Loss.append(abs(np.linalg.norm(y - a3)))   # record the loss (the norm ||y - a3||)

                # Backward pass
                Delta2 = np.dot(A3 - np.dot(A3, A3), y - a3)
                Delta1 = np.dot(A2 - np.dot(A2, A2), W2_conv.T)
                Delta1 = np.dot(Delta1, Delta2)

                Del_W2 = np.dot(Delta2, a2_en)
                Del_W1 = np.dot(Delta1, a1_en)
                self.W2 = self.W2 + alpha * Del_W2
                self.W1 = self.W1 + alpha * Del_W1

When evaluating the accuracy of the network, the logic is as follows. For a given image input:

  • If the output is greater than 0.5 and the digit on the picture is exactly the target digit, the prediction counts as correct.
  • If the output is less than 0.5 and the digit on the picture is not the target digit, the prediction also counts as correct.
  • Otherwise, the prediction counts as a failure.

Passing all samples through the network gives the prediction accuracy, an important indicator of how good the network is.
Write the following function:
    # Accuracy of predicting one particular digit; the input is the digit shown on the pictures we want to detect
    def Neural_Rate_num(self, Predict_num):
        ra = 0      # number of correctly predicted samples
        rn = 0      # total number of samples
        for i in range(self.samplenum):
            for j in range(10):
                rn += 1

                a1 = self.x[j][i]
                a1_en = enlarge_vec(a1)             # augmented vector of a1, with a 1 prepended
                r2 = np.dot(self.W1, a1_en.T)       # hidden-layer input vector
                a2 = sigmoid_vec(r2)                # hidden-layer output vector
                a2_en = enlarge_vec(a2)             # augmented vector of a2, with a 1 prepended
                r3 = np.dot(self.W2, a2_en.T)       # output-layer input vector
                a3 = sigmoid_vec(r3)                # output-layer output vector

                if j == Predict_num and np.linalg.norm(a3) > 0.5:
                    ra += 1
                if j != Predict_num and np.linalg.norm(a3) < 0.5:
                    ra += 1
        return ra / rn

The following function is used to judge the prediction result of a single picture

    # Test the prediction for a single image; the output is True ("yes") or False ("no")
    def Neural_Predict_num(self, x_Pre):

        a1 = x_Pre
        a1_en = enlarge_vec(a1)
        r2 = np.dot(self.W1, a1_en.T)
        a2 = sigmoid_vec(r2)
        a2_en = enlarge_vec(a2)
        r3 = np.dot(self.W2, a2_en.T)
        a3 = sigmoid_vec(r3)
        print(a3)

        if a3[0] > 0.5:
            return True
        else:
            return False

Example: recognizing all digits

When recognizing all digits, we only need to change the dimension of the output vector to 10, with each row corresponding to the expected output for one digit. When a training sample of the target digit enters the network, the corresponding row of the target output is set to 1: for example, when a picture representing 0 is input, the first row of the target output is 1 and the rest are 0; when a picture representing 1 is input, the second row is 1 and the rest are 0, and so on.
Write the training function that predicts all numbers as follows:

    # Training function of the network: trains it on all digits
    def Neural_Train_0_9(self, alpha=0.2):

        for j in range(self.samplenum):
            for i in range(10):
                y = np.zeros((self.ly_out, 1))
                y[i][0] = 1
                W2_conv = dlt_c(self.W2, 0)         # W2 with its first (bias) column deleted
                a1 = self.x[i][j]
                a1_en = enlarge_vec(a1)             # augmented vector of a1, with a 1 prepended
                r2 = np.dot(self.W1, a1_en.T)       # hidden-layer input vector
                a2 = sigmoid_vec(r2)                # hidden-layer output vector
                a2_en = enlarge_vec(a2)             # augmented vector of a2, with a 1 prepended
                r3 = np.dot(self.W2, a2_en.T)       # output-layer input vector
                a3 = sigmoid_vec(r3)                # output-layer output vector
                A2 = vec_diag(a2)                   # diagonal matrix built from a2
                A3 = vec_diag(a3)                   # diagonal matrix built from a3

                self.Loss.append(abs(np.linalg.norm(y - a3)))   # record the loss (the norm ||y - a3||)

                # Backward pass
                Delta2 = np.dot(A3 - np.dot(A3, A3), y - a3)
                Delta1 = np.dot(A2 - np.dot(A2, A2), W2_conv.T)
                Delta1 = np.dot(Delta1, Delta2)

                Del_W2 = np.dot(Delta2, a2_en)
                Del_W1 = np.dot(Delta1, a1_en)
                self.W2 = self.W2 + alpha * Del_W2
                self.W1 = self.W1 + alpha * Del_W1

When predicting a picture, the row index of the largest value in the output vector is taken as the prediction. With the setup above, the digits 0~9 correspond exactly to rows 0 through 9 in Python's indexing. Referring to the previous functions, the prediction functions for all digits can be written as follows:

    # Accuracy of predicting all digits
    def Neural_Rate_0_9(self):

        ra = 0      # number of correctly predicted samples
        rn = 0      # total number of samples
        for i in range(self.samplenum):
            for j in range(10):
                rn += 1

                a1 = self.x[j][i]
                a1_en = enlarge_vec(a1)             # augmented vector of a1, with a 1 prepended
                r2 = np.dot(self.W1, a1_en.T)       # hidden-layer input vector
                a2 = sigmoid_vec(r2)                # hidden-layer output vector
                a2_en = enlarge_vec(a2)             # augmented vector of a2, with a 1 prepended
                r3 = np.dot(self.W2, a2_en.T)       # output-layer input vector
                a3 = sigmoid_vec(r3)                # output-layer output vector

                if np.argmax(a3) == j:
                    ra += 1
        return ra / rn

    # Test the prediction for a single image; the output is one of the digits 0~9
    def Neural_Predict_0_9(self, x_Pre):

        a1 = x_Pre
        a1_en = enlarge_vec(a1)
        r2 = np.dot(self.W1, a1_en.T)
        a2 = sigmoid_vec(r2)
        a2_en = enlarge_vec(a2)
        r3 = np.dot(self.W2, a2_en.T)
        a3 = sigmoid_vec(r3)

        return np.argmax(a3)

Build the main program framework

Create a file main.py and write the main program framework into it:

  • Import the library functions
  • Adjust matplotlib so that it can display Chinese labels
  • Import the data

To import the data, simply extract the archive downloaded above directly into the project path (into the "手写" folder; readers can also modify the path themselves in the code below):
from tqdm import tqdm
import matplotlib.pyplot as plt
import random
from NeuralClass import *

# matplotlib has trouble displaying Chinese characters; these two lines set a default font that can
plt.rcParams['font.sans-serif'] = ['SimHei']
plt.rcParams['axes.unicode_minus'] = False

samplenum = int(input('请输入每个数字图片所取的样本数(0~500):'))          # number of samples taken for each digit


# x[digit represented by the sample][index of the sample]
x = [list() for i in range(10)]
print('数据读取中...')
for i in tqdm(range(samplenum)):
    for j in range(10):
        x[j].append(vector_c(get_gray_mt('手写\\'+str(j)+'\\'+str(j)+'_' + str(i+1) + '.bmp')))
print('读取完成!')

The tqdm library adds a progress bar to the data import; reading in all 5000 images takes 1 to 2 minutes.
The resulting third-order tensor $\mathbf{x}$ is the training set; its first two indexes are, in order, [digit represented by the sample][index of the sample].
Note that a single pass of all samples through the network may not reach a high accuracy rate. It is therefore usually necessary to keep training: adjust the learning rate, pass all the samples through the network again, and so on.
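
For instance, a simple multi-pass loop over the whole training set might look like this (a sketch using the class defined above; the number of passes and the learning rates are arbitrary choices):

net = Neural_NetWork3(x, 10)                     # network that recognizes all digits
for i, lr in enumerate([0.5, 0.2, 0.1]):         # decreasing learning rate on each pass
    net.Neural_Train_0_9(alpha=lr)               # one pass over every sample
    print(i, net.Neural_Rate_0_9())              # accuracy after this pass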

Write the neural network into the main program to build the following framework:

(Two branches: train the network for all digits / train the network for a single digit)
Start
Read the data
Choose which type of network to train
Enter the learning rate
Train the network
Continue training?
Show the loss-function curve
Predict a picture - display the picture - loop
Re-select the type of network to train?
End

Write out the code as follows:

while 1:

    bool_form = int(input('选择训练全部数字的网络输入”1“,选择训练某个数字的网络输入”0”:'))
    if bool_form == 1:
        # Build the initial network
        Net1 = Neural_NetWork3(x, 10)

        while 1:
            alpha = float(input('请输入学习率(建议输入0~1):'))
            print('开始训练神经网络...')
            Net1.Neural_Train_0_9(alpha)
            rate1 = Net1.Neural_Rate_0_9()
            print('判断0~9数字正确率为:' + str(round(rate1, 3) * 100) + '%')

            bool1 = float(input('是否继续训练?继续请输入1,退出请输入其他数字:'))
            if bool1 != 1:
                break

        plt.figure('损失函数变化图象')
        plt.plot(Net1.Loss)
        plt.xlabel('迭代次数')
        plt.ylabel('损失函数$L =||\\mathbf{y}-\\mathbf{a}_{output}||^{2}$')
        plt.show()

        while 1:
            pre_num1 = int(input('请输入需要预测的图片(用图片上的数字表示):'))

            # Randomly pick one image of the chosen digit
            pre_spnum = random.randint(0, samplenum - 1)
            pre_result = Net1.Neural_Predict_0_9(x[pre_num1][pre_spnum])
            print('预测结果为:' + str(pre_result))
            if pre_result == pre_num1:
                print('预测成功!')
            else:
                print('预测失败!')
            img = Image.open('手写\\' + str(pre_num1) + '\\' + str(pre_num1) + '_' + str(pre_spnum + 1) + '.bmp')
            img.show()

            bool2 = float(input('是否继续预测?继续请输入1,退出请输入其他数字:'))

            if bool2 != 1:
                break

    elif bool_form == 0:
        # Build the initial network
        Net2 = Neural_NetWork3(x, 1)
        pre_num = int(input('请输入需要预测的数字(0~9):'))

        while 1:
            alpha = float(input('请输入学习率(建议输入0~1):'))
            print('开始训练神经网络...')
            Net2.Neural_Train_num(pre_num, alpha)
            rate1 = Net2.Neural_Rate_num(pre_num)
            print('判断数字' + str(pre_num) + '数字正确率为:' + str(round(rate1, 3) * 100) + '%')

            bool1 = float(input('是否继续训练?继续请输入1,退出请输入其他数字:'))
            if bool1 != 1:
                break
        plt.figure('损失函数变化图象')
        plt.plot(Net2.Loss)
        plt.xlabel('迭代次数')
        plt.ylabel('损失函数$L =||\\mathbf{y}-\\mathbf{a}_{output}||^{2}$')
        plt.show()
        while 1:
            pre_num1 = int(input('请输入需要预测的图片(用图片上的数字表示):'))

            # Randomly pick one image of the chosen digit
            pre_spnum = random.randint(0, samplenum - 1)
            pre_result = Net2.Neural_Predict_num(x[pre_num1][pre_spnum])
            print(pre_result)
            if pre_result == True:
                print('预测结果为:是' + str(pre_num))
                if pre_num1 == pre_num:
                    print('预测成功!')
                else:
                    print('预测失败!')
            else:
                print('预测结果为:非' + str(pre_num))
                if pre_num1 != pre_num:
                    print('预测成功!')
                else:
                    print('预测失败!')

            img = Image.open('手写\\' + str(pre_num1) + '\\' + str(pre_num1) + '_' + str(pre_spnum + 1) + '.bmp')
            img.show()

            bool2 = float(input('是否继续预测?继续请输入1,退出请输入其他数字:'))

            if bool2 != 1:
                break

    bool_all = int(input('是否重新选择网络训练?“是”请输入“1”,退出请输入“0”:'))
    if bool_all == 0:
        break

The LaTeX code used in the plot labels can be composed at the following site:
https://www.latexlive.com/
An example of the process of executing the above main.py file is as follows:

C:\ProgramData\Anaconda3\python.exe C:/TJUcmj/学科/Python/NeuralNetwork3/Neural_main.py
请输入每个数字图片所取的样本数(0~500):500
数据读取中…
100%|██████████| 500/500 [01:33<00:00, 5.37it/s]
读取完成!
选择训练全部数字的网络输入”1“,选择训练某个数字的网络输入”0”:0
请输入需要预测的数字(0~9):5
请输入学习率(建议输入0~1):0.2
开始训练神经网络…
判断数字5数字正确率为:96.1%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.2
开始训练神经网络…
判断数字5数字正确率为:96.89999999999999%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.5
开始训练神经网络…
判断数字5数字正确率为:97.2%
是否继续训练?继续请输入1,退出请输入其他数字:0

请输入需要预测的图片(用图片上的数字表示):8
[[0.00016551]]
False
预测结果为:非5
预测成功!

是否继续预测?继续请输入1,退出请输入其他数字:1
请输入需要预测的图片(用图片上的数字表示):5
[[0.0074179]]
False
预测结果为:非5
预测失败!

是否继续预测?继续请输入1,退出请输入其他数字:1
请输入需要预测的图片(用图片上的数字表示):5
[[0.96203601]]
True
预测结果为:是5
预测成功!

是否继续预测?继续请输入1,退出请输入其他数字:0
是否重新选择网络训练?“是”请输入“1”,退出请输入“0”:1
选择训练全部数字的网络输入”1“,选择训练某个数字的网络输入”0”:1
请输入学习率(建议输入0~1):0.5
开始训练神经网络…
判断0~9数字正确率为:91.5%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.2
开始训练神经网络…
判断0~9数字正确率为:94.5%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.2
开始训练神经网络…
判断0~9数字正确率为:95.1%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.3
开始训练神经网络…
判断0~9数字正确率为:95.6%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.8
开始训练神经网络…
判断0~9数字正确率为:94.39999999999999%
是否继续训练?继续请输入1,退出请输入其他数字:1
请输入学习率(建议输入0~1):0.1
开始训练神经网络…
判断0~9数字正确率为:96.5%
是否继续训练?继续请输入1,退出请输入其他数字:0

请输入需要预测的图片(用图片上的数字表示):0
预测结果为:0
预测成功!

是否继续预测?继续请输入1,退出请输入其他数字:1
请输入需要预测的图片(用图片上的数字表示):7
预测结果为:7
预测成功!

是否继续预测?继续请输入1,退出请输入其他数字:1
请输入需要预测的图片(用图片上的数字表示):4
预测结果为:4
预测成功!

是否继续预测?继续请输入1,退出请输入其他数字:0
是否重新选择网络训练?“是”请输入“1”,退出请输入“0”:0

进程已结束,退出代码 0

The above is the entire content of this project. The code is open source and written by myself for everyone to learn from and use; corrections are welcome if you find any problems.
A known problem in this project: when training the network for a single digit, the numbers of positive and negative samples should ideally be balanced. Here, to keep the two training modes consistent, the number of negative samples is 9 times the number of positive samples. As a result, although the overall accuracy looks high after training, the accuracy of recognizing the target digit itself may be relatively low.
One possible improvement is as follows:
move the sample loop out of the neural network class into the main program, where the number of negative samples can be adjusted when training the single-digit network.
Readers can modify this themselves.
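
As a rough sketch of that idea (illustrative only; the schedule below is a hypothetical helper, and each of its entries would be fed through the same per-sample forward/backward step that Neural_Train_num performs internally):

# Balanced sample schedule for the single-digit network
pre_num = 5                                              # hypothetical target digit
net = Neural_NetWork3(x, 1)                              # single-output network to be trained
schedule = []
for j in range(samplenum):
    neg_digit = random.choice([d for d in range(10) if d != pre_num])
    schedule.append((pre_num, j, 1))                     # one positive sample, target output 1
    schedule.append((neg_digit, j, 0))                   # one negative sample, target output 0
random.shuffle(schedule)
# each (digit, sample_index, target) in schedule is then used for one training step on net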
