Neural network principles and derivation of the backpropagation formulas, with a Python implementation (no deep learning framework, numpy only)

1. Basic principles of neural networks

Logistic regression can only solve linearly separable problems. For the XOR problem, no single straight line can separate the two classes, so neural networks are introduced: by adding more neurons and activation functions, they can fit nonlinear problems.
An introduction to logistic regression can be found in my other blog post: Introduction and code implementation of logistic regression.

Neural network structure

A neural network consists of an input layer (the features of the input data), at least one hidden layer, and an output layer.
In this example the input x has three features, the hidden layer contains three neurons, and the output layer has a single neuron. Such a network can be used for regression as well as binary classification. For regression, the output layer needs no activation function; for binary classification, the sigmoid activation maps the output to a value between 0 and 1, and the class is decided by comparing that value with 0.5.
[Figure: network structure with three input features, a three-neuron hidden layer, and a single output neuron]
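As a minimal sketch of that distinction (the sigmoid helper and the 0.5 threshold below are illustrative, not part of the model trained later):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z_out = 0.8                         # raw value produced by the output neuron
regression_value = z_out            # regression: use the raw output directly
prob = sigmoid(z_out)               # binary classification: squash into (0, 1)
predicted_class = 1 if prob > 0.5 else 0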

Forward propagation calculation

The extra neuron at the bottom of the figure represents the bias. Its value is always 1, so multiplying it by the corresponding weight produces the bias term.
For the hidden layer
$a_{i}^{(2)} = g\left(\sum_{j=1}^{3} \theta_{ij}^{(2)} x_{j} + b_{i}^{(2)}\right)$
$\theta_{ij}^{(2)}$ denotes the connection weight between the i-th neuron in layer 2 and the j-th neuron in layer 1
g denotes the activation function

For the output layer
$h = g\left(\sum_{j=1}^{3} \theta_{ij}^{(3)} a_{j}^{(2)} + b_{i}^{(3)}\right)$

$\theta_{ij}^{(3)}$ denotes the connection weight between the i-th neuron in layer 3 and the j-th neuron in layer 2

Writing out all the formulas in full:
[Figure: the forward-propagation formulas written out for each neuron]

These formulas follow the rules of matrix multiplication, so the forward pass can be written as a matrix operation.
[Figure: the forward pass written as a matrix multiplication]
$\theta^{(1)}$ denotes the weight matrix of the first layer, with entries $\theta_{10}^{(1)}, \theta_{11}^{(1)}, \theta_{12}^{(1)}, \theta_{13}^{(1)}, \dots$; in this example $\theta^{(1)}$ is a 3*4 matrix (three hidden neurons, and four columns because the bias unit contributes the extra column $\theta_{i0}^{(1)}$).
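A minimal numpy sketch of this matrix-form forward pass (the random weights, the 3-3-1 architecture and the sigmoid activation are illustrative assumptions matching the figure, not the model trained in section 3):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])      # one sample with 3 features
Theta1 = np.random.rand(3, 4)       # layer 1 -> layer 2: 3 hidden neurons x (bias + 3 features)
Theta2 = np.random.rand(1, 4)       # layer 2 -> layer 3: 1 output neuron x (bias + 3 hidden)

a1 = np.concatenate(([1.0], x))     # prepend the bias unit, whose value is always 1
a2 = sigmoid(Theta1 @ a1)           # hidden-layer activations, shape (3,)
a2 = np.concatenate(([1.0], a2))    # bias unit for the next layer
h = sigmoid(Theta2 @ a2)            # output, shape (1,)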

Cost function

For a C-class classification task (C > 2), the output layer has C neurons, and the problem can be viewed as C binary classification problems. For example, a three-class task that decides whether an animal is a cat, a dog, or a pig uses an output layer of three neurons: the first decides cat vs. not-cat, the second dog vs. not-dog, the third pig vs. not-pig, and the neuron with the largest output value determines the predicted class.

The label y is not encoded as 1, 2, 3 but as a one-hot vector. With three classes in total, the first class is represented as [1,0,0], the second as [0,1,0], and the third as [0,0,1].
Cost function:
$J(\theta) = -\frac{1}{m}\left[\sum_{i=1}^{m}\sum_{k=1}^{K} y_{k}^{(i)}\ln\big(h(x^{(i)})\big)_{k} + \big(1-y_{k}^{(i)}\big)\ln\big(1-h(x^{(i)})\big)_{k}\right]$
$y_{k}^{(i)}$ denotes the k-th component of the label of the i-th sample, and $h(x^{(i)})_{k}$ denotes the k-th component of the output.
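A minimal numpy sketch of this cost for one-hot labels (the small epsilon added inside the logarithms is the same trick used in the training code below to avoid log(0)):

import numpy as np

def cross_entropy_cost(H, Y, eps=1e-5):
    # H: (m, K) predicted outputs, Y: (m, K) one-hot labels
    m = len(Y)
    return -np.sum(Y * np.log(H + eps) + (1 - Y) * np.log(1 - H + eps)) / m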

2. Derivation of the backpropagation formulas

Define variables

l denotes the layer index, L denotes the index of the last layer, and j, k denote indices of neurons within a layer.
$N^{(l)}$ denotes the number of neurons in layer l, $\theta$ denotes the connection weights, and b denotes the biases.
$\theta_{jk}^{(l)}$ denotes the connection weight between the j-th neuron in layer l and the k-th neuron in layer l-1.
$b_{j}^{(l)}$ denotes the bias of the j-th neuron in layer l.

$z_{j}^{(l)} = \sum_{k=1}^{N^{(l-1)}} \theta_{jk}^{(l)} a_{k}^{(l-1)} + b_{j}^{(l)}$ denotes the input of the j-th neuron in layer l.

$a_{j}^{(l)} = g(z_{j}^{(l)})$ denotes the output of the j-th neuron in layer l, where g is the activation function.

The residual of the j-th neuron in layer l is defined as $\xi_{j}^{(l)} = \frac{\partial J}{\partial z_{j}^{(l)}}$.

Derivation of the backpropagation formulas

For the output layer:
The derivation below assumes the output layer uses the sigmoid activation function. In practice the activation of a multi-class output layer is usually softmax; I use sigmoid here simply because I have already derived this result in the logistic regression post. Readers are invited to work through the softmax case themselves; if I remember correctly, the result is also h − y.
[Figure: derivation of the output-layer residual]
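Since the figure is not reproduced here, the following is a sketch of the standard result under the notation above, assuming a sigmoid output activation and the cross-entropy cost (for a single sample):

$\xi_{j}^{(L)} = \frac{\partial J}{\partial a_{j}^{(L)}}\,\frac{\partial a_{j}^{(L)}}{\partial z_{j}^{(L)}} = \left(-\frac{y_{j}}{h_{j}} + \frac{1-y_{j}}{1-h_{j}}\right) h_{j}\,(1-h_{j}) = h_{j} - y_{j}$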
For the hidden layer:
[Figure: derivation of the hidden-layer residual]
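Again as a sketch of the standard chain-rule result under the notation above (the original figure is not shown):

$\xi_{j}^{(l)} = \sum_{k=1}^{N^{(l+1)}} \frac{\partial J}{\partial z_{k}^{(l+1)}}\,\frac{\partial z_{k}^{(l+1)}}{\partial a_{j}^{(l)}}\,\frac{\partial a_{j}^{(l)}}{\partial z_{j}^{(l)}} = \left(\sum_{k=1}^{N^{(l+1)}} \xi_{k}^{(l+1)}\,\theta_{kj}^{(l+1)}\right) g'\!\left(z_{j}^{(l)}\right)$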

For the connection weights and biases:
[Figure: partial derivatives with respect to the connection weights and biases]
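Written out as a sketch (the original figure is missing), the gradients follow directly from the definition of $z_{j}^{(l)}$:

$\frac{\partial J}{\partial \theta_{jk}^{(l)}} = \xi_{j}^{(l)}\, a_{k}^{(l-1)}, \qquad \frac{\partial J}{\partial b_{j}^{(l)}} = \xi_{j}^{(l)}$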

With the partial derivative of the cost J with respect to each parameter, the parameters can then be optimized using gradient descent.
For the principle of gradient descent, you can refer to this article: The principle of gradient descent
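For reference, the update applied in the training code below, with learning rate $\alpha$ (self.lr in the code), is simply:

$\theta_{jk}^{(l)} \leftarrow \theta_{jk}^{(l)} - \alpha\,\frac{\partial J}{\partial \theta_{jk}^{(l)}}, \qquad b_{j}^{(l)} \leftarrow b_{j}^{(l)} - \alpha\,\frac{\partial J}{\partial b_{j}^{(l)}}$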

3. Python code implementation, based on numpy

Use the iris data set that comes with sklearn to build a simple three-layer neural network, with ten neurons in the second layer and three neurons in the third layer.

from sklearn import datasets
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import  StandardScaler

#ReLU activation function
def relu(x):
    x=np.where(x>0,x,0)#values greater than 0 keep their value, values less than or equal to 0 become 0
    return x

#derivative of the ReLU activation: 1 where the input is greater than 0, 0 where it is less than or equal to 0
def back_relu(z):
    return np.where(z>0,1,0)

#softmax activation for the last layer, outputs a probability distribution over the 3 classes
#(note: the exponentials are not shifted by the row maximum, so extremely large inputs could overflow)
def softmax(x):
    a=np.sum(np.exp(x), axis=1)#sum of exponentials for each sample
    b=np.expand_dims(a, 1).repeat(3, axis=1)#repeat the sum across the 3 class columns
    return np.exp(x)/b

#convert discrete class labels into one-hot encoding
def trans_label(labels):
    newlabels=np.zeros((len(labels),3))
    for i,label in enumerate(labels):
        newlabels[i,label]=1
    return newlabels
#evaluate model accuracy
def evaluate(x_test,y_test):
    num_correct = 0
    for x, y in zip(x_test, y_test):
        y_hat = model.predict(x)

        y = np.argmax(y)
        if y_hat == y:
            num_correct += 1
    acc=num_correct / len(x_test)
    return acc
class model():
    def __init__(self,num_iters=200,lr=0.15):
        """

        :param num_iters: number of iterations
        :param lr: learning rate
        """
        #define the model parameters
        #this is a three-layer network: layer 1 is the input layer, layer 2 has ten neurons, and layer 3 has three neurons because there are 3 classes
        self.Theta2 = np.random.rand(4, 10)
        self.Theta3 = np.random.rand(10, 3)
        self.B2 = np.zeros(10)
        self.B3 = np.zeros(3)
        self.num_iters=num_iters
        self.lr=lr
    #model training
    #since this example is simple, forward propagation, backpropagation and the gradient computation are written out step by step for clarity; a more complex model could loop over the layers instead
    def fit(self,X,Y):
        """

        :param X: feature matrix
        :param Y: one-hot encoded labels
        :return:
        """
        #number of samples
        m=len(X)
        for k in range(self.num_iters):
            loss = 0
            #partial derivatives of the loss with respect to each parameter
            dj_dTheta3 = 0
            dj_dB3 = 0
            dj_dTheta2 = 0
            dj_dB2 = 0
            for i in range(len(X)):
                #add an extra dimension so the sample can be used in matrix multiplication
                x = np.expand_dims(X[i], 0)
                y = Y[i]
                z2 = np.dot(x, self.Theta2) + self.B2
                a2 = relu(z2)

                z3 = np.dot(a2, self.Theta3) + self.B3
                a3 = softmax(z3)
                h=a3

                loss += -(np.sum(np.multiply(y, np.log(h + 1e-5))) +
                          np.sum(np.multiply((1 - y), np.log(1 - h + 1e-5))))
                #a small 1e-5 is added inside the log to prevent log(0)


                XI3 = a3 - y

                XI2 = np.multiply(np.dot(XI3, self.Theta3.T), back_relu(z2))

                dj_dTheta3 += np.dot(a2.T, XI3)
                dj_dB3 += XI3

                dj_dTheta2 += np.dot(x.T, XI2)
                dj_dB2 += XI2
            dj_dTheta3 /= m
            dj_dB3 /= m
            dj_dTheta2 /= m
            dj_dB2 /= m
            #gradient descent update
            self.Theta3 = self.Theta3 - self.lr * dj_dTheta3
            self.B3 = self.B3 - self.lr * dj_dB3
            self.Theta2 = self.Theta2 - self.lr * dj_dTheta2
            self.B2 = self.B2 - self.lr * dj_dB2
            loss /= m
            print("num_iter:%d,loss:%f"%(k,loss))
    #forward propagation
    def forward(self,x):
        x = np.expand_dims(x, 0)
        z2 = np.dot(x, self.Theta2) + self.B2

        a2 = relu(z2)

        z3 = np.dot(a2, self.Theta3) + self.B3
        a3 = softmax(z3)
        h=a3
        return h
    #predict the class of a sample
    def predict(self,x):
        h=self.forward(x)
        #the output is a probability distribution over three classes; the class with the largest probability is the prediction
        y_hat=np.argmax(h,1)[0]
        return y_hat
if __name__ == '__main__':
    iris = datasets.load_iris()
    data = iris.data
    label = iris.target
    label = trans_label(label)
    
    #standardize the data
    std = StandardScaler()
    data = std.fit_transform(data)

    x_train, x_test, y_train, y_test = train_test_split(data, label, test_size=0.2, random_state=0)

    model=model()
    model.fit(x_train,y_train)
    print("训练完成")
    acc=evaluate(x_test,y_test)
    print("测试集正确率为:%.3f%%"%(acc*100))


Run output:
D:\Anaconda3\python.exe D:/pycharmproject/机器学习算法复习/神经网络2.py
D:\Anaconda3\lib\site-packages\sklearn\feature_extraction\text.py:17: DeprecationWarning: Using or importing the ABCs from ‘collections’ instead of from ‘collections.abc’ is deprecated, and in 3.8 it will stop working
from collections import Mapping, defaultdict
num_iter:0,loss:3.063194
num_iter:1,loss:2.239954
num_iter:2,loss:1.693593
num_iter:3,loss:1.551304
num_iter:4,loss:1.506748
num_iter:5,loss:1.479147
num_iter:6,loss:1.457542
num_iter:7,loss:1.438619
num_iter:8,loss:1.421467
num_iter:9,loss:1.405498
num_iter:10,loss:1.390019
num_iter:11,loss:1.375111
num_iter:12,loss:1.360902
num_iter:13,loss:1.347246
num_iter:14,loss:1.333915
num_iter:15,loss:1.321025
num_iter:16,loss:1.308455
num_iter:17,loss:1.296189
num_iter:18,loss:1.284267
num_iter:19,loss:1.272691
num_iter:20,loss:1.261426
num_iter:21,loss:1.250217
num_iter:22,loss:1.239248
num_iter:23,loss:1.228504
num_iter:24,loss:1.217936
num_iter:25,loss:1.207321
num_iter:26,loss:1.196690
num_iter:27,loss:1.186266
num_iter:28,loss:1.176033
num_iter:29,loss:1.165851
num_iter:30,loss:1.155603
num_iter:31,loss:1.145506
num_iter:32,loss:1.135513
num_iter:33,loss:1.125471
num_iter:34,loss:1.115541
num_iter:35,loss:1.105662
num_iter:36,loss:1.095764
num_iter:37,loss:1.085812
num_iter:38,loss:1.075536
num_iter:39,loss:1.064804
num_iter:40,loss:1.053329
num_iter:41,loss:1.042024
num_iter:42,loss:1.030859
num_iter:43,loss:1.019773
num_iter:44,loss:1.008787
num_iter:45,loss:0.997435
num_iter:46,loss:0.985707
num_iter:47,loss:0.973916
num_iter:48,loss:0.962310
num_iter:49,loss:0.950888
num_iter:50,loss:0.939652
num_iter:51,loss:0.928598
num_iter:52,loss:0.917294
num_iter:53,loss:0.905565
num_iter:54,loss:0.893416
num_iter:55,loss:0.881281
num_iter:56,loss:0.869301
num_iter:57,loss:0.856783
num_iter:58,loss:0.844534
num_iter:59,loss:0.832546
num_iter:60,loss:0.820826
num_iter:61,loss:0.809372
num_iter:62,loss:0.798184
num_iter:63,loss:0.787256
num_iter:64,loss:0.776585
num_iter:65,loss:0.766098
num_iter:66,loss:0.755540
num_iter:67,loss:0.745238
num_iter:68,loss:0.735184
num_iter:69,loss:0.725355
num_iter:70,loss:0.715305
num_iter:71,loss:0.705144
num_iter:72,loss:0.694819
num_iter:73,loss:0.684792
num_iter:74,loss:0.674848
num_iter:75,loss:0.663907
num_iter:76,loss:0.653278
num_iter:77,loss:0.642616
num_iter:78,loss:0.632334
num_iter:79,loss:0.621853
num_iter:80,loss:0.611735
num_iter:81,loss:0.601942
num_iter:82,loss:0.592096
num_iter:83,loss:0.582128
num_iter:84,loss:0.571898
num_iter:85,loss:0.562073
num_iter:86,loss:0.552639
num_iter:87,loss:0.543344
num_iter:88,loss:0.533303
num_iter:89,loss:0.523705
num_iter:90,loss:0.514535
num_iter:91,loss:0.505754
num_iter:92,loss:0.497341
num_iter:93,loss:0.489270
num_iter:94,loss:0.481526
num_iter:95,loss:0.474088
num_iter:96,loss:0.466942
num_iter:97,loss:0.460070
num_iter:98,loss:0.453469
num_iter:99,loss:0.447136
num_iter:100,loss:0.441093
num_iter:101,loss:0.435259
num_iter:102,loss:0.429630
num_iter:103,loss:0.424196
num_iter:104,loss:0.418948
num_iter:105,loss:0.413882
num_iter:106,loss:0.408978
num_iter:107,loss:0.404225
num_iter:108,loss:0.399610
num_iter:109,loss:0.395138
num_iter:110,loss:0.390805
num_iter:111,loss:0.386602
num_iter:112,loss:0.382522
num_iter:113,loss:0.378551
num_iter:114,loss:0.374696
num_iter:115,loss:0.370963
num_iter:116,loss:0.367340
num_iter:117,loss:0.363815
num_iter:118,loss:0.360384
num_iter:119,loss:0.357042
num_iter:120,loss:0.353787
num_iter:121,loss:0.350614
num_iter:122,loss:0.347520
num_iter:123,loss:0.344501
num_iter:124,loss:0.341556
num_iter:125,loss:0.338681
num_iter:126,loss:0.335875
num_iter:127,loss:0.333134
num_iter:128,loss:0.330454
num_iter:129,loss:0.327833
num_iter:130,loss:0.325271
num_iter:131,loss:0.322764
num_iter:132,loss:0.320311
num_iter:133,loss:0.317910
num_iter:134,loss:0.315559
num_iter:135,loss:0.313258
num_iter:136,loss:0.311011
num_iter:137,loss:0.308808
num_iter:138,loss:0.306650
num_iter:139,loss:0.304540
num_iter:140,loss:0.302480
num_iter:141,loss:0.300460
num_iter:142,loss:0.298477
num_iter:143,loss:0.296532
num_iter:144,loss:0.294622
num_iter:145,loss:0.292750
num_iter:146,loss:0.290911
num_iter:147,loss:0.289105
num_iter:148,loss:0.287329
num_iter:149,loss:0.285582
num_iter:150,loss:0.283865
num_iter:151,loss:0.282178
num_iter:152,loss:0.280516
num_iter:153,loss:0.278883
num_iter:154,loss:0.277277
num_iter:155,loss:0.275697
num_iter:156,loss:0.274143
num_iter:157,loss:0.272618
num_iter:158,loss:0.271122
num_iter:159,loss:0.269649
num_iter:160,loss:0.268201
num_iter:161,loss:0.266775
num_iter:162,loss:0.265371
num_iter:163,loss:0.263989
num_iter:164,loss:0.262628
num_iter:165,loss:0.261287
num_iter:166,loss:0.259967
num_iter:167,loss:0.258667
num_iter:168,loss:0.257390
num_iter:169,loss:0.256131
num_iter:170,loss:0.254891
num_iter:171,loss:0.253670
num_iter:172,loss:0.252466
num_iter:173,loss:0.251280
num_iter:174,loss:0.250111
num_iter:175,loss:0.248958
num_iter:176,loss:0.247820
num_iter:177,loss:0.246698
num_iter:178,loss:0.245592
num_iter:179,loss:0.244501
num_iter:180,loss:0.243424
num_iter:181,loss:0.242362
num_iter:182,loss:0.241314
num_iter:183,loss:0.240280
num_iter:184,loss:0.239259
num_iter:185,loss:0.238252
num_iter:186,loss:0.237258
num_iter:187,loss:0.236277
num_iter:188,loss:0.235308
num_iter:189,loss:0.234352
num_iter:190,loss:0.233439
num_iter:191,loss:0.232550
num_iter:192,loss:0.231672
num_iter:193,loss:0.230794
num_iter:194,loss:0.229951
num_iter:195,loss:0.229083
num_iter:196,loss:0.228270
num_iter:197,loss:0.227416
num_iter:198,loss:0.226632
num_iter:199,loss:0.225797
Training complete
Test set accuracy: 100.000%

Process finished with exit code 0

Hope it helps everyone~~

Origin blog.csdn.net/weixin_44599230/article/details/121500735