Neural network model
- The amount of data keeps growing, its dimensionality keeps increasing (text, images), and it often comes from multiple sources
- Other models have difficulty handling such multi-source, high-dimensional data; most current artificial-intelligence products therefore use neural network models
Principles of neural networks
- Input layer: independent variable
- Hidden layer: intermediate state (feature extraction)
- Output layer: dependent variable
Feature extraction of hidden layers in convolutional neural networks
How neural networks produce output
Linear calculation method
Problems with linear models
Introduce nonlinearity
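As an illustration (a minimal sketch with assumed weight and bias values, using the same two-feature inputs as the network implemented later in this section): a neuron first computes a weighted sum of its inputs (the linear part) and then passes it through a nonlinear activation.

import numpy as np

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([2.0, 3.0])    # two input features
w = np.array([0.5, -0.25])  # illustrative weights (assumed values)
b = 0.1                     # illustrative bias (assumed value)

linear_output = np.dot(w, x) + b           # linear calculation: w·x + b
nonlinear_output = sigmoid(linear_output)  # introduce nonlinearity
print(linear_output, nonlinear_output)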
Activation function sigmoid
- The output of the sigmoid function lies between 0 and 1
- It can be understood as compressing any number in (−∞, +∞) into the range (0, 1)
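For reference, the sigmoid function used here (and in the code later in this section) is:

σ(x) = 1 / (1 + e^(−x)), with σ(x) → 0 as x → −∞ and σ(x) → 1 as x → +∞.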
How the neural network achieves self-correction
Error function
Mean square error:
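Written out, consistent with the mse_loss function in the code below, for n samples:

MSE = (1/n) · Σ (y_true − y_pred)², where the sum runs over all n samples.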
MSE Derivation:
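For a single sample the loss is L = (y_true − y_pred)², and differentiating with respect to the prediction gives the term that appears as d_L_d_ypred in the code below:

∂L/∂y_pred = −2 · (y_true − y_pred)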
Loss function
The loss is actually a multivariate function of all the weights and biases
Neural network self-adjustment:
How to make the error (MSE) smaller: adjust the parameters.
How to adjust the parameters?
How to find the minimum value of this function
If w is to the right of the minimum, w should decrease; the basis for judgment is that the partial derivative is positive
If w is to the left of the minimum, w should increase; the basis for judgment is that the partial derivative is negative
Method: take the partial derivative ∂L/∂w
Find partial derivatives of parameters to minimize MSE
The more complex the function, the less we can determine the global minimum; we can usually only find a local minimum
The whole process of parameter optimization
How to find the gradients of all the weights
According to the chain rule of differentiation
Then find ∂h1/∂w1
Derivative of sigmoid function (activation function):
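The derivative can be expressed through the function itself (this is what deriv_sigmoid computes in the code below):

f(x) = 1 / (1 + e^(−x))
f'(x) = e^(−x) / (1 + e^(−x))² = f(x) · (1 − f(x))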
Chain derivation process
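For the small network implemented below (two inputs → hidden neurons h1, h2 → output o1), the chain rule factors the gradient with respect to w1 as:

∂L/∂w1 = (∂L/∂y_pred) · (∂y_pred/∂h1) · (∂h1/∂w1)

This is exactly the three-factor product used in the weight-update lines of the code.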
Parameter update strategy
Stochastic gradient descent (SGD)
η is a constant called the learning rate, which determines how fast the network trains.
Subtracting η · ∂L/∂w1 from w1 gives the new weight w1.
When ∂L/∂w1 is positive, w1 becomes smaller; when ∂L/∂w1 is negative, w1 becomes larger.
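The update rule, written out (the same rule is applied to every weight and bias in the code below):

w1 ← w1 − η · ∂L/∂w1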
Model training process
- Select a sample from the dataset;
- Calculate the partial derivatives of the loss function with respect to each weight and bias;
- Use the update formula to update each weight and bias;
- Return to step 1 (a minimal single-neuron sketch of this loop is shown below).
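A minimal, self-contained sketch of this loop for a single neuron with one training sample (illustrative values; the full two-layer implementation appears later in this section):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w, b = 0.5, 0.0        # illustrative initial parameters (assumed values)
x, y_true = 2.0, 1.0   # step 1: one sample from the "dataset"
eta = 0.1              # learning rate

for step in range(100):
    y_pred = sigmoid(w * x + b)                 # forward pass
    # step 2: partial derivatives of L = (y_true - y_pred)^2
    d_L_d_ypred = -2 * (y_true - y_pred)
    d_ypred_d_w = x * y_pred * (1 - y_pred)
    d_ypred_d_b = y_pred * (1 - y_pred)
    # step 3: update the weight and the bias
    w -= eta * d_L_d_ypred * d_ypred_d_w
    b -= eta * d_L_d_ypred * d_ypred_d_b
    # step 4: back to step 1 (next iteration of the loop)

print(w, b, sigmoid(w * x + b))  # the prediction moves toward y_true = 1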
All formulas
Forward propagation
Backward propagation (backpropagation)
Parameter update
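Summarizing the formulas that the implementation below uses (notation follows the code: inputs x1, x2; hidden neurons h1, h2; output o1; activation σ):

Forward propagation:
h1 = σ(w1·x1 + w2·x2 + b1)
h2 = σ(w3·x1 + w4·x2 + b2)
y_pred = o1 = σ(w5·h1 + w6·h2 + b3)

Backward propagation (chain rule), for example for w1:
∂L/∂w1 = ∂L/∂y_pred · ∂y_pred/∂h1 · ∂h1/∂w1 = −2·(y_true − y_pred) · w5·σ'(sum_o1) · x1·σ'(sum_h1)

Parameter update (for every weight w and bias b):
w ← w − η·∂L/∂w,   b ← b − η·∂L/∂b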
Neural network program implementation
- Select a sample from the dataset;
- Calculate the partial derivatives of the loss function with respect to each weight and bias;
- Use the update formula to update each weight and bias;
- Return to step 1.
The code is as follows:
import numpy as np

def sigmoid(x):
    # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1 - fx)

def mse_loss(y_true, y_pred):
    # MSE loss function
    return ((y_true - y_pred) ** 2).mean()
class OurNeuralNetwork:
    def __init__(self):
        # Weights
        self.w1 = np.random.normal()
        self.w2 = np.random.normal()
        self.w3 = np.random.normal()
        self.w4 = np.random.normal()
        self.w5 = np.random.normal()
        self.w6 = np.random.normal()
        # Biases
        self.b1 = np.random.normal()
        self.b2 = np.random.normal()
        self.b3 = np.random.normal()

    def feedforward(self, x):
        # x is the input data; here each input has only two features.
        h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
        h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
        o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
        return o1

    def train(self, data, all_y_trues):
        learn_rate = 0.1
        epochs = 1000  # number of passes over the whole dataset
        for epoch in range(epochs):
            for x, y_true in zip(data, all_y_trues):
                # Feed forward
                sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
                h1 = sigmoid(sum_h1)
                sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
                h2 = sigmoid(sum_h2)
                sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
                o1 = sigmoid(sum_o1)
                y_pred = o1
                # Compute partial derivatives.
                # Derivative of the loss with respect to the prediction
                d_L_d_ypred = -2 * (y_true - y_pred)
                # Neuron o1
                d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
                d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
                d_ypred_d_b3 = deriv_sigmoid(sum_o1)
                d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
                d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)
                # Neuron h1
                d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
                d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
                d_h1_d_b1 = deriv_sigmoid(sum_h1)
                # Neuron h2
                d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
                d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
                d_h2_d_b2 = deriv_sigmoid(sum_h2)
                # Update weights and biases
                # Neuron h1
                self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
                self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
                self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1
                # Neuron h2
                self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
                self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
                self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2
                # Neuron o1
                self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
                self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
                self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3
            # Compute the loss on the whole dataset every 10 epochs
            if epoch % 10 == 0:
                y_preds = np.apply_along_axis(self.feedforward, 1, data)
                loss = mse_loss(all_y_trues, y_preds)
                print("Epoch %d loss: %.3f" % (epoch, loss))

# Define the dataset
data = np.array([
    [-2, -1],   # Alice
    [25, 6],    # Bob
    [17, 4],    # Charlie
    [-15, -6],  # Diana
])
all_y_trues = np.array([
    1,  # Alice
    0,  # Bob
    0,  # Charlie
    1,  # Diana
])

# Train the model
network = OurNeuralNetwork()
network.train(data, all_y_trues)
Epoch 0 loss: 0.383
Epoch 10 loss: 0.213
Epoch 20 loss: 0.125
Epoch 30 loss: 0.095
Epoch 40 loss: 0.077
Epoch 50 loss: 0.064
Epoch 60 loss: 0.054
Epoch 70 loss: 0.046
Epoch 80 loss: 0.041
Epoch 90 loss: 0.036
Epoch 100 loss: 0.032
...
Epoch 970 loss: 0.002
Epoch 980 loss: 0.002
Epoch 990 loss: 0.002
network.feedforward([-25,-6])
0.9660953631431157
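The output is the activation of the output neuron, so it always lies in (0, 1). One common way to turn it into a class label (an illustrative choice, not part of the original code) is to threshold at 0.5:

# Illustrative post-processing: threshold the network output at 0.5
pred = network.feedforward(np.array([-25, -6]))
label = 1 if pred > 0.5 else 0
print(label)  # with the trained network above (output ≈ 0.966), this prints 1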