Neural network model
- The amount of data keeps growing, its dimensionality keeps increasing (text, images), and it often comes from multiple sources
- Other models have difficulty handling such multi-source, high-dimensional data; most current artificial-intelligence products therefore use neural network models
Principles of neural networks
- Input layer: independent variable
- Hidden layer: intermediate state (feature extraction)
- Output layer: dependent variable
Feature extraction of hidden layers in convolutional neural networks
How neural networks produce output
Linear calculation method
Problems with linear models
Introduce nonlinearity
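As an illustration (a minimal sketch with assumed weight and bias values, using the same two-feature inputs as the network implemented later in this section): a neuron first computes a weighted sum of its inputs (the linear part) and then passes it through a nonlinear activation.

import numpy as np

def sigmoid(x):
    # squashes any real number into (0, 1)
    return 1 / (1 + np.exp(-x))

x = np.array([2.0, 3.0])    # two input features
w = np.array([0.5, -0.25])  # illustrative weights (assumed values)
b = 0.1                     # illustrative bias (assumed value)

linear_output = np.dot(w, x) + b           # linear calculation: w·x + b
nonlinear_output = sigmoid(linear_output)  # introduce nonlinearity
print(linear_output, nonlinear_output)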
Activation function sigmoid
- The output of the sigmoid function lies between 0 and 1
- It can be understood as compressing any number in (−∞, +∞) into the range (0, 1)
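For reference, the sigmoid function used here (and in the code later in this section) is:

σ(x) = 1 / (1 + e^(−x)), with σ(x) → 0 as x → −∞ and σ(x) → 1 as x → +∞.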
How the neural network achieves self-correction
Error function
Mean square error:
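Written out, consistent with the mse_loss function in the code below, for n samples:

MSE = (1/n) · Σ (y_true − y_pred)², where the sum runs over all n samples.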
MSE Derivation:
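For a single sample the loss is L = (y_true − y_pred)², and differentiating with respect to the prediction gives the term that appears as d_L_d_ypred in the code below:

∂L/∂y_pred = −2 · (y_true − y_pred)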
Loss function
The loss is actually a multivariate function of all the weights and biases
Neural network self-adjustment:
How to make the error (MSE) smaller: adjust the parameters.
How to adjust the parameters?
How to find the minimum value of this function
If w is to the right of the minimum, w should decrease; the basis for judgment is that the partial derivative is positive
If w is to the left of the minimum, w should increase; the basis for judgment is that the partial derivative is negative
Method: take the partial derivative ∂L/∂w
Find partial derivatives of parameters to minimize MSE
The more complex the function, the less we can determine the global minimum; we can usually only find a local minimum
The whole process of parameter optimization
How to find the gradients of all the weights
According to the chain rule of differentiation
Then find ∂h1/∂w1
Derivative of sigmoid function (activation function):
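The derivative can be expressed through the function itself (this is what deriv_sigmoid computes in the code below):

f(x) = 1 / (1 + e^(−x))
f'(x) = e^(−x) / (1 + e^(−x))² = f(x) · (1 − f(x))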
Chain derivation process
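For the small network implemented below (two inputs → hidden neurons h1, h2 → output o1), the chain rule factors the gradient with respect to w1 as:

∂L/∂w1 = (∂L/∂y_pred) · (∂y_pred/∂h1) · (∂h1/∂w1)

This is exactly the three-factor product used in the weight-update lines of the code.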
Parameter update strategy
Stochastic gradient descent (SGD)
η is a constant called the learning rate, which determines how fast the network trains.
Subtracting η · ∂L/∂w1 from w1 gives the new weight w1.
When ∂L/∂w1 is positive, w1 becomes smaller; when ∂L/∂w1 is negative, w1 becomes larger.
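The update rule, written out (the same rule is applied to every weight and bias in the code below):

w1 ← w1 − η · ∂L/∂w1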
Model training process
- Select a sample from the dataset;
- Calculate the partial derivatives of the loss function with respect to each weight and bias;
- Use the update formula to update each weight and bias;
- Return to step 1 (a minimal single-neuron sketch of this loop is shown below).
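A minimal, self-contained sketch of this loop for a single neuron with one training sample (illustrative values; the full two-layer implementation appears later in this section):

import numpy as np

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

w, b = 0.5, 0.0        # illustrative initial parameters (assumed values)
x, y_true = 2.0, 1.0   # step 1: one sample from the "dataset"
eta = 0.1              # learning rate

for step in range(100):
    y_pred = sigmoid(w * x + b)                 # forward pass
    # step 2: partial derivatives of L = (y_true - y_pred)^2
    d_L_d_ypred = -2 * (y_true - y_pred)
    d_ypred_d_w = x * y_pred * (1 - y_pred)
    d_ypred_d_b = y_pred * (1 - y_pred)
    # step 3: update the weight and the bias
    w -= eta * d_L_d_ypred * d_ypred_d_w
    b -= eta * d_L_d_ypred * d_ypred_d_b
    # step 4: back to step 1 (next iteration of the loop)

print(w, b, sigmoid(w * x + b))  # the prediction moves toward y_true = 1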
All formulas
Forward propagation
Backward propagation (backpropagation)
Parameter update
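Summarizing the formulas that the implementation below uses (notation follows the code: inputs x1, x2; hidden neurons h1, h2; output o1; activation σ):

Forward propagation:
h1 = σ(w1·x1 + w2·x2 + b1)
h2 = σ(w3·x1 + w4·x2 + b2)
y_pred = o1 = σ(w5·h1 + w6·h2 + b3)

Backward propagation (chain rule), for example for w1:
∂L/∂w1 = ∂L/∂y_pred · ∂y_pred/∂h1 · ∂h1/∂w1 = −2·(y_true − y_pred) · w5·σ'(sum_o1) · x1·σ'(sum_h1)

Parameter update (for every weight w and bias b):
w ← w − η·∂L/∂w,   b ← b − η·∂L/∂b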
Neural network program implementation
- Select a sample from the dataset;
- Calculate the partial derivatives of the loss function with respect to each weight and bias;
- Use the update formula to update each weight and bias;
- Return to step 1.
The code is as follows:
import numpy as np

def sigmoid(x):
    # Sigmoid activation function: f(x) = 1 / (1 + e^(-x))
    return 1 / (1 + np.exp(-x))

def deriv_sigmoid(x):
    # Derivative of sigmoid: f'(x) = f(x) * (1 - f(x))
    fx = sigmoid(x)
    return fx * (1 - fx)

def mse_loss(y_true, y_pred):
    # MSE loss function
    return ((y_true - y_pred) ** 2).mean()
class OurNeuralNetwork:
    def __init__(self):
        # Weights
        self.w1 = np.random.normal()
        self.w2 = np.random.normal()
        self.w3 = np.random.normal()
        self.w4 = np.random.normal()
        self.w5 = np.random.normal()
        self.w6 = np.random.normal()
        # Biases
        self.b1 = np.random.normal()
        self.b2 = np.random.normal()
        self.b3 = np.random.normal()

    def feedforward(self, x):
        # x is the input data; here each input has only two features.
        h1 = sigmoid(self.w1 * x[0] + self.w2 * x[1] + self.b1)
        h2 = sigmoid(self.w3 * x[0] + self.w4 * x[1] + self.b2)
        o1 = sigmoid(self.w5 * h1 + self.w6 * h2 + self.b3)
        return o1

    def train(self, data, all_y_trues):
        learn_rate = 0.1
        epochs = 1000  # number of passes over the whole dataset
        for epoch in range(epochs):
            for x, y_true in zip(data, all_y_trues):
                # Feed forward
                sum_h1 = self.w1 * x[0] + self.w2 * x[1] + self.b1
                h1 = sigmoid(sum_h1)
                sum_h2 = self.w3 * x[0] + self.w4 * x[1] + self.b2
                h2 = sigmoid(sum_h2)
                sum_o1 = self.w5 * h1 + self.w6 * h2 + self.b3
                o1 = sigmoid(sum_o1)
                y_pred = o1
                # Compute partial derivatives.
                # Derivative of the loss with respect to the prediction
                d_L_d_ypred = -2 * (y_true - y_pred)
                # Neuron o1
                d_ypred_d_w5 = h1 * deriv_sigmoid(sum_o1)
                d_ypred_d_w6 = h2 * deriv_sigmoid(sum_o1)
                d_ypred_d_b3 = deriv_sigmoid(sum_o1)
                d_ypred_d_h1 = self.w5 * deriv_sigmoid(sum_o1)
                d_ypred_d_h2 = self.w6 * deriv_sigmoid(sum_o1)
                # Neuron h1
                d_h1_d_w1 = x[0] * deriv_sigmoid(sum_h1)
                d_h1_d_w2 = x[1] * deriv_sigmoid(sum_h1)
                d_h1_d_b1 = deriv_sigmoid(sum_h1)
                # Neuron h2
                d_h2_d_w3 = x[0] * deriv_sigmoid(sum_h2)
                d_h2_d_w4 = x[1] * deriv_sigmoid(sum_h2)
                d_h2_d_b2 = deriv_sigmoid(sum_h2)
                # Update weights and biases
                # Neuron h1
                self.w1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w1
                self.w2 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_w2
                self.b1 -= learn_rate * d_L_d_ypred * d_ypred_d_h1 * d_h1_d_b1
                # Neuron h2
                self.w3 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w3
                self.w4 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_w4
                self.b2 -= learn_rate * d_L_d_ypred * d_ypred_d_h2 * d_h2_d_b2
                # Neuron o1
                self.w5 -= learn_rate * d_L_d_ypred * d_ypred_d_w5
                self.w6 -= learn_rate * d_L_d_ypred * d_ypred_d_w6
                self.b3 -= learn_rate * d_L_d_ypred * d_ypred_d_b3
            # Compute the loss on the whole dataset every 10 epochs
            if epoch % 10 == 0:
                y_preds = np.apply_along_axis(self.feedforward, 1, data)
                loss = mse_loss(all_y_trues, y_preds)
                print("Epoch %d loss: %.3f" % (epoch, loss))

# Define the dataset
data = np.array([
    [-2, -1],   # Alice
    [25, 6],    # Bob
    [17, 4],    # Charlie
    [-15, -6],  # Diana
])
all_y_trues = np.array([
    1,  # Alice
    0,  # Bob
    0,  # Charlie
    1,  # Diana
])

# Train the model
network = OurNeuralNetwork()
network.train(data, all_y_trues)
Epoch 0 loss: 0.383
Epoch 10 loss: 0.213
Epoch 20 loss: 0.125
Epoch 30 loss: 0.095
Epoch 40 loss: 0.077
Epoch 50 loss: 0.064
Epoch 60 loss: 0.054
Epoch 70 loss: 0.046
Epoch 80 loss: 0.041
Epoch 90 loss: 0.036
Epoch 100 loss: 0.032
...
Epoch 970 loss: 0.002
Epoch 980 loss: 0.002
Epoch 990 loss: 0.002
network.feedforward([-25,-6])
0.9660953631431157
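The output is the activation of the output neuron, so it always lies in (0, 1). One common way to turn it into a class label (an illustrative choice, not part of the original code) is to threshold at 0.5:

# Illustrative post-processing: threshold the network output at 0.5
pred = network.feedforward(np.array([-25, -6]))
label = 1 if pred > 0.5 else 0
print(label)  # with the trained network above (output ≈ 0.966), this prints 1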