Deep Learning Tips 00--Working Principle and Simple Code Implementation of Neurons


This series of blog posts records my process of learning deep learning.

1. The basic neuron model

[Figure: basic neuron model]
The figure shows the basic model of a neuron. During training, the main steps of a neuron fall into two parts: the blue lines (forward propagation) and the red lines (backpropagation, i.e. gradient descent). Notation conventions: X11 is the first feature value of the first sample, X1 is the first sample, Y1 is the actual value of the first sample, A1 is the predicted value of the first sample, W1 is the weight applied to the first feature value, and W = [W1, W2, …, Wn].
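
Before going into the two propagation steps, here is a minimal numpy sketch of how these symbols can be laid out, following the shape convention used in the code later in this post (the concrete numbers are just placeholders):

import numpy as np

# 3 samples, each with 2 feature values; column j of X holds sample Xj
X = np.array([[11.0, 21.0, 31.0],   # first feature value of each sample
              [12.0, 22.0, 32.0]])  # second feature value of each sample
Y = np.array([[1, 0, 1]])           # actual value Yj of each sample, shape 1*3
W = np.array([[0.5, -0.5]])         # one weight per feature value, W = [W1, W2], shape 1*2
B = 0.0                             # bias

print(X[0, 0])   # X11: the first feature value of the first sample
print(X[:, 0])   # X1: all feature values of the first sample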

(1) Forward propagation

As shown in the figure above, forward propagation consists of three steps (four if the calculation of the average loss value is counted separately); a code sketch of these steps follows the list.
1. Assuming the neuron has N feature-value inputs, construct a linear relationship (a hyperplane in N+1-dimensional space) by initializing W and B: Z = WX + B.
2. Apply a second, generally non-linear, mapping to this result to obtain the predicted value A. (This is the role of the activation function.)
3. Evaluate the distance between the predicted value and the actual value with a loss function. (Similar to the role of variance in statistics.)
4. For all samples, compute the distance between the predicted values (obtained by applying W, B and the activation function) and the actual values, and take the average as a reference. (During training this value should normally get smaller and smaller.)
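
A minimal sketch of these four steps, assuming the sigmoid activation and cross-entropy loss that are also used in the code below; the function name forward_pass and the toy data are my own and not part of the original implementation:

import numpy as np

def forward_pass(W, X, B, Y):
    Z = np.dot(W, X) + B                                  # step 1: linear relationship Z = WX + B
    A = 1 / (1 + np.exp(-Z))                              # step 2: non-linear mapping (sigmoid activation)
    losses = -(Y * np.log(A) + (1 - Y) * np.log(1 - A))   # step 3: per-sample distance (cross-entropy)
    return A, losses.mean()                               # step 4: average loss over all samples

X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])  # 2 feature values, 3 samples
Y = np.array([[1, 0, 1]])
W = np.array([[0.1, -0.2]])
A, J = forward_pass(W, X, B=0.0, Y=Y)
print(A, J)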

(2) Backpropagation – Gradient Descent Method

From the forward propagation of the whole neuron, we can see that the average loss J can be written in the following equivalent forms: J = F(L), J = F(A, Y), J = F(Z, Y), J = F(W, B, Y). So the loss function ultimately depends on the values of W and B. In training a neural network, our goal is to reduce the distance between the actual value and the predicted value, which means reducing the value of J.
[Figure: plot of the loss function y = x²]
Suppose the loss curve is y = x² as in the figure, i.e. y = F(x). We can take the derivative of y with respect to x; by definition, the derivative is the slope of the line tangent to the curve. When the derivative is positive, updating x = x + derivative × (a positive number) increases y, while x = x + derivative × (a negative number) decreases y. So we can update the parameter W through the loss function L, that is, W = W + α·dL/dW with α < 0, and likewise B = B + α·dL/dB with α < 0. This reduces the value of the loss function and therefore the distance between the predicted value and the actual value, which is exactly what we want. The terms dL/dW and dL/dB are obtained along the red lines in the neuron model above: when designing the neuron, first write down dL/dA, dA/dZ, dZ/dW and dZ/dB (these expressions do not change during the whole process, so they are easy to design), then combine them with the chain rule (dL/dW = dL/dA · dA/dZ · dZ/dW) and adjust W and B dynamically as samples are fed in.

PS: W and B are updated once per training pass, so the design uses the average of dW over all samples to update W, and the average of dB over all samples to update B.
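
Putting the last two paragraphs together, here is a minimal sketch of one such update for the sigmoid + cross-entropy case used later in this post; the factoring into dL/dA, dA/dZ and dZ/dW mirrors the red lines in the figure, and the function name and toy data are mine:

import numpy as np

def gradient_step(W, B, X, Y, alfa):
    n = X.shape[1]                          # number of samples
    Z = np.dot(W, X) + B                    # forward: linear part Z = WX + B
    A = 1 / (1 + np.exp(-Z))                # forward: sigmoid activation
    dA = -(Y / A) + (1 - Y) / (1 - A)       # dL/dA for the cross-entropy loss
    dZ = dA * A * (1 - A)                   # dL/dZ = dL/dA * dA/dZ (sigmoid derivative)
    dW = np.dot(dZ, X.T) / n                # dL/dW averaged over all samples
    dB = dZ.sum(axis=1, keepdims=True) / n  # dL/dB averaged over all samples
    # update rule W = W + alpha * dL/dW with alpha < 0, i.e. subtract alfa * gradient for alfa > 0
    return W - alfa * dW, B - alfa * dB

# tiny usage example with two hand-made samples
W, B = np.zeros((1, 2)), 0.0
X = np.array([[50.0, 150.0], [150.0, 50.0]])   # 2 feature values, 2 samples
Y = np.array([[1, 0]])
for _ in range(3):
    W, B = gradient_step(W, B, X, Y, alfa=0.01)
print(W, B)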

(3) Why perform a linear fit first and then a non-linear mapping (activation function)?

[Figure: a non-linear (activation) function whose curve is nearly flat in some intervals]
As shown in the figure above, if a non-linear function is applied directly, the function value is very insensitive to parameter changes in some intervals, which leads to vanishing gradients or very slow gradient descent.

On the other hand, if only linear functions are used, real-world problems cannot be fitted well (practical problems rarely have a purely linear relationship). Moreover, deep learning is meant to capture the deep relationship between the features and the actual value Y; if every layer of the network were linear, a deep neural network would be essentially no different from a shallow one, because a composition of linear maps is still linear.
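
As a minimal sketch of that last point, stacking two purely linear layers collapses into a single equivalent linear layer (the weight values here are arbitrary placeholders):

import numpy as np

X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])   # 2 feature values, 3 samples
W1 = np.array([[0.5, -1.0], [2.0, 0.3]])           # first "layer", purely linear
W2 = np.array([[1.5, -0.7]])                       # second "layer", purely linear
two_layers = np.dot(W2, np.dot(W1, X))             # output of the stacked linear layers
one_layer = np.dot(np.dot(W2, W1), X)              # a single equivalent linear layer
print(np.allclose(two_layers, one_layer))          # True: depth adds nothing without non-linearity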

2. Python code implementation

Basic neuron

import numpy as np
import random


class basic_neuron:

    def __init__(self, W, X, B, activity_func, dA_dZ, loss_func, Y, alfa, print_accuracy=True):
        """
        Let count(X) = n and Xn = m, i.e. there are m feature values and n samples.
        :param W: shape == 1*m (one weight per feature value)
        :param X: shape == m*n (one column per sample)
        :param B: bias; a scalar (or 1*1 array), broadcast over all samples
        :param activity_func: activation function, returns the activated value A
        :param dA_dZ: function returning dA(activation function)/dZ
        :param loss_func: loss function, returns dA, accuracy and total_loss used for gradient descent
        :param Y: actual values
        :param alfa: learning rate
        :param print_accuracy: whether to print the accuracy
        """
        self.W = W.astype('float64')
        self.X = X.astype('float64')
        self.B = B
        self.activity_fun = activity_func
        self.loss_func = loss_func
        self.Y = Y.astype('float64')
        self.alfa = -alfa
        self.print_accuracy = print_accuracy
        self.dA_dZ = dA_dZ

    def change_param(self, dW, dB):
        self.W += self.alfa * dW
        self.B += self.alfa * dB

    def calculate_yp(self):
        return self.activity_fun(np.dot(self.W, self.X) + self.B)

    def run(self):
        # one training pass: forward propagation followed by one gradient-descent update
        YP = self.calculate_yp()
        dA, accuracy, total_loss = self.loss_func(YP, self.Y)
        Z = np.dot(self.W, self.X) + self.B                    # linear part Z = WX + B
        dZ = np.multiply(dA, self.dA_dZ(Z))                    # dL/dZ = dL/dA * dA/dZ
        dW = np.dot(dZ, self.X.T) / self.X.shape[1]            # dL/dW averaged over all samples
        dB = dZ.sum(axis=1, keepdims=True) / self.X.shape[1]   # dL/dB averaged over all samples
        # print('/*------------------------------*/')
        # print('YP:',YP)
        # print('dW:',dW,'dB:',dB)
        # print('Loss:',total_loss)
        # print('/*------------------------------*/')
        self.change_param(dW, dB)
        if self.print_accuracy:
            print('accuracy:', accuracy)
        print('Loss:', total_loss)

    def predict(self, X):
        return self.activity_fun(np.dot(self.W, X) + self.B)

Basic Loss and Activation Functions

def activity_func(Z):
    # sigmoid function
    yp = 1 / (1 + np.exp(-1 * Z))
    return yp


def dA_dZ(Z):
    # return the derivative of the sigmoid function
    return np.multiply((1 / (1 + np.exp(-1 * Z))), (1 - (1 / (1 + np.exp(-1 * Z)))))


def loss_func(YP, Y):
    size = YP.shape[1]  # number of samples (number of columns of YP)
    Y_arr = Y.tolist()
    YP_arr = YP.tolist()
    # avoid division by zero when computing dA later: clip predictions equal to 1
    temp = np.where(YP == 1)
    for index, item in enumerate(temp[0]):
        YP[item, temp[1][index]] = 0.999999999999

    dA = -(np.true_divide(Y, YP) + np.true_divide((Y - 1), (1 - YP)))
    loss_part1 = np.dot(Y, np.log(YP).T)
    loss_part2 = np.dot(1 - Y, np.log(1 - YP).T)
    loss = -(loss_part1 + loss_part2)
    count = 0
    for index, item in enumerate(Y_arr[0]):
        if abs(item - YP_arr[0][index]) < 0.5:
            count += 1
    accuracy = count / size
    return dA, accuracy, loss.item() / size  # loss is a 1*1 array, take its scalar value

Running the code

if __name__ == '__main__':
    # generate the training set: X1 < 100 and X2 > 100 is class 1; X1 > 100 and X2 < 100 is class 0
    X_list = []
    for i in range(200):
        if i < 100:
            temp1 = random.random() * 100
            temp2 = 100 + random.random() * 100
        else:
            temp1 = 100 + random.random() * 100
            temp2 = random.random() * 100
        X_list.append([temp1, temp2])
    Y_list = []
    for i in range(200):
        Y_list.append(1 if i < 100 else 0)
    # initialize the parameters
    W = np.array([[0, 0]]).astype('float64').reshape(1, 2)
    B = 0
    X = np.array(X_list).T
    Y = np.array(Y_list).reshape(1, 200)
    alfa = 0.01
    # create the neuron
    neuron = basic_neuron(W, X, B, activity_func, dA_dZ, loss_func, Y, alfa)
    # iterative training
    for i in range(3):
        neuron.run()
    # predict on a test point from the class-0 region (X1 > 100, X2 < 100)
    pre = neuron.predict(np.array([[120], [30]]).reshape([2, 1]))
    print(pre)

Origin blog.csdn.net/YmgmY/article/details/107314522