Backpropagation Neural Network: Learning and Analysis

Foreword

     Last year, in my sophomore year, I suddenly wanted to try neural networks and deep learning; after all, what I want to do later is computer vision. My mentor heard that I was looking into related learning methods, so he gave me a project on recognizing and predicting ranch animal behavior. To be honest, at first I felt completely out of my depth, since I knew nothing about it. But doesn't everyone improve from 0 to 1 step by step? So I bought a pile of books. Here are the ones I recommend most:

  1. "Python Programming" by Liu Yu (China Water & Power Press)
  2. "TensorFlow: Deep Learning Algorithm Principles and Programming Practice" by Jiang Ziyang (China Water & Power Press)
  3. "Deep Learning from Scratch" by Koki Saito (People's Posts and Telecommunications Press)
  4. "Make Your Own Neural Network" (Python神经网络编程) by Tariq Rashid (People's Posts and Telecommunications Press)
  5. "Deep Learning", co-authored by Ian Goodfellow, Yoshua Bengio, and Aaron Courville

     As for video courses, I recommend Mofan (莫烦) Python on Bilibili, Andrew Ng's lectures, 3Blue1Brown, and so on.

I. Installing the libraries

     I have mentioned this in many of my earlier Python posts: if you have not installed these libraries yet, either search for the installation steps yourself or refer to my previous posts.

import numpy as np                  # numpy makes matrix operations very convenient
import math                         # math is used for some activation functions
import random                       # random numbers are needed for the initial weights
import string
import matplotlib as mpl            # only needed if you want to plot
import matplotlib.pyplot as plt
import sys                          # used when saving the final weights
import xlrd                         # my data set is read from an Excel file
from sklearn import preprocessing   # useful if you need to standardize or normalize the data

Read data

     My data set here comes from an Excel file about the ranch; if you are using MNIST or data from another source, you can skip this step. Each sample is three-dimensional: each run I take 500 rows from each of the two behavior classes, put them into an array, and label them 0 or 1. A sample looks roughly like [[954, 4652, -456], [1]]: a three-element feature list followed by the attached label.

def read_excel():
    # open the Excel workbook
    workbook = xlrd.open_workbook('all.xls')
    # enter the sheets
    eating = workbook.sheet_by_index(0)
    sleeping = workbook.sheet_by_index(1)
    # get the number of rows and columns
    Srows_num = sleeping.nrows
    Scols_num = sleeping.ncols
    ListSleep = []
    
    Erows_num = eating.nrows
    Ecols_num = eating.ncols
    ListEat = []
    ListEatT = []
    ListSleepT = []
    ListT = []
    i = 1
    while i <= 500:
        if Erows_num > 10 or Srows_num > 10:
            # pick a random row from each sheet
            random_numE = random.randint(1, Erows_num - 1)
            E_X = eating.row(random_numE)[1].value
            E_Y = eating.row(random_numE)[3].value
            E_Z = eating.row(random_numE)[5].value
            ListEat.append([[int(E_X), int(E_Y), int(E_Z)], [1]])
            random_numS = random.randint(1, Srows_num - 1)
            S_X = sleeping.row(random_numS)[1].value
            S_Y = sleeping.row(random_numS)[3].value
            S_Z = sleeping.row(random_numS)[5].value
            ListSleep.append([[int(S_X), int(S_Y), int(S_Z)], [0]])
            i = i + 1
        else:
            print("Not enough data, please add more")
            break
    List = ListEat + ListSleep
    
    print(List)
    return List          # return the samples so they can be used for training below
List = read_excel()
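A quick sanity check of the result (my own addition, not in the original post), assuming the Excel file is present:

print(len(List))    # should be 1000: 500 eating samples plus 500 sleeping samples
print(List[0])      # one sample, e.g. [[x, y, z], [1]]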

II. Activation Functions

      This is one of the most important steps: the choice of activation function largely determines how well the network performs.

      1. Sigmoid function

       The sigmoid curve looks as follows:

         The sigmoid activation function squashes its input: when the input is very negative the output is close to 0, and when the input is very positive the output is close to 1. However, sigmoid has two main disadvantages:

(1) It is prone to vanishing gradients. When the input is very small or very large, the gradient tends to 0; the derivative approaches zero at both ends of the curve.

(2) Its output is not zero-centered, which hurts the dynamics of gradient descent.

def sigmoid(x):
    '''
    sigmoid function
    '''
    return 1.0/(1.0+np.exp(-x))
def derived_sigmoid(x):
    '''
    derivative of sigmoid
    '''
    return sigmoid(x)*(1-sigmoid(x))
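As a quick check of point (1), you can evaluate the derivative at a few inputs with the functions just defined (an illustrative snippet of my own); for large |x| it is essentially zero:

print(derived_sigmoid(np.array([-10.0, 0.0, 10.0])))
# roughly [0.000045, 0.25, 0.000045]: the gradient vanishes at both ends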

     2. Tanh function

     The tanh curve looks as follows:

     Compared with sigmoid, the output is zero-centered with range [-1, 1], but the vanishing-gradient problem remains.

def tanh(x):
    '''
    tanh function
    '''
    return (np.exp(x)-np.exp(-x))/(np.exp(x)+np.exp(-x))
def derived_tanh(x):
    '''
    derivative of tanh
    '''
    return 1-tanh(x)*tanh(x)

     3. ReLU function

     ReLU (rectified linear unit) has many advantages and is currently the most widely used activation function in neural networks.

     Its curve looks as follows:

Advantages: (1) The gradient does not vanish in the positive region, so convergence is faster;

          (2) The forward pass is cheap: only max(0, x) needs to be computed, with no exponential as in sigmoid;

          (3) Backpropagation is fast as well, since the derivative is trivial and involves no exponentials;

          (4) Some neurons output 0, which gives the network sparse activations and can reduce overfitting.

Disadvantages: (1) Neurons can easily "die" during training: once a neuron's activation stays at 0, the gradient through it is 0 and its incoming weights are never updated again. Using an appropriate learning rate weakens this effect.

def relu(x):
    '''relu function'''
    return np.where(x<0,0,x)
 
def derivedrelu(x):
    '''derivative of relu'''
    return np.where(x<0,0,1)

     4. Leaky ReLU function

     Leaky ReLU improves on this shortcoming of ReLU: when the input is less than 0, the output is αx, where α is a small constant. With this change, neurons are far less likely to "die" during backpropagation.

def leakyrelu(x, a=0.01):
    '''
    leaky relu function
    A variant of relu, designed mainly to fix the problem of relu outputting 0
    '''
    return np.where(x<0, a*x, x)
 
def derived_leakyrelu(x, a=0.01):
    '''
    derivative of leaky relu
    '''
    return np.where(x<0, a, 1)
 
 

     5. ELU function

# elu differs from relu in the negative region: relu outputs 0 there,
# while elu gradually approaches -a, which makes it more robust.
def elu(x, a=0.01):
    # another advantage of elu is that it pushes the mean of the outputs towards 0
    # (much like batch normalization, which normalizes to mean 0 and std 1)
    return np.where(x<0, a*(np.exp(x)-1), x)
 
def derived_elu(x, a=0.01):
    '''
    derivative of elu
    '''
    return np.where(x<0, a*np.exp(x), 1)

 There are many other derived activation functions; I will leave them for you to explore.
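Since matplotlib was already imported, here is a small optional sketch of my own (not part of the original project) that plots the activation functions defined above, assuming they are all in scope:

x = np.linspace(-5, 5, 200)
plt.plot(x, sigmoid(x), label='sigmoid')
plt.plot(x, tanh(x), label='tanh')
plt.plot(x, relu(x), label='relu')
plt.plot(x, leakyrelu(x), label='leaky relu')
plt.plot(x, elu(x), label='elu')
plt.legend()
plt.show()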

III. Normalizing the data

  Pre-processing of the input and output data: scaling, also called normalization or standardization, means transforming the network's input and output data so that they fall into a range such as [0, 1] or [-1, 1]. There are three reasons for this transformation:

 1) The different input components of the network often have different meanings and physical units. Scaling keeps all components within a common range, so that at the start of training the network treats every input component as equally important;

 2) The neurons of a BP network use sigmoid-like functions. Scaling prevents inputs with large absolute values from saturating the neuron outputs, which would push the weight updates into a flat region of the error surface;

 3) The sigmoid output lies in [0, 1] and tanh in [-1, 1]. If the target data are not scaled accordingly, components with large absolute values will dominate the error while components with small absolute values will barely contribute. In short, it is best to normalize the data.

1. Min-max (0-1) normalization:

  This is the simplest and most obvious approach: traverse each feature vector, record its Max and Min, and rescale using Max - Min as the denominator (so Min maps to 0 and Max to 1):

                     x_norm = (x - Min) / (Max - Min)

Python implementation:

def MaxMin(x, Max, Min):
    x = (x - Min) / (Max - Min)
    return x
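Since sklearn's preprocessing module was imported at the top, the same result can also be obtained with MinMaxScaler; a minimal sketch (the feature values are made up):

scaler = preprocessing.MinMaxScaler()
sample = np.array([[954.0], [4652.0], [-456.0]])   # hypothetical feature column
print(scaler.fit_transform(sample))                # each value rescaled into [0, 1]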

2. Z-score standardization:

This method standardizes the raw data using its mean and standard deviation. The processed data follow a standard normal distribution with mean 0 and standard deviation 1; the key formula is:

                     x_norm = (x - μ) / σ

Python implementation:

def Z_Score(x, mu, sigma):
    x = (x - mu) / sigma
    return x
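Likewise, preprocessing.scale from sklearn applies Z-score standardization column by column; a minimal sketch with made-up values:

sample = np.array([[954.0], [4652.0], [-456.0]])   # hypothetical feature column
print(preprocessing.scale(sample))                 # zero mean, unit standard deviation per column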

3. Mean normalization

There are two variants of mean normalization: one uses max as the denominator, the other uses max - min:

x_norm = (x - mean) / max        or        x_norm = (x - mean) / (max - min)

def average(data):
    # mean of the data
    avg = float(sum(data)) / len(data)
 
    # mean normalization, two variants
    data2_1 = [(x - avg) / max(data) for x in data]                 # denominator: max
    data2_2 = [(x - avg) / (max(data) - min(data)) for x in data]   # denominator: max - min
    return data2_1, data2_2
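A quick usage sketch of the function above (the numbers are made up):

data = [954, 4652, -456, 120, 88]          # hypothetical raw feature values
norm_by_max, norm_by_range = average(data)
print(norm_by_max)                         # denominator: max
print(norm_by_range)                       # denominator: max - min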

IV. Error / loss functions

     1. Mean Squared Error (MSE)

yk is the output of the network, tk is the label (one-hot: only the correct class is 1, all the others are 0), and k indexes the dimensions of the data.
The mean squared error squares the difference between each element of the network output and the corresponding element of the label, then sums the results:

E = 1/2 * Σk (yk - tk)²
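The helper shown next computes a standard deviation; for reference, a direct implementation of the MSE just described would look like this (a minimal sketch following the formula above, not code from the original post):

def mean_squared_error(y, t):
    # y: network output (numpy array), t: one-hot label
    return 0.5 * np.sum((y - t) ** 2)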

def get_variance(records):
    """Variance (helper not shown in the original post; uses get_average defined below)"""
    avg = get_average(records)
    return sum([(x - avg) ** 2 for x in records]) / len(records)
def get_standard_deviation(records):
    """
    Standard deviation: reflects how spread out a data set is
    """
    variance = get_variance(records)
    return math.sqrt(variance)

     2. Mean (Average)

def get_average(records):
    """
    Mean value
    """
    return sum(records) / len(records)

    3. Cross-Entropy Error

For a single training example, the cross-entropy error takes the negative natural logarithm of the network output for the correct class.

yk is the network output and tk is the one-hot label (only the correct class is 1, the rest are 0); k indexes the dimensions, so E = -Σk tk · ln(yk).

Therefore, when the network's output probability for the correct class is 1, the cross-entropy loss is 0; as that probability shrinks towards 0, ln(y) becomes more and more negative, so the loss -ln(y) grows.

def cross_entropy(a, y):
    return np.sum(np.nan_to_num(-y*np.log(a)-(1-y)*np.log(1-a)))
 
# tensorflow version
loss = tf.reduce_mean(-tf.reduce_sum(y_*tf.log(y), reduction_indices=[1]))
 
# numpy version
loss = np.mean(-np.sum(y_*np.log(y), axis=1))
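The cross_entropy above is the binary (two-class) form that matches my 0/1 labels; a minimal sketch of the one-hot form described in the text, with a small constant added to avoid log(0) (my own addition), would be:

def cross_entropy_error(y, t):
    delta = 1e-7                          # avoid log(0)
    return -np.sum(t * np.log(y + delta))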

 

V. Building the neural network

   The figure above is only an example. My output layer has just one node, and the behavior is judged by whether the output is close to 0 or close to 1; for multi-class problems I recommend a softmax output layer. One more note on the layer sizes: it is best to set them to multiples of 2, especially for readers who use GPU acceleration.
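The class below calls two helper functions, makematrix and random_number, that the post does not show; here is a minimal sketch of my own that is consistent with how they are used:

def makematrix(rows, cols, fill=0.0):
    # build a rows x cols matrix filled with a constant value
    return [[fill] * cols for _ in range(rows)]
 
def random_number(a, b):
    # uniform random number in [a, b)
    return (b - a) * random.random() + a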
 

# build a three-layer BP network
class BPNN:
    def __init__(self, num_in, num_hidden, num_out):
        # number of nodes in the input, hidden and output layers
        self.num_in = num_in + 1  # add a bias node
        self.num_hidden = num_hidden + 1  # add a bias node
        self.num_out = num_out
        # activations of all nodes in the network (vectors)
        self.active_in = [1.0] * self.num_in
        self.active_hidden = [1.0] * self.num_hidden
        self.active_out = [1.0] * self.num_out
        # create the weight matrices
        self.wight_in = makematrix(self.num_in, self.num_hidden)
        self.wight_out = makematrix(self.num_hidden, self.num_out)
        # initialize the weight matrices

        for i in range(self.num_in):
            for j in range(self.num_hidden):
                self.wight_in[i][j] = random_number(-2.4, 2.4)
        for i in range(self.num_hidden):
            for j in range(self.num_out):
                self.wight_out[i][j] = random_number(-0.2, 0.2)
        # finally create the momentum matrices
        self.ci = makematrix(self.num_in, self.num_hidden)
        self.co = makematrix(self.num_hidden, self.num_out)

     Forward propagation. Here I use sigmoid to squash the raw inputs as a simple normalization, because the gap between the maximum and minimum of my data is huge, and data like that must be normalized in some way. The hidden layer uses tanh, which keeps its activations in [-1, 1]. Finally, the output goes through sigmoid, so the result can be judged as being close to 0 or close to 1.

    def update(self, inputs):
        if len(inputs) != self.num_in - 1:
            raise ValueError('Wrong number of input-layer nodes')
        # feed the data into the input layer
        for i in range(self.num_in - 1):
            self.active_in[i] = sigmoid(inputs[i])  # normalize the raw inputs at the input layer
            # self.active_in[i] = inputs[i]         # (or feed the raw values directly)
        # processing in the hidden layer
        for i in range(self.num_hidden - 1):
            sum = 0.0
            for j in range(self.num_in):
                sum = sum + self.active_in[j] * self.wight_in[j][i]
            self.active_hidden[i] = tanh(sum)  # active_hidden[] stores the processed inputs and feeds the output layer
        # processing in the output layer
        for i in range(self.num_out):
            sum = 0.0
            for j in range(self.num_hidden):
                sum = sum + self.active_hidden[j] * self.wight_out[j][i]
            self.active_out[i] = sigmoid(sum)  # same idea as above
        return self.active_out[:]
 

     Back Propagation

# error backpropagation
    def errorback(self, targets, lr, m):  # lr is the learning rate, m is the momentum factor
        if len(targets) != self.num_out:
            raise ValueError('Does not match the number of output-layer nodes!')
        # first compute the output-layer error
        out_deltas = [0.0] * self.num_out
        for i in range(self.num_out):
            error = targets[i] - self.active_out[i]
            # sigmoid'(z) expressed through the activation: y * (1 - y)
            out_deltas[i] = self.active_out[i] * (1 - self.active_out[i]) * error

        # then compute the hidden-layer error
        hidden_deltas = [0.0] * self.num_hidden
        for i in range(self.num_hidden):
            error = 0.0
            for j in range(self.num_out):
                error = error + out_deltas[j] * self.wight_out[i][j]
            # tanh'(z) expressed through the activation: 1 - a^2
            hidden_deltas[i] = (1 - self.active_hidden[i] ** 2) * error

        # first update the output-layer weights
        for i in range(self.num_hidden):
            for j in range(self.num_out):
                change = out_deltas[j] * self.active_hidden[i]
                self.wight_out[i][j] = self.wight_out[i][j] + lr * change + m * self.co[i][j]
                self.co[i][j] = change
        # then update the input-layer weights
        for i in range(self.num_in):
            for j in range(self.num_hidden):
                change = hidden_deltas[j] * self.active_in[i]
                self.wight_in[i][j] = self.wight_in[i][j] + lr * change + m * self.ci[i][j]
                self.ci[i][j] = change
        # compute the total error
        error = 0.0
        for i in range(len(targets)):
            error = error + 0.5 * (targets[i] - self.active_out[i]) ** 2

        return error

In the training loop below, j[0] is the three-dimensional input of each sample and j[1] is its label. lr is the learning rate; it is best to reduce it gradually as gradient descent proceeds, just like a person learning: at the beginning there is a lot to absorb, and at the end only fine details remain.

 

An improved BP algorithm: the momentum method.
The momentum method adds a momentum factor m (0 < m < 1) to the weight-update step of standard BP, giving the weight correction a certain inertia: each update depends not only on the gradient computed in the current step but also on the direction and magnitude of the previous update, i.e. Δw(t) = lr·δ·a + m·Δw(t-1). The accumulated adjustment acts as a damping term for the update at time t: when the error surface fluctuates sharply, it reduces oscillation of the weights and can speed up training.
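In code, the rule described above boils down to the following (an illustrative sketch with made-up names; the real update is in the errorback method above and the train method below):

def momentum_update(w, delta, activation, prev_change, lr=0.1, m=0.5):
    # one weight update with a momentum term, mirroring errorback
    change = delta * activation
    w_new = w + lr * change + m * prev_change
    return w_new, change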

 

    def train(self, pattern, itera=200000, lr=0.1, m=0.5):
        for i in range(itera):
            error = 0.0
            for j in pattern:
                input = j[0]
                targ = j[1]
                self.update(input)
                error = error + self.errorback(targ, lr, m)
            if i % 200 == 0:
                lr *= 0.99          # decay the learning rate gradually
                print('error %-.5f' % error)
 Finally, record the trained weights and save them in a txt file:
    def weights(self):
        f = open("getwights.txt", "w")
        weightin = []
        weightout = []
        print("input-layer weights")
        for i in range(self.num_in):
            print(self.wight_in[i])
            weightin.append(self.wight_in[i])

        print("input weight matrix", weightin)
        print("output-layer weights")
        for i in range(self.num_hidden):
            print(self.wight_out[i])
            weightout.append(self.wight_out[i])

        print("output weight matrix", weightout)
        f.write(str(weightin))
        f.write("\n")
        f.write(str(weightout))
        f.close()

Then create the neural network. If your problem is not of the 3x3x1 shape, simply change the node counts passed to the class; they are used consistently inside it.

def BP():
    # create the network: 3 input nodes, 3 hidden-layer nodes, 1 output node
    n = BPNN(3, 3, 1)
    # train the network
    n.train(List)
    # save the weight values
    n.weights()
if __name__ == '__main__':
    BP()

Now that we have the trained weights, the last step is of course to use them. The code is essentially the same as before, except that the initial weights are no longer random numbers but the values we saved earlier, and only forward propagation is needed.
 

class BPNN:
    def __init__(self, num_in, num_hidden, num_out):
        # number of nodes in the input, hidden and output layers
        self.num_in = num_in + 1  # add a bias node
        self.num_hidden = num_hidden + 1  # add a bias node
        self.num_out = num_out
        # activations of all nodes in the network (vectors)
        self.active_in = [1.0] * self.num_in
        self.active_hidden = [1.0] * self.num_hidden
        self.active_out = [1.0] * self.num_out
        # create the weight matrices
        self.wight_in = makematrix(self.num_in, self.num_hidden)
        self.wight_out = makematrix(self.num_hidden, self.num_out)
        # initialize the weights from the saved file
        f = open("getwights.txt", "r")
        a = f.readline()
        b = f.readline()
        listone = eval(a)
        listtwo = eval(b)
        f.close()
        for i in range(len(listone)):
            for j in range(len(listone[i])):
                self.wight_in[i][j] = listone[i][j]
        for i in range(len(listtwo)):
            for j in range(len(listtwo[i])):
                self.wight_out[i][j] = listtwo[i][j]

        self.co = makematrix(self.num_hidden, self.num_out)
        # forward propagation of the signal

    def update(self, inputs):
        if len(inputs) != self.num_in - 1:
            raise ValueError('Does not match the number of input-layer nodes')
        # feed the data into the input layer
        for i in range(self.num_in - 1):
            self.active_in[i] = sigmoid(inputs[i])  # use the same input processing as during training
            # self.active_in[i] = inputs[i]         # active_in[] holds the input data
        # processing in the hidden layer
        for i in range(self.num_hidden - 1):
            sum = 0.0
            for j in range(self.num_in):
                sum = sum + self.active_in[j] * self.wight_in[j][i]
            self.active_hidden[i] = tanh(sum)  # active_hidden[] stores the processed inputs and feeds the output layer
        # processing in the output layer
        for i in range(self.num_out):
            sum = 0.0
            for j in range(self.num_hidden):
                sum = sum + self.active_hidden[j] * self.wight_out[j][i]
            self.active_out[i] = sigmoid(sum)  # same idea as above
        return self.active_out[:]
    # test
    def test(self, patterns, testnums=500):
        # testnums: number of samples of the first class at the front of the test set
        # (assumed to match the 500-per-class split used when reading the data)
        distingish = []
        List_update = []
        num = 0
        sum = 0
        for i in patterns:
            List_update.append(self.update(i[0]))
        sum = np.sum(List_update, axis=0)
        average = sum / (len(patterns))
        # print(average)
        for i in patterns:
            num += 1
            if (num <= testnums and self.update(i[0]) > average):
                distingish.append(True)
            if (num > testnums and self.update(i[0]) < average):
                distingish.append(True)
            print(i[0], '->', self.update(i[0]))
        print("accuracy", len(distingish) / num)


# example
def BP():

    # create the network: 3 input nodes, 3 hidden-layer nodes, 1 output node
    n = BPNN(3, 3, 1)
    # test the network
    n.test(ListT)
if __name__ == '__main__':
    BP()

 Postscript

I have also written an article about data visualization with Pyecharts; interested readers can take a look:

https://blog.csdn.net/weixin_43341045/article/details/104137445

As for where to get data, there are many sources; here I will mention web crawlers. There are many crawler libraries; I wrote a post about a Selenium-based crawler: https://blog.csdn.net/weixin_43341045/article/details/104014416 .

There is also one on crawling images from Bilibili columns with BeautifulSoup4:

https://blog.csdn.net/weixin_43341045/article/details/104411456

     This was written by an ordinary sophomore with shallow knowledge, so please forgive the many shortcomings and feel free to point them out. You are also welcome to join our study and exchange group, QQ: 871 352 155 (whether you use C/C++, Java, Python, PHP or anything else, you are welcome as long as you are interested; please fill in the verification message when applying. The group currently consists mostly of college students; no advertising please. We also hope that experienced seniors can offer suggestions and guidance to us newcomers who are about to enter the industry.)

 

 

 

 
