2023 Huashu Cup Modeling Ideas - Case: Analysis and Implementation of Perceptron Principles

# Ideas for competition questions

(Ideas will be shared on CSDN as soon as the competition problems are released)

https://blog.csdn.net/dc_sinor?type=blog

1. An intuitive understanding of the perceptron

The perceptron is arguably one of the simplest algorithms in machine learning. Its principle is illustrated in the figure below:

[Figure: parts plotted by length x1 and weight x2; blue points are qualified parts, yellow points are inferior parts, and several candidate dividing lines are drawn]

For example, suppose we have a coordinate system (the black lines in the figure), with a horizontal x1 axis and a vertical x2 axis. Every point in the figure is determined by its coordinates (x1, x2). Suppose we use this picture to judge whether a part is qualified: x1 represents the length of the part, x2 represents its weight, and the axes mark the average length and average weight of the parts. The blue points are qualified products, and the yellow ones are inferior products that need to be rejected. Then obviously, if both the length and the weight of a part are greater than the average, the part is qualified: these are the blue points in the first quadrant. Conversely, if both are less than the average, it is inferior, like the yellow points in the third quadrant.

Prediction is then very simple. When we get a new part, we measure its length x1 and weight x2; if both are greater than the average, the part is qualified. That is how we humans would do it.

So how does the program know that a part whose length and weight are both greater than the average is qualified? In other words, how did it learn this rule?

What the program has is the information and the labels of all the points in the current figure: it knows the coordinates (x1, x2) of every sample x and whether that sample is blue or yellow. If, for the points at hand, we can find a straight line that separates them, then when a new part arrives and we know its length and weight, we can tell which side of the line it falls on and therefore whether it is more likely a good or a bad part. For example, the yellow, blue, and pink lines in the figure all separate the two groups perfectly; even the x1 axis or the x2 axis could serve as a dividing line (both separate all the points correctly).

As you can see, there are countless straight lines that can separate the two clusters of points in the figure. In fact, we not only need to separate the current points; when new points arrive, we also want them to be classified correctly. So which line is best?

What makes a straight line the best dividing line? In fact, the perceptron does not find a unique optimal line. What it finds may be any of the lines drawn in the figure, as long as that line separates all the points.

To draw a conclusion: if a straight line misclassifies no point, it is a good straight line. Going one step further:

If we sum the distances from all misclassified points to the line and make this sum as small as possible (ideally 0, meaning there are no misclassified points at all), then that line is the one we are looking for.
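
Written as an optimization objective, the idea is roughly the following (a sketch only; the precise form of the distance term is worked out in the next section):

$$\min_{S}\ \sum_{x_i \in M(S)} \operatorname{dist}(x_i,\ S)$$

where $S$ ranges over candidate dividing lines and $M(S)$ is the set of points that $S$ misclassifies.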

2. The mathematical perspective of the perceptron

First, let's fix the ultimate goal, regardless of the messy intermediate steps involved in finding the best dividing line: in the end a function f(x) is produced, and when we throw a new data point x into this function, it predicts and tells us whether the point is blue or yellow. How convenient. So let's not think about the intermediate process for now and settle the result first.

[Formula: f(x) = sign(w·x + b)]

Look at that, f(x) has appeared. But what is sign? What is wx+b? Don't worry, let's first take a look at what the sign function is.

[Formula: sign(x) = +1 if x ≥ 0, −1 otherwise]

sign turns out to be very simple: when its argument is greater than or equal to 0, sign outputs 1; otherwise it outputs -1. Working backwards, if wx+b is greater than or equal to 0, then f(x) equals 1; otherwise f(x) equals -1.
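
To make this concrete, here is a minimal sketch of the decision function in Python (the weights w and b below are made-up placeholders, not trained values):

import numpy as np

def sign(z):
    # sign as used by the perceptron: +1 when z >= 0, otherwise -1
    return 1 if z >= 0 else -1

def f(x, w, b):
    # decision function f(x) = sign(w·x + b)
    return sign(np.dot(w, x) + b)

# made-up weights; classify a part by (length, weight) relative to the averages
w = np.array([1.0, 1.0])    # placeholder weights
b = 0.0                     # placeholder bias
print(f(np.array([0.3, 0.8]), w, b))     # 1  -> predicted qualified (blue)
print(f(np.array([-0.5, -0.2]), w, b))   # -1 -> predicted inferior (yellow)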

So what is wx+b?
It is the dividing line itself. Let's look at this formula in the two-dimensional case. A straight line in two dimensions is usually written as y = ax + b; loosely speaking, in two dimensions w plays the role of a, and b is still b. So wx+b describes a straight line (such as the blue line in the picture at the beginning of this article). If a new point x lies on the left side of the blue line, then wx+b < 0, and after passing through sign, f finally outputs -1; if it lies on the right side, f outputs 1. Wait, something seems off: if we treat this as the familiar y = ax+b on a two-dimensional plane, then any point above the x-axis gives a value greater than 0, no matter which side of the line it is on. What does that have to do with left and right of the line? In fact, wx+b and y = ax+b express a straight line in different forms, and there is a subtle difference. Rotate the picture at the top 45 degrees counterclockwise: doesn't the blue line become the x-axis? And doesn't the right side of the original blue line become the region above the x-axis? In effect, when the perceptron works with the line wx+b, this conversion has already been made implicitly: the dividing line itself plays the role of the x-axis, and the two sides of the line become "above" and "below" it, that is, positive and negative values of wx+b.

So why is it wx+b rather than ax+b?
This article uses parts as the example: length and weight (x1, x2) describe a part, so a two-dimensional plane is enough. But what if the color of the part also matters? Then we have to add an x3 for color, the sample's attributes become (x1, x2, x3), and the problem becomes three-dimensional. wx+b is not limited to the two-dimensional case; the same formula still works in three dimensions. So wx+b and ax+b only look roughly the same in two dimensions; they are actually different things. What is wx+b in 3D? Imagine blue points in one corner of a room and yellow points in another corner: a straight line is obviously no longer enough, we need a plane! So in three dimensions, wx+b is a plane (why that is will be explained in detail later). What about four dimensions? It is hard to picture what separates a four-dimensional space, but there should be something that cuts that space in half like a knife. Whatever cuts a four-dimensional space in half plays the same role that a plane plays for three dimensions and a line plays for two dimensions. In short, in four dimensions wx+b is a "flat" object that splits the four-dimensional space in two. We call it a hyperplane. In general, in a high-dimensional space, wx+b is a separating hyperplane; that is its official name.
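
A small sketch of this point (all numbers below are made up): the expression wx+b is written exactly the same way whether a sample has two features or three; only the lengths of the vectors change.

import numpy as np

def decision_value(w, x, b):
    # w·x + b works in any number of dimensions:
    # w and x just need to have the same length
    return float(np.dot(w, x) + b)

# 2D sample: (length, weight)
print(decision_value(np.array([1.0, 2.0]), np.array([0.5, -0.3]), 0.1))
# 3D sample: (length, weight, color)
print(decision_value(np.array([1.0, 2.0, -1.5]), np.array([0.5, -0.3, 0.2]), 0.1))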

Formally speaking:
wx+b = 0 is a hyperplane S in n-dimensional space, where w is the normal vector of the hyperplane and b is its intercept. This hyperplane divides the feature space into two parts; the points lying in the two parts are classified as positive and negative respectively, so the hyperplane S is called a separating hyperplane.

Details:

w is the normal vector of the hyperplane: this is simply how the normal vector of a plane is defined; it is standard math, and you can look it up.

b is the intercept of the hyperplane: it can be understood like the b in y = ax+b in the two-dimensional case.

Feature space: the entire n-dimensional space. Each attribute of a sample is called a feature, and the feature space is the space in which every possible combination of a sample's attributes can be found.

Starting from the requirement of a function f(x), we extended it to sign, which only outputs 1 and -1, and now to wx+b. Things are getting simpler and simpler: as long as we can find the most suitable w and b, the construction of the perceptron is complete. As mentioned earlier, we find the hyperplane by minimizing the total distance of the misclassified points, so first we need a formula for the distance from a point to the hyperplane; with it, the distance can be computed for every point.

[Formula: distance from a point x0 to the hyperplane: |w·x0 + b| / ||w||]

First look at wx+b. In two-dimensional space we can think of it as a straight line, and because of the transformation described above, wx+b acts as the x-axis after the whole picture is rotated, so the distance from any point to that axis is just the value of wx+b, right? Of course, for points below the axis we have to take the absolute value: |wx+b|. To find the total distance of all misclassified points, we sum |wx+b| over them and try to minimize it. But that is too easy to cheat: just shrink w and b in equal proportion. For example, change w to 0.5w and b to 0.5b; the line is still the same line, but the value has been halved! Not satisfied? Keep shrinking, all the way toward 0! So we need a constraint: divide the whole expression by the norm of w, ||w||. In other words, w is always normalized by its own length. If w and b are shrunk proportionally, ||w|| shrinks by the same proportion, and the value stays put, nice and stable. Before dividing by the norm, |wx+b| is called the functional margin; after dividing by the norm it is called the geometric margin. The geometric margin can be regarded as the actual distance in the physical sense: no matter how you rescale the numbers, the physical distance stays the same. When computing distances in machine learning, the geometric margin is usually used; otherwise the problem cannot be solved properly.
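
Here is a minimal sketch of the scaling argument (w, b, and the point below are made up): shrinking w and b by the same factor shrinks the functional margin, while the geometric margin stays put.

import numpy as np

w = np.array([2.0, -1.0])   # placeholder weights
b = 0.5                     # placeholder bias
x = np.array([1.0, 3.0])    # an arbitrary point

for scale in (1.0, 0.5, 0.1):
    ws, bs = scale * w, scale * b
    functional = abs(np.dot(ws, x) + bs)          # |w·x + b|, shrinks with the scale
    geometric = functional / np.linalg.norm(ws)   # |w·x + b| / ||w||, stays the same
    print(scale, functional, geometric)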

[Formula: distance of a misclassified point (x_i, y_i) to the hyperplane: -y_i (w·x_i + b) / ||w||]
For misclassified data, say a point that actually belongs to blue (the right side of the line, y > 0) but is predicted to be on the left side (wx+b < 0), the product y(wx+b) is negative. Adding a minus sign in front makes the result positive, and dividing by the norm of w gives the distance from a single misclassified point to the hyperplane. Summing over all misclassified points gives the total distance.

At the end of the derivation above, it says the division by the norm is dropped, leaving the functional margin. Why is that allowed? Doesn't that reintroduce the problem of shrinking w and b? Was everything said above wrong?

One explanation is this: the perceptron is an algorithm driven by misclassification, and its ultimate goal is to have no misclassified points at all. Once there are no misclassified points, the distance sum becomes 0, and the particular values of w and b no longer matter. Therefore, for the perceptron there is no practical difference between the geometric margin and the functional margin, and the functional margin is used because it is simpler to compute.

[Formula: loss function L(w, b) = -Σ_{x_i ∈ M} y_i (w·x_i + b), where M is the set of misclassified points]
The above is the formal definition of the loss function. The ultimate goal in finding the separating hyperplane is to minimize this loss function; if it reaches 0, that is perfect.
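
As a small sketch of this definition (the tiny data set below is made up), the loss can be computed directly by summing -y_i(w·x_i + b) over the misclassified points:

import numpy as np

def perceptron_loss(X, y, w, b):
    # L(w, b) = -sum over misclassified points of y_i * (w·x_i + b)
    margins = y * (X @ w + b)          # y_i (w·x_i + b) for every sample
    misclassified = margins <= 0       # a point is misclassified when this is <= 0
    return -np.sum(margins[misclassified])

# tiny made-up data set: two features per sample, labels in {+1, -1}
X = np.array([[1.0, 2.0], [-1.0, -1.5], [0.5, -2.0]])
y = np.array([1, -1, 1])
print(perceptron_loss(X, y, w=np.array([1.0, 1.0]), b=0.0))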
[Formula: stochastic gradient descent updates for a misclassified point: w ← w + η y_i x_i, b ← b + η y_i]

The perceptron uses (stochastic) gradient descent to find the optimal w and b and thereby obtain the separating hyperplane wx+b. Gradient descent itself and the choice of step size are not covered here for reasons of space; you can Google them yourself.

3. Code implementation

#coding=utf-8
#Author:Dodo
#Date:2018-11-15
#Email:[email protected]
'''
Dataset: Mnist
Training set size: 60000
Test set size: 10000
------------------------------
Results:
Accuracy: 81.72% (binary classification)
Running time: 78.6s
'''
import numpy as np
import time
def loadData(fileName):
    '''
    Load the Mnist data set
    :param fileName: path of the data set to load
    :return: data set and labels as lists
    '''
    print('start to read data')
    # lists holding the data and the labels
    dataArr = []; labelArr = []
    # open the file
    fr = open(fileName, 'r')
    # read the file line by line
    for line in fr.readlines():
        # split each line on the delimiter ',' and get the list of fields
        curLine = line.strip().split(',')
        # Mnist has ten labels, 0-9; since this is a binary classification task,
        # labels >= 5 are mapped to 1 and labels < 5 to -1
        if int(curLine[0]) >= 5:
            labelArr.append(1)
        else:
            labelArr.append(-1)
        # store the sample data
        # [int(num) for num in curLine[1:]] -> convert every element except the first (the label) to int
        # [int(num)/255 for num in curLine[1:]] -> divide by 255 to normalize (optional step)
        dataArr.append([int(num)/255 for num in curLine[1:]])
    # return the data and the labels
    return dataArr, labelArr
def perceptron(dataArr, labelArr, iter=50):
    '''
    Perceptron training procedure
    :param dataArr: training data (list)
    :param labelArr: training labels (list)
    :param iter: number of iterations, default 50
    :return: trained w and b
    '''
    print('start to trans')
    # convert the data to matrix form (machine learning computations are mostly
    # vector operations, and matrix form makes them convenient)
    # after conversion each sample is a row vector
    dataMat = np.mat(dataArr)
    # convert the labels to a matrix and transpose it (.T is the transpose)
    # the transpose is needed because individual elements are accessed as label[i];
    # a 1xN matrix cannot be indexed that way
    # a plain 1xN label list could be used directly as label[i]; the conversion here
    # is only for consistency of format
    labelMat = np.mat(labelArr).T
    # get the size of the data matrix, m*n
    m, n = np.shape(dataMat)
    # create the initial weights w, all zeros
    # np.shape(dataMat) returns (m, n) -> np.shape(dataMat)[1] is n, matching the
    # length of a sample
    w = np.zeros((1, np.shape(dataMat)[1]))
    # initialize the bias b to 0
    b = 0
    # initialize the step size, i.e. the learning rate eta in gradient descent,
    # which controls the descent rate
    h = 0.0001
    # run iter iterations
    for k in range(iter):
        # gradient descent on each sample
        # at the beginning of section 2.3.1 of Li Hang's book, gradient descent is
        # performed once after going through all the samples
        # in the second half of 2.3.1 (e.g. formulas 2.6 and 2.7) the summation sign
        # disappears; that is stochastic gradient descent, i.e. one descent step per sample
        # each variant has its merits, but stochastic gradient descent is more common
        for i in range(m):
            # get the current sample vector
            xi = dataMat[i]
            # get the label of the current sample
            yi = labelMat[i]
            # check whether the sample is misclassified
            # a misclassified sample satisfies -yi(w*xi+b) >= 0; see section 2.2.2 of the book
            # the book writes > 0, but if the value equals 0 the point lies on the
            # hyperplane, which is also not correct
            if -1 * yi * (w * xi.T + b) >= 0:
                # for a misclassified sample, take a gradient step and update w and b
                w = w + h *  yi * xi
                b = b + h * yi
        # print the training progress
        print('Round %d:%d training' % (k, iter))
    # return the trained w and b
    return w, b
def test(dataArr, labelArr, w, b):
    '''
    Test the accuracy
    :param dataArr: test data
    :param labelArr: test labels
    :param w: weights w obtained from training
    :param b: bias b obtained from training
    :return: accuracy
    '''
    print('start to test')
    # convert the data set to matrix form for convenient computation
    dataMat = np.mat(dataArr)
    # convert the labels to a matrix and transpose; see the explanation
    # in perceptron above
    labelMat = np.mat(labelArr).T
    # get the size of the test data matrix
    m, n = np.shape(dataMat)
    # counter for misclassified samples
    errorCnt = 0
    # iterate over all test samples
    for i in range(m):
        # get a single sample vector
        xi = dataMat[i]
        # get the label of that sample
        yi = labelMat[i]
        # compute the decision value
        result = -1 * yi * (w * xi.T + b)
        # if -yi(w*xi+b) >= 0 the sample is misclassified; increase the error count
        if result >= 0: errorCnt += 1
    # accuracy = 1 - (number of misclassified samples / total number of samples)
    accruRate = 1 - (errorCnt / m)
    # return the accuracy
    return accruRate
if __name__ == '__main__':
    # get the current time
    # the current time is taken again at the end; the difference is the running time
    start = time.time()
    # load the training set and its labels
    trainData, trainLabel = loadData('../Mnist/mnist_train.csv')
    # load the test set and its labels
    testData, testLabel = loadData('../Mnist/mnist_test.csv')
    # train to obtain the weights
    w, b = perceptron(trainData, trainLabel, iter = 30)
    # run the test and obtain the accuracy
    accruRate = test(testData, testLabel, w, b)
    # get the current time as the end time
    end = time.time()
    # print the accuracy
    print('accuracy rate is:', accruRate)
    # print the elapsed time
    print('time span:', end - start)

Origin blog.csdn.net/math_assistant/article/details/132010345