Algorithm study notes - Perceptron principle and code implementation


The previous post finished up gradient descent; in this post we sort out a very important binary classification algorithm, the Perceptron. It is a binary classification model: given a series of input data, the output is a binary class variable, such as 0 or 1 (in what follows, the two classes are labeled +1 and -1).

1. Algorithm principle

1.1 Introducing the key concepts

Speaking of classification algorithms, the other algorithm that comes to the blogger's mind is logistic regression. The biggest difference in principle between the Perceptron and regression-type algorithms is that it introduces a geometric idea: put the vectors into a high-dimensional space and imagine them there. This brings in a new concept, the hyperplane, which has one dimension fewer than the space it lives in. For example, in a two-dimensional plane, the hyperplane is one-dimensional, that is, a straight line; vividly, it is like cutting a sheet of paper so that the points labeled +1 and the points labeled -1 on the paper are separated. By analogy, in three-dimensional space the hyperplane is a two-dimensional plane cutting through a cube. Good, now let's start the mathematical derivation of the principle:

From high school math we know that a straight line in the two-dimensional plane can be expressed by the general equation $ax + by + c = 0$. In that plane, a point $(x, y)$ satisfying $ax + by + c > 0$ lies above the line, and a point satisfying $ax + by + c < 0$ lies below it. Extending this to a high-dimensional space, say an n-dimensional space (too many dimensions for us to picture directly), we get a hyperplane equation $w_1 x_1 + w_2 x_2 + \cdots + w_n x_n + b = 0$; the same derivation shows that points making the left-hand side $> 0$ naturally lie above the hyperplane, and points making it $< 0$ lie below it. In this way all points are divided into two classes, so the original problem is transformed into the problem of finding such a hyperplane that splits the points into two classes. Writing the equation as the dot product of two vectors plus a constant (intercept) term, we obtain the following formula:

$$w \cdot x + b = 0$$

Let $z = w \cdot x + b$; then the original formula is converted into $z = 0$. Now define $\mathrm{sign}(z) = +1$ for $z \geq 0$ and $\mathrm{sign}(z) = -1$ for $z < 0$, and let $f(x) = \mathrm{sign}(w \cdot x + b)$; the function $f(x)$ can then be used as a linear classifier.
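
As a quick illustration, here is a minimal sketch of this decision function in Python (the weight vector and intercept below are arbitrary example values, not learned ones):

import numpy as np

def linear_classifier(x, w, b):
    #f(x) = sign(w . x + b): +1 on or above the hyperplane, -1 below it
    return 1 if np.dot(w, x) + b >= 0 else -1

w = np.array([1.0, -2.0])   #arbitrary example weights
b = 0.5                     #arbitrary example intercept
print(linear_classifier(np.array([3.0, 1.0]), w, b))   #w.x + b = 1.5  -> +1
print(linear_classifier(np.array([0.0, 2.0]), w, b))   #w.x + b = -3.5 -> -1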

1.2 Derivation

Now that we have the formula above, how do we obtain the values of w and b? Imitating the routine of the previous two posts, it naturally takes three steps: first, randomly set initial values for w and b; then write down a loss function (also called a cost function); finally, use gradient descent to iterate continuously until an optimal solution that minimizes the loss function is found. Following this idea, the first thought is easy: since our goal is to find a hyperplane that separates the two classes of points completely and correctly, the best solution is simply to drive the number of misclassified points down to 0. But this approach has an obvious drawback: such a function only takes discrete count values, so its graph is not continuous, and the gradient descent of the third step cannot be applied! Hence the second idea arises: since we need a continuous function, why not convert the count of misclassified points into the sum of distances from the misclassified points to the hyperplane? To do that, we first derive the distance formula from a point to a line:

In the two-dimensional rectangular coordinate plane XOY, there is a straight line with equation $ax + by + c = 0$ and a point $P(x_0, y_0)$ not on the line. What we want is the norm of the perpendicular vector from P to the line, i.e. its length d:

First, derive the formula for d:

Let R, the foot of the perpendicular, have coordinates $(x_1, y_1)$. Since point R lies on the line, it must satisfy $ax_1 + by_1 + c = 0$, from which we get $ax_1 + by_1 = -c$.

From the original equation it is easy to see that the direction vector of the line is $(-b, a)$; then, by the principle that perpendicular vectors have a dot product of 0, the normal vector is $(a, b)$. With these two key vectors calculated, we now substitute them into the formula:

$d = \dfrac{|\vec{RP} \cdot (a, b)|}{\|(a, b)\|} = \dfrac{|a(x_0 - x_1) + b(y_0 - y_1)|}{\sqrt{a^2 + b^2}}$. Substituting $ax_1 + by_1 = -c$ and simplifying gives $d = \dfrac{ax_0 + by_0 + c}{\sqrt{a^2 + b^2}}$. Since a distance must be positive, we add the absolute value symbol, and the final formula is: $d = \dfrac{|ax_0 + by_0 + c|}{\sqrt{a^2 + b^2}}$

With that, the derivation in the two-dimensional plane comes to an end. Extending it to high-dimensional space to compute the distance from a misclassified point to the hyperplane, by analogy with the equation above we obtain: $d = \dfrac{|w \cdot x_0 + b|}{\|w\|}$
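
As a quick numerical check, the small sketch below computes the same distance with both the two-dimensional formula and the general w, b form (the line $3x + 4y - 5 = 0$ and the point $(2, 1)$ are arbitrary example values):

import numpy as np

a_coef, b_coef, c = 3.0, 4.0, -5.0            #the line 3x + 4y - 5 = 0
x0, y0 = 2.0, 1.0                             #an arbitrary point off the line

#two-dimensional formula: |a*x0 + b*y0 + c| / sqrt(a^2 + b^2)
d_2d = abs(a_coef * x0 + b_coef * y0 + c) / np.sqrt(a_coef**2 + b_coef**2)

#general hyperplane form: |w . x + b| / ||w||
w = np.array([a_coef, b_coef])
b = c
d_general = abs(np.dot(w, np.array([x0, y0])) + b) / np.linalg.norm(w)

print(d_2d, d_general)                        #both are 1.0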

Let the set of misclassified points be M. Summing the distances from the misclassified points to the hyperplane gives the loss function L:

$$L(w, b) = \sum_{x_i \in M} \frac{|w \cdot x_i + b|}{\|w\|}$$

Since the factor $\frac{1}{\|w\|}$ does not change where the minimum is attained, it is dropped, leaving $L(w, b) = \sum_{x_i \in M} |w \cdot x_i + b|$.

Next we differentiate the loss function above, in two steps: find the partial derivative with respect to w, and the partial derivative with respect to b; the gradients can then be used for the descent iterations. However, differentiating a function that contains absolute values is not easy, so we need to find a way to remove the absolute value symbols.

 

Note that for a correctly classified point, the positive class is labeled +1 and the negative class is labeled -1, and the positive class lies above the hyperplane while the negative class lies below it. Therefore a correctly classified point must satisfy the inequality $y_i (w \cdot x_i + b) > 0$. Since every point is either correctly classified or misclassified, a misclassified point must satisfy the inequality $y_i (w \cdot x_i + b) < 0$; it is simply the rule that a negative times a negative gives a positive. With this, the absolute value symbols can be removed, and the original expression becomes:

$$L(w, b) = -\sum_{x_i \in M} y_i (w \cdot x_i + b)$$

Our goal is simply to minimize this function. At this point we can give the formulas for the two partial derivatives:

$$\frac{\partial L}{\partial w} = -\sum_{x_i \in M} y_i x_i \qquad \frac{\partial L}{\partial b} = -\sum_{x_i \in M} y_i$$
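
To make these formulas concrete, here is a minimal sketch that evaluates the loss and both partial derivatives on a tiny toy data set (the data, w, and b below are arbitrary assumptions for illustration only):

import numpy as np

X = np.array([[1.0, 2.0], [3.0, 4.0], [2.0, 0.5]])   #toy feature vectors
y = np.array([-1, 1, -1])                            #toy labels
w = np.array([0.5, -0.5])                            #arbitrary current parameters
b = 0.0

margins = y * (X @ w + b)                    #y_i * (w . x_i + b) for every point
M = margins < 0                              #misclassified points have a negative margin
loss = -margins[M].sum()                     #L(w, b) = -sum_{i in M} y_i (w . x_i + b)
grad_w = -(y[M].reshape(-1, 1) * X[M]).sum(axis=0)   #dL/dw = -sum_{i in M} y_i x_i
grad_b = -y[M].sum()                                 #dL/db = -sum_{i in M} y_i
print(loss, grad_w, grad_b)                  #1.25 [-1.  -3.5] 0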

Below is a simple sketch of the pseudocode for gradient descent:

1. Choose initial values for the coefficient vector $w$ and the intercept $b$.

2. Traverse the training set, selecting the points $(x_i, y_i)$ one by one.

3. During the traversal, if a point with $y_i (w \cdot x_i + b) \leq 0$ appears, that point is a misclassified point and the current values need to be updated. With step size λ, the update rules are $w \leftarrow w + \lambda y_i x_i$ and $b \leftarrow b + \lambda y_i$.

4. Repeat steps 2 and 3 until there are no misclassified points left in the whole data set; the linear classifier has then been found.

2. Code implementation

Below is a Python implementation of the above algorithm, wrapped in a class. The class defines two different gradient-descent implementations: batch gradient descent and stochastic gradient descent:

import numpy as np
import pandas as pd
import random
from sklearn.datasets import make_blobs

class perceptron:
    
    def __init__(self, sample_size = 200, centers = [[1,1],[3,4]], cluster_std = 0.6):
        #Generate two Gaussian blobs as a linearly separable toy data set
        X,Y = make_blobs(n_samples = sample_size, centers = centers, cluster_std = cluster_std, random_state = 11)
        Y[Y == 0] = -1          #relabel class 0 as -1 so the labels are +1 / -1
        self.X = X
        self.Y = Y
    
    def BGDfit(self, w, b, lam):
        #This function aims to use the Batch Gradient Descent algorithm to find a suitable separating hyperplane
        #w must be a column matrix of shape (n_features, 1); lam is the step size
        while True:
            x_mat = np.mat(self.X)
            tmp = np.ravel(x_mat * w + b) * self.Y          #y_i * (w . x_i + b) for every point
            if np.sum(tmp <= 0) == 0:                       #no misclassified points left: stop
                break
            tmp_X = np.array(self.X[tmp <= 0])              #the misclassified points
            tmp_Y = np.array(self.Y[tmp <= 0])              #and their labels
            tmp_Y_final = np.column_stack((tmp_Y, tmp_Y))   #broadcast labels over the two feature columns
            w = w + lam * np.mat((tmp_X * tmp_Y_final).sum(axis = 0)).T   #w <- w + lam * sum y_i x_i
            b = b + lam * sum(tmp_Y)                                      #b <- b + lam * sum y_i
        return w, b
    
    def SGDfit(self, w, b, lam):
        #This function aims to use the Stochastic Gradient Descent algorithm to find a suitable separating hyperplane
        #w is a plain 1-D array of shape (n_features,); lam is the step size
        missing_index = 0
        while missing_index != -1:
            missing_index = -1
            for i in range(self.X.shape[0]):
                #a point with y_i * (w . x_i + b) <= 0 is misclassified
                if self.Y[i] * (np.dot(w, self.X[i]) + b) <= 0:
                    missing_index = i
                    break
            if missing_index != -1:
                #update w and b using only this single misclassified point
                w = w + lam * self.Y[missing_index] * self.X[missing_index]
                b = b + lam * self.Y[missing_index]
        return w, b
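
A minimal usage sketch, assuming the class above is in scope; the zero initial values and the step size 0.1 are arbitrary choices:

model = perceptron()

#Batch gradient descent: BGDfit expects w as a column matrix because it uses np.mat internally
w_bgd, b_bgd = model.BGDfit(np.mat(np.zeros((2, 1))), 0.0, 0.1)

#Stochastic gradient descent: SGDfit expects w as a plain 1-D array
w_sgd, b_sgd = model.SGDfit(np.zeros(2), 0.0, 0.1)

print(np.ravel(w_bgd), b_bgd)
print(w_sgd, b_sgd)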

3. The dual form of the perceptron

3.1 Principle

When using the original (primal) form of the perceptron, computation becomes very slow in scenarios with massive amounts of data. Is there a way to compute certain matrices in advance and cache them, so that the same computation does not have to be repeated every time? The answer is yes, and it is the dual form of the perceptron. Let us introduce it formally. In section 1.2 we gave the update rules for w and b: each time a misclassified point is encountered, w is updated by $w \leftarrow w + \lambda y_i x_i$ and b is updated by $b \leftarrow b + \lambda y_i$. From this we can summarize a pattern: every time a point is misclassified once, we accumulate one more $\lambda y_i x_i$ onto w and one more $\lambda y_i$ onto b. In these expressions λ is fixed, the feature data $x_i$ are fixed, and the labels $y_i$ corresponding to the features are fixed; the only thing we do not know is when a misclassified point will be encountered, so the uncertain quantity is the number of misclassifications. The dual form of the perceptron now suggests itself: we only need to maintain an array of misclassification counts, and whenever a misclassified point is encountered, accumulate at the corresponding position of that array!

Suppose the data set contains n points, and let the misclassification-count vector be $a = (a_1, a_2, \ldots, a_n)$, where $a_i = \lambda n_i$ and $n_i$ is the number of times point i has been misclassified. Following the idea above, w and b can be written as $w = w_0 + \sum_{i=1}^{n} a_i y_i x_i$ and $b = \sum_{i=1}^{n} a_i y_i$. If we set the initial value of w to the zero vector, then w becomes $w = \sum_{i=1}^{n} a_i y_i x_i$. Substituting this expression into the criterion that judges whether a point is misclassified, and updating a and b whenever a misclassified point is found, gives the following pseudocode:

1. Choose the initial values of a and b: $a = (0, 0, \ldots, 0)$, $b = 0$

2. Traverse the training set, selecting the points $(x_j, y_j)$ one by one

3. If a point is found with $y_j \left( \sum_{i=1}^{n} a_i y_i (x_i \cdot x_j) + b \right) \leq 0$, update a and b: $a_j \leftarrow a_j + \lambda$, $b \leftarrow b + \lambda y_j$

4. Repeat steps 2 and 3 until there are no misclassified points; the algorithm then stops

3.2 Code implementation

Below is a Python implementation of the dual form of the perceptron, using the stochastic gradient descent approach:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import random
from sklearn.datasets import make_blobs

sample_size = 200
centers = [[1,1], [3,4]]
X, Y = make_blobs(n_samples = sample_size, centers = centers, cluster_std = 0.6, random_state = 11)
Y[Y == 0] = -1                                  #relabel class 0 as -1 so the labels are +1 / -1
gram = np.array(np.mat(X) * np.mat(X).T)        #precompute and cache the Gram matrix of inner products x_i . x_j
a = np.zeros(X.shape[0])                        #a_i accumulates lam once per misclassification of point i
b = 0
lam = 0.1
count = 0
while True:
    count += 1
    missing_index = -1
    for i in range(X.shape[0]):
        #dual decision value: y_i * (sum_j a_j * y_j * (x_j . x_i) + b)
        checking = Y[i] * ((a * Y * gram[i, :]).sum() + b)
        if checking <= 0:
            missing_index = i                   #first misclassified point in this pass
            break
    if missing_index == -1:                     #no misclassified points left: stop
        break
    a[missing_index] += lam
    b += lam * Y[missing_index]
theta = np.ravel(np.mat(a * Y) * np.mat(X))     #recover w = sum_i a_i * y_i * x_i
theta, b
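
As a quick sanity check (a small sketch added here, not required by the algorithm), the recovered theta and b should now give every training point a strictly positive margin, i.e. no misclassified points remain:

margins = Y * (X @ theta + b)      #y_i * (w . x_i + b) with w recovered from the dual variables
print((margins > 0).all(), count)  #True, plus the number of passes the loop took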

 
