Introduction to the principles of the support vector machine algorithm

1, Introduction to the support vector machine concept

Classification is a very important task in data mining; its purpose is to learn a classification function or classification model (also called a classifier). The support vector machine is a supervised learning method that is widely used for statistical classification and regression analysis.

The support vector machine (SVM) is a machine learning method developed in the mid-1990s on the basis of statistical learning theory. It improves generalization ability by seeking structural risk minimization, i.e. by minimizing empirical risk and the confidence interval together, so that good statistical regularities can be obtained even when the sample size is small. Put simply, an SVM is a classifier, and specifically a binary (two-class) classifier. "Vector" refers to the data points, and "machine" refers to the classifier.

Problems the SVM has to address:
(1) Find decision boundaries and decide which decision boundary is best.
(2) Classify feature data that is hard to separate (via a kernel transform).
(3) Keep the computational complexity manageable.

2, Derivation of the support vector machine algorithm

2.1 The decision boundary

Choose the boundary line that is farthest from the "minefield", i.e. make the margin from the boundary to the closest points as large as possible (Large Margin). The larger this margin, the better the classification results and the stronger the generalization ability of the algorithm.
2.2 Computing the distance

The boundary is the plane w^T x + b = 0, where w is the normal vector of the plane. To compute the distance from a point x to this plane, take points x' and x'' on the plane; the perpendicular distance is the length of the projection of (x - x') onto the direction of the normal vector, and x' is then eliminated using w^T x' = -b:

distance(x, plane) = |w^T (x - x')| / ||w|| = |w^T x + b| / ||w||
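As a quick numerical check of this formula (my own sketch, not from the original post; the values of w, b and the point x below are made up), the distance can be computed with numpy:

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    # Distance from point x to the plane w^T x + b = 0: |w^T x + b| / ||w||.
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

# Made-up example values.
w = np.array([0.5, 0.5])   # normal vector of the plane
b = -2.0                   # offset
x = np.array([4.0, 3.0])   # the point whose distance we want
print(distance_to_hyperplane(w, b, x))  # |1.5| / sqrt(0.5) ≈ 2.12
```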

2.3 Data and label definitions

Data set: (X1, Y1), (X2, Y2), ..., (Xn, Yn), where X is a sample, Y is its label, and n is the number of samples. Note that Y encodes the class of the sample: when X is a positive example, Y = +1; when X is a negative example, Y = -1.

Decision equation:

y(x) = w^T Φ(x) + b

where Φ(x) is a feature transform of x (for a linear SVM it is simply x). For every sample, y_i · y(x_i) > 0, i.e. y(x_i) > 0 when y_i = +1 and y(x_i) < 0 when y_i = -1.

2.4 The optimization goal

Using the labels to remove the absolute value (|y_i| = 1 since y_i = ±1), the distance from a point to the boundary becomes:

distance = y_i · (w^T Φ(x_i) + b) / ||w||

Optimization goal: find a line (a w and b) such that the point closest (min) to the line is as far away (max) as possible:

argmax_{w,b} { (1/||w||) · min_i [ y_i (w^T Φ(x_i) + b) ] }

Rescaling: for the decision equation (w, b), we can scale w and b by the same factor so that every sample satisfies |y(x_i)| >= 1:

y_i (w^T Φ(x_i) + b) >= 1
Previously we only required this quantity to be greater than 0; this is a stricter condition. Under this constraint, min_i y_i (w^T Φ(x_i) + b) = 1, so the objective only needs to consider:

argmax_{w,b} 1/||w||
Converting the maximization into a minimization (the factor 1/2 is introduced for convenience when differentiating):

min_{w,b} (1/2) ||w||²
with w subject to the constraint:

y_i (w^T Φ(x_i) + b) >= 1,  i = 1, ..., n
2.5 Solving the objective function

2.5.1 Lagrange multipliers

For a constrained optimization problem of the form min f(x) subject to g_i(x) <= 0, the Lagrangian is:

L(x, α) = f(x) + Σ_i α_i g_i(x),  α_i >= 0
Transforming the original problem using duality:

min_{w,b} max_{α>=0} L(w, b, α) = max_{α>=0} min_{w,b} L(w, b, α)
Substituting our objective function and constraints gives:

L(w, b, α) = (1/2) ||w||² - Σ_i α_i [ y_i (w^T Φ(x_i) + b) - 1 ]
2.6 Solving the SVM

Take the partial derivatives of L with respect to w and b and set them to zero; this gives two conditions (the min-max order can be swapped thanks to duality and the KKT conditions).

Partial derivative with respect to w:

∂L/∂w = 0  ⟹  w = Σ_i α_i y_i Φ(x_i)

Partial derivative with respect to b:

∂L/∂b = 0  ⟹  Σ_i α_i y_i = 0

Substituting both conditions back into the Lagrangian:

L(w, b, α) = Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j Φ(x_i)^T Φ(x_j)
Next, maximize over α:

max_α [ Σ_i α_i - (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j) ]
Converting the maximization into a minimization:

min_α [ (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j) - Σ_i α_i ]
subject to the constraints:

Σ_i α_i y_i = 0,  α_i >= 0
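To make the dual concrete, here is a minimal sketch (my addition, not part of the original derivation) that solves this quadratic program numerically with scipy on a small, made-up linearly separable data set; real SVM libraries use dedicated QP/SMO solvers instead:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up linearly separable toy data: X holds the samples, y the labels in {+1, -1}.
X = np.array([[2.0, 2.0], [3.0, 1.0], [0.0, 0.0], [1.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

# Gram-style matrix G_ij = y_i * y_j * <x_i, x_j>.
G = (y[:, None] * X) @ (y[:, None] * X).T

def dual_objective(a):
    # (1/2) * sum_ij a_i a_j y_i y_j <x_i, x_j> - sum_i a_i
    return 0.5 * a @ G @ a - a.sum()

constraints = ({"type": "eq", "fun": lambda a: a @ y},)  # sum_i a_i y_i = 0
bounds = [(0.0, None)] * len(y)                          # a_i >= 0
result = minimize(dual_objective, np.zeros(len(y)),
                  bounds=bounds, constraints=constraints)

alpha = result.x
w = (alpha * y) @ X                       # w = sum_i a_i y_i x_i
support = alpha > 1e-6                    # support vectors have a_i > 0
b = np.mean(y[support] - X[support] @ w)  # recover b from the support vectors
print(alpha.round(3), w.round(3), round(b, 3))
```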
3, A worked SVM example

Data: three points, with positive examples X1 = (3, 3) and X2 = (4, 3), and negative example X3 = (1, 1).

Solve:

min_α (1/2) Σ_i Σ_j α_i α_j y_i y_j (x_i · x_j) - Σ_i α_i
subject to the constraints:

α_1 + α_2 - α_3 = 0,  α_i >= 0 (i = 1, 2, 3)
Substituting the data into the objective (the inner products are x1·x1 = 18, x1·x2 = 21, x1·x3 = 6, x2·x2 = 25, x2·x3 = 7, x3·x3 = 2) gives

(1/2)(18α1² + 25α2² + 2α3² + 42α1α2 - 12α1α3 - 14α2α3) - α1 - α2 - α3.

Since α3 = α1 + α2, we can eliminate α3.
Simplifying gives:

s(α1, α2) = 4α1² + 6.5α2² + 10α1α2 - 2α1 - 2α2
Taking the partial derivatives with respect to α1 and α2 and setting them to zero gives:

α1 = 1.5,  α2 = -1
α2 = -1 does not satisfy the constraint α_i >= 0, so the solution must lie on the boundary (set α1 = 0 or α2 = 0).
Checking the boundary: with α2 = 0 the minimum is at α1 = 0.25 (objective value -0.25), while with α1 = 0 the minimum is at α2 = 2/13 (objective value ≈ -0.15), so the solution is α1 = 0.25, α2 = 0, α3 = 0.25. Substituting these α back:

w = Σ_i α_i y_i x_i = 0.25 · (3, 3) - 0.25 · (1, 1) = (0.5, 0.5)
b = y_1 - w · x_1 = 1 - (0.5 · 3 + 0.5 · 3) = -2
The equation of the separating plane is therefore:

0.5 x_1 + 0.5 x_2 - 2 = 0
Plotted, this boundary passes between the two classes, with X1 and X3 lying exactly on the margin.
When α_i = 0, the sample x_i has no influence on the decision boundary. In other words, only the samples on the margin (those with α_i ≠ 0, the support vectors) determine the final result.

Support vectors: the data points that actually determine the result, i.e. those whose α value is non-zero (as long as the support vectors stay the same, the total number of samples does not affect the final result).
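The worked example can be verified numerically; below is a minimal sketch (my addition, not from the original post) that uses scikit-learn's SVC with a linear kernel and a very large C to approximate the hard-margin solution derived above:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[3, 3], [4, 3], [1, 1]])
y = np.array([1, 1, -1])

# A very large C approximates the hard-margin SVM.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

print(clf.coef_)             # expected ≈ [[0.5, 0.5]]
print(clf.intercept_)        # expected ≈ [-2.0]
print(clf.support_vectors_)  # expected: X3 = (1, 1) and X1 = (3, 3)
```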
4, Soft margin

Soft margin: data sometimes contains noise points, and a decision boundary that tries to accommodate them is not a good one. The earlier formulation requires the two classes to be separated perfectly, which is a bit too strict, so we introduce slack variables to relax it.
With a slack variable ξ_i for each sample, the constraint becomes:

y_i (w · x_i + b) >= 1 - ξ_i,  ξ_i >= 0

The new objective function:

min (1/2) ||w||² + C Σ_i ξ_i
Here C is a parameter we have to specify. When C is very large (which forces the slack variables to be very small), classification must be strict and essentially no errors are allowed; when C is very small, larger errors can be tolerated.
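As an illustration of the effect of C (my own sketch with made-up data, not from the original post), the snippet below trains two soft-margin SVMs on the same noisy, overlapping data with a large and a small C:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Made-up noisy two-class data: two overlapping Gaussian blobs.
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(2, 1, (50, 2))])
y = np.array([-1] * 50 + [1] * 50)

for C in (100.0, 0.01):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # A smaller C typically leaves more points inside or violating the margin,
    # which shows up as a larger number of support vectors.
    print(f"C={C}: {len(clf.support_)} support vectors, "
          f"training accuracy {clf.score(X, y):.2f}")
```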

Solving the soft-margin problem leads to the same dual as before, except that the multipliers are now box-constrained: 0 <= α_i <= C.

5, Linearly inseparable data in low dimensions

5.1 Kernel transform: if the data cannot be separated in the low-dimensional space, we can find a transform that maps it into a higher-dimensional space where it can be.

5.2 An example of low-dimensional inseparability

Suppose we have two samples x = (x1, x2, x3) and y = (y1, y2, y3); in this three-dimensional space it is hard to separate them linearly. By combining the features we can map the original data into a nine-dimensional space: f(x) = (x1x1, x1x2, x1x3, x2x1, x2x2, x2x3, x3x1, x3x2, x3x3). Since the algorithm needs inner products, for the new data in the nine-dimensional space we have to compute <f(x), f(y)>, which costs O(n²).
For example, let x = (1, 2, 3) and y = (4, 5, 6). Then f(x) = (1, 2, 3, 2, 4, 6, 3, 6, 9) and f(y) = (16, 20, 24, 20, 25, 30, 24, 30, 36), so <f(x), f(y)> = 16 + 40 + 72 + 40 + 100 + 180 + 72 + 180 + 324 = 1024. If the dimension is increased to a very large number, this computation becomes expensive. But notice that K(x, y) = (4 + 10 + 18)² = 1024, i.e. K(x, y) = (<x, y>)² = <f(x), f(y)>.
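This identity is easy to verify numerically; here is a small sketch (my addition) comparing the explicit nine-dimensional inner product with the kernel shortcut computed in three dimensions:

```python
import numpy as np

def feature_map(v):
    # Explicit 9-dimensional map f(v) = (v_i * v_j for all i, j).
    return np.outer(v, v).ravel()

x = np.array([1.0, 2.0, 3.0])
y = np.array([4.0, 5.0, 6.0])

explicit = feature_map(x) @ feature_map(y)  # inner product in 9 dimensions
kernel = np.dot(x, y) ** 2                  # K(x, y) = (<x, y>)^2, computed in 3 dimensions
print(explicit, kernel)                     # both print 1024.0
```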

The advantage of a kernel function is that the inner product the samples would have in the high-dimensional space can be computed entirely in the low-dimensional space. The raw data is never actually mapped (the mapping is essentially only assumed rather than carried out), yet the result equals what the computation in the high-dimensional space would give.

5.3 The Gaussian kernel (an effectively infinite-dimensional transform)

K(X, Y) = exp( -||X - Y||² / (2σ²) )

A linear support vector machine (linear kernel) produces a straight decision boundary, while a nonlinear SVM (Gaussian kernel) can produce a curved, flexible one.
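For comparison, here is a minimal sketch (my addition, with made-up data) that fits a linear-kernel SVM and a Gaussian (RBF) kernel SVM to data that no straight line can separate, an inner blob surrounded by a ring:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Made-up data: class -1 is a small blob around the origin, class +1 is a ring around it.
radius = np.concatenate([rng.uniform(0.0, 1.0, 100), rng.uniform(2.0, 3.0, 100)])
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
X = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
y = np.array([-1] * 100 + [1] * 100)

for kernel in ("linear", "rbf"):
    clf = SVC(kernel=kernel).fit(X, y)
    print(f"{kernel}: training accuracy {clf.score(X, y):.2f}")
# The linear kernel cannot separate the ring from the blob; the Gaussian kernel can.
```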

Origin: blog.csdn.net/qq_43660987/article/details/91450490