Machine Learning: Support Vector Machines (SVM)

       SVM is a relatively new technique in data mining: a machine learning tool that solves problems by means of optimization. It exhibits many advantages in small-sample, nonlinear, and high-dimensional pattern recognition, and it can be extended to function fitting and other machine learning problems. The SVM is a supervised learning model, a generalized linear classifier that performs binary classification of data; its objective is the maximum-margin separating hyperplane learned from the training samples. Given the training sample set

$$D = \{(\boldsymbol{x}_1, y_1), (\boldsymbol{x}_2, y_2), \ldots, (\boldsymbol{x}_m, y_m)\}, \quad y_i \in \{-1, +1\}.$$

      The basic idea of classification learning is to find, based on the training set $D$, a separating hyperplane in the sample space that separates samples of different classes. However, there may be many hyperplanes that can separate the training samples; which one should we look for?

Figure 1

     Intuitively, we should choose the separating hyperplane that lies "in the middle" of the two classes of samples, because that hyperplane has the best "tolerance" to local perturbations of the training samples.

     In the sample space, a separating hyperplane can be described by the linear equation
$$\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b = 0,$$
where $\boldsymbol{w} = (w_1; w_2; \ldots; w_d)$ is the normal vector, which determines the direction of the hyperplane, and $b$ is the displacement term, which determines the distance between the hyperplane and the origin. Since a hyperplane is determined by the normal vector $\boldsymbol{w}$ and the displacement $b$, we denote it by $(\boldsymbol{w}, b)$. The distance from any point $\boldsymbol{x}$ in the sample space to the hyperplane $(\boldsymbol{w}, b)$ can be written as
$$r = \frac{|\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b|}{\|\boldsymbol{w}\|}.$$
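As a quick numerical illustration of this distance formula, here is a minimal R sketch; the values of $\boldsymbol{w}$, $b$, and $\boldsymbol{x}$ are made up for illustration only:

```r
# Distance from a point x to the hyperplane w^T x + b = 0:
# r = |w^T x + b| / ||w||, on an assumed toy example in 2-D.
w <- c(2, 1)                               # normal vector of the hyperplane
b <- -3                                    # displacement term
x <- c(4, 2)                               # an arbitrary sample point
r <- abs(sum(w * x) + b) / sqrt(sum(w^2))  # the distance formula above
r                                          # 7 / sqrt(5), about 3.13
```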

Assume the hyperplane $(\boldsymbol{w}, b)$ can classify the training samples correctly; that is, for $(\boldsymbol{x}_i, y_i) \in D$, if $y_i = +1$ then $\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b > 0$, and if $y_i = -1$ then $\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b < 0$. Let
$$\begin{cases} \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b \geq +1, & y_i = +1, \\ \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b \leq -1, & y_i = -1. \end{cases} \tag{3}$$

      As shown in Figure 2, the few training sample points closest to the hyperplane make the equality in (3) hold; they are called "support vectors". The sum of the distances from two heterogeneous support vectors to the hyperplane is
$$\gamma = \frac{2}{\|\boldsymbol{w}\|},$$
which is called the "margin". To find the separating hyperplane with the "maximum margin", i.e., to find the parameters $\boldsymbol{w}$ and $b$ satisfying the constraints in (3) such that $\gamma$ is maximized:
$$\max_{\boldsymbol{w}, b} \frac{2}{\|\boldsymbol{w}\|} \quad \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, m. \tag{4}$$
Clearly, to maximize the margin we only need to maximize $\|\boldsymbol{w}\|^{-1}$, which is equivalent to minimizing $\|\boldsymbol{w}\|^2$. Thus, (4) can be rewritten as
$$\min_{\boldsymbol{w}, b} \frac{1}{2}\|\boldsymbol{w}\|^2 \quad \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, m, \tag{5}$$
which is the basic form of the support vector machine. Solving (5) yields the model corresponding to the maximum-margin separating hyperplane,
$$f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b,$$
where $\boldsymbol{w}$ and $b$ are the model parameters.

Figure 2

      So far we have assumed that the training samples are linearly separable, i.e., that there exists a hyperplane that classifies the training samples correctly. In reality, however, such a hyperplane may not exist. As shown in Figure 3, for such a problem the original space can be mapped to a higher-dimensional feature space, in which the samples become linearly separable (Figure 4).

      
Figure 3

Figure 4  
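A minimal R sketch of this idea, using an assumed toy mapping $\phi(x) = (x, x^2)$ on made-up 1-D data:

```r
# 1-D samples that no single threshold on x can separate become
# linearly separable after the (assumed) mapping phi(x) = (x, x^2).
x <- c(-3, -2, -0.5, 0, 0.5, 2, 3)
y <- ifelse(abs(x) > 1, 1, -1)   # outer points are +1, inner points are -1
phi <- cbind(x, x^2)             # mapped feature vectors
split(phi[, 2] > 1, y)           # in feature space, the line x^2 = 1 separates them
```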


      Let $\phi(\boldsymbol{x})$ denote the feature vector obtained by mapping $\boldsymbol{x}$. The model corresponding to the separating hyperplane in the feature space can then be expressed as
$$f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}}\phi(\boldsymbol{x}) + b.$$
Analogously to (5), we have
$$\min_{\boldsymbol{w}, b} \frac{1}{2}\|\boldsymbol{w}\|^2 \quad \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}}\phi(\boldsymbol{x}_i) + b) \geq 1, \quad i = 1, 2, \ldots, m, \tag{6}$$
whose dual problem is
$$\max_{\boldsymbol{\alpha}} \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \phi(\boldsymbol{x}_i)^{\mathrm{T}}\phi(\boldsymbol{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \; \alpha_i \geq 0. \tag{7}$$

      Solving (7) involves computing $\phi(\boldsymbol{x}_i)^{\mathrm{T}}\phi(\boldsymbol{x}_j)$, the inner product of $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ after they are mapped to the feature space. Since the feature space may have very high, even infinite, dimension, computing this directly is usually difficult. To sidestep this obstacle, imagine a function of the form
$$\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \langle \phi(\boldsymbol{x}_i), \phi(\boldsymbol{x}_j) \rangle = \phi(\boldsymbol{x}_i)^{\mathrm{T}}\phi(\boldsymbol{x}_j), \tag{8}$$
i.e., the inner product of $\boldsymbol{x}_i$ and $\boldsymbol{x}_j$ in the feature space equals the result of evaluating $\kappa(\cdot,\cdot)$ in the original sample space. With such a function we need not compute inner products in the high- or even infinite-dimensional feature space directly. In other words, in the linearly non-separable case, the support vector machine first performs its computations in the low-dimensional space, then maps the input space to a high-dimensional feature space through the kernel function, and finally constructs the optimal separating hyperplane in that feature space, thereby separating nonlinear data that could not be separated well in the original space. Thus, (7) can be rewritten as
$$\max_{\boldsymbol{\alpha}} \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \; \alpha_i \geq 0. \tag{9}$$
Solving it yields
$$f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}}\phi(\boldsymbol{x}) + b = \sum_{i=1}^{m} \alpha_i y_i \phi(\boldsymbol{x}_i)^{\mathrm{T}}\phi(\boldsymbol{x}) + b = \sum_{i=1}^{m} \alpha_i y_i \kappa(\boldsymbol{x}, \boldsymbol{x}_i) + b. \tag{10}$$
The function $\kappa(\cdot,\cdot)$ here is the "kernel function". Equation (10) shows that the optimal model solution can be expanded in terms of kernel functions over the training samples; this expansion is also known as the "support vector expansion".
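To make (8) concrete, here is a minimal R sketch verifying that a degree-2 polynomial kernel $\kappa(\boldsymbol{x}, \boldsymbol{z}) = (\boldsymbol{x}^{\mathrm{T}}\boldsymbol{z})^2$ equals the inner product under the explicit map $\phi(x_1, x_2) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$; the sample points are made up for illustration:

```r
# Kernel trick check: kappa(x, z) = (x^T z)^2 equals phi(x)^T phi(z)
# for the explicit degree-2 feature map on 2-D inputs.
phi   <- function(x) c(x[1]^2, sqrt(2) * x[1] * x[2], x[2]^2)
kpoly <- function(x, z) (sum(x * z))^2

x <- c(1, 2); z <- c(3, 4)
sum(phi(x) * phi(z))   # 121, via the mapped (3-D) inner product
kpoly(x, z)            # 121, computed directly in the original 2-D space
```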

      Clearly, if the concrete form of a suitable mapping $\phi(\cdot)$ is known, we can write out the kernel function $\kappa(\cdot,\cdot)$. But in real tasks we usually do not know what form $\phi(\cdot)$ takes. Does a suitable kernel function always exist, then? And what kinds of functions can serve as kernel functions? We have the following theorem:

       Theorem (kernel functions): Let $\mathcal{X}$ be the input space and $\kappa(\cdot,\cdot)$ a symmetric function defined on $\mathcal{X} \times \mathcal{X}$. Then $\kappa$ is a kernel function if and only if, for any dataset $D = \{\boldsymbol{x}_1, \ldots, \boldsymbol{x}_m\}$, the kernel matrix $\mathbf{K}$ with $\mathbf{K}_{ij} = \kappa(\boldsymbol{x}_i, \boldsymbol{x}_j)$ is positive semidefinite.

       The table below lists several commonly used kernel functions.

Table 1. Commonly used kernel functions

| Name | Expression | Parameters |
| --- | --- | --- |
| Linear kernel | $\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{x}_j$ | |
| Polynomial kernel | $\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = (\boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{x}_j)^d$ | $d \geq 1$, the degree of the polynomial |
| Gaussian (RBF) kernel | $\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \exp\!\big(-\frac{\|\boldsymbol{x}_i - \boldsymbol{x}_j\|^2}{2\sigma^2}\big)$ | $\sigma > 0$, the bandwidth |
| Laplacian kernel | $\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \exp\!\big(-\frac{\|\boldsymbol{x}_i - \boldsymbol{x}_j\|}{\sigma}\big)$ | $\sigma > 0$ |
| Sigmoid kernel | $\kappa(\boldsymbol{x}_i, \boldsymbol{x}_j) = \tanh(\beta \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{x}_j + \theta)$ | $\beta > 0$, $\theta < 0$ |
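The theorem above can be checked numerically. A minimal R sketch, computing a Gaussian kernel matrix on random data (the data and $\sigma$ are assumptions for illustration) and confirming it is positive semidefinite:

```r
# Kernel matrix of the Gaussian (RBF) kernel on random data; by the
# theorem above its eigenvalues should be non-negative
# (up to floating-point error).
set.seed(1)
X     <- matrix(rnorm(20), nrow = 10)                 # 10 samples, 2 features
sigma <- 1
K     <- exp(-as.matrix(dist(X))^2 / (2 * sigma^2))   # K_ij = kappa(x_i, x_j)
min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
```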

      In the discussion above, we have assumed throughout that the training samples are linearly separable in the sample space or the feature space, i.e., that some hyperplane can completely separate samples of different classes. In real tasks, however, it is often hard to determine a suitable kernel function that makes the training samples linearly separable in the feature space; and even if one happens to find a kernel function that makes the training set linearly separable in the feature space, it is hard to tell whether this seemingly separable result is not simply caused by overfitting.

      One way to alleviate this problem is to allow the support vector machine to make errors on some samples. To this end, the concept of the "soft margin" is introduced, as shown in Figure 5.

Figure 5. Illustration of the soft margin. Red circles mark some samples that do not satisfy the constraint.

      Specifically, the form of the support vector machine introduced earlier requires all samples to satisfy constraint (3), i.e., all samples must be classified correctly; this is called the "hard margin". The soft margin allows some samples to violate the constraint
$$y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) \geq 1. \tag{11}$$
Of course, while maximizing the margin, the number of samples violating the constraint should be as small as possible. The optimization objective thus becomes
$$\min_{\boldsymbol{w}, b} \frac{1}{2}\|\boldsymbol{w}\|^2 + C \sum_{i=1}^{m} \ell_{0/1}\big(y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) - 1\big), \tag{12}$$
where $C > 0$ is a constant and $\ell_{0/1}$ is the "0/1 loss function":
$$\ell_{0/1}(z) = \begin{cases} 1, & z < 0, \\ 0, & \text{otherwise}. \end{cases} \tag{13}$$
Clearly, when $C$ is infinite, (12) forces all samples to satisfy constraint (11), so (12) is equivalent to (5); when $C$ takes a finite value, (12) allows some samples to violate the constraint.
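In implementations, the constant $C$ appears as the cost parameter. A minimal R sketch, assuming the e1071 package and an illustrative two-class subset of iris with two chosen features:

```r
library(e1071)

# Two-class subset of iris for a linear soft-margin SVM.
d <- subset(iris, Species != "setosa")
d$Species <- droplevels(d$Species)

# Small cost: more constraint violations tolerated (softer margin);
# large cost: behaviour approaches the hard margin.
m_soft <- svm(Species ~ Petal.Length + Petal.Width, data = d,
              kernel = "linear", cost = 0.1)
m_hard <- svm(Species ~ Petal.Length + Petal.Width, data = d,
              kernel = "linear", cost = 100)

# A softer margin typically keeps more support vectors.
c(soft = m_soft$tot.nSV, hard = m_hard$tot.nSV)
```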

The Dual Problem

      Support vector machines have three treasures: the margin, duality, and the kernel trick (kernel functions). For a support vector machine we want to find a separating hyperplane that classifies the samples correctly; to find the one with the maximum margin, we must find $\boldsymbol{w}$ and $b$ satisfying the constraints in (3) such that $\gamma$ is maximized:
$$\max_{\boldsymbol{w}, b} \frac{2}{\|\boldsymbol{w}\|} \quad \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, m.$$
For later computational convenience, we write this equivalently as
$$\min_{\boldsymbol{w}, b} \frac{1}{2}\|\boldsymbol{w}\|^2 \quad \text{s.t.} \quad y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b) \geq 1, \quad i = 1, 2, \ldots, m.$$
This is the original constrained problem. Solving it directly is relatively complicated, so we solve its dual problem instead. Specifically, adding a Lagrange multiplier $\alpha_i \geq 0$ to each constraint above, the Lagrangian of the problem can be written as
$$L(\boldsymbol{w}, b, \boldsymbol{\alpha}) = \frac{1}{2}\|\boldsymbol{w}\|^2 + \sum_{i=1}^{m} \alpha_i\big(1 - y_i(\boldsymbol{w}^{\mathrm{T}}\boldsymbol{x}_i + b)\big).$$
The unconstrained problem obtained by introducing the Lagrange multipliers is
$$\min_{\boldsymbol{w}, b} \max_{\boldsymbol{\alpha} \geq 0} L(\boldsymbol{w}, b, \boldsymbol{\alpha}),$$
and its dual problem is
$$\max_{\boldsymbol{\alpha} \geq 0} \min_{\boldsymbol{w}, b} L(\boldsymbol{w}, b, \boldsymbol{\alpha}).$$
Taking the partial derivatives of $L$ with respect to $\boldsymbol{w}$ and $b$ and setting them to zero gives
$$\boldsymbol{w} = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i, \qquad 0 = \sum_{i=1}^{m} \alpha_i y_i.$$
Substituting these back into $L$, we finally obtain the dual problem
$$\max_{\boldsymbol{\alpha}} \sum_{i=1}^{m} \alpha_i - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m} \alpha_i \alpha_j y_i y_j \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{x}_j \quad \text{s.t.} \quad \sum_{i=1}^{m} \alpha_i y_i = 0, \; \alpha_i \geq 0, \; i = 1, 2, \ldots, m.$$
After solving for $\boldsymbol{\alpha}$ and $b$, we obtain the model
$$f(\boldsymbol{x}) = \boldsymbol{w}^{\mathrm{T}}\boldsymbol{x} + b = \sum_{i=1}^{m} \alpha_i y_i \boldsymbol{x}_i^{\mathrm{T}}\boldsymbol{x} + b.$$
The transformation between the primal problem and the dual above requires the KKT conditions to hold:
$$\begin{cases} \alpha_i \geq 0, \\ y_i f(\boldsymbol{x}_i) - 1 \geq 0, \\ \alpha_i\big(y_i f(\boldsymbol{x}_i) - 1\big) = 0. \end{cases}$$
Thus, for any training sample $(\boldsymbol{x}_i, y_i)$, we always have either $\alpha_i = 0$ or $y_i f(\boldsymbol{x}_i) = 1$. If $\alpha_i = 0$, that sample does not appear in the sum in the model above and hence has no influence on $f(\boldsymbol{x})$; if $\alpha_i > 0$, then necessarily $y_i f(\boldsymbol{x}_i) = 1$, and the corresponding sample point lies on the maximum-margin boundary: it is a support vector. This reveals an important property of support vector machines: after training is complete, most of the training samples need not be kept, and the final model depends only on the support vectors.
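As a concrete illustration, the dual above is a quadratic program that can be solved directly. A minimal R sketch, assuming the quadprog package and a made-up linearly separable toy dataset (the small diagonal jitter is a numerical device, not part of the theory):

```r
library(quadprog)

# Toy linearly separable data (assumed for illustration).
X <- matrix(c( 1,  1,   2,  2.5,   3,  3,
              -1, -1,  -2, -1.5,  -3, -2), ncol = 2, byrow = TRUE)
y <- c(1, 1, 1, -1, -1, -1)
m <- nrow(X)

# Dual: min (1/2) a' D a - 1' a  with  D_ij = y_i y_j x_i' x_j,
# subject to sum(a * y) = 0 (equality) and a >= 0.
D <- (y %*% t(y)) * (X %*% t(X)) + diag(1e-8, m)  # jitter keeps D positive definite
A <- cbind(y, diag(m))                            # equality constraint, then a >= 0
alpha <- solve.QP(D, rep(1, m), A, rep(0, m + 1), meq = 1)$solution

# Recover w and b from the KKT conditions.
w  <- colSums(alpha * y * X)                 # w = sum_i alpha_i y_i x_i
sv <- which(alpha > 1e-5)                    # support vectors: alpha_i > 0
b  <- mean(y[sv] - X[sv, , drop = FALSE] %*% w)  # from y_i(w'x_i + b) = 1
w; b; sv
```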

An SVM Application Case

      Table 2 shows the packages and datasets used for SVM modeling in the R software.

Table 2

      The figure below shows the process of classification with an SVM.

      After loading the iris dataset built into R, we can see that it contains 150 samples with 4 features each, and the outcome label has three classes with 50 samples apiece.

      We then build the model, taking X as the feature variables and Y as the outcome variable, and fit an SVM model; the result analysis divides the samples into three classes. Finally comes prediction: we randomly pick 8 of the 150 samples and display their predicted results, then check the prediction accuracy. The model predicts all flowers of the setosa type correctly; of the flowers of the versicolor type, it predicts 48 correctly but classifies the other two as virginica; likewise, of the flowers of the virginica type, it predicts 48 correctly but classifies the other two as versicolor. Afterwards, we can adjust the class weights to optimize the model.
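A minimal R sketch of the workflow just described; the post does not name the package, so e1071 (whose svm() function is commonly used for this) is an assumption:

```r
library(e1071)

data(iris)
summary(iris)                 # 150 samples, 4 features, 3 classes of 50

x <- iris[, 1:4]              # feature variables X
y <- iris[, 5]                # outcome variable Y (Species)
model <- svm(x, y)            # fit the SVM classifier (radial kernel by default)

idx <- sample(1:150, 8)       # randomly pick 8 samples to display
predict(model, x[idx, ])

# Confusion table of fitted vs. true classes; the pattern described
# above (48/2 mix-ups between versicolor and virginica) is typical.
table(Predicted = predict(model, x), Actual = y)

# Class weights can then be adjusted to tune the model, e.g.:
# model2 <- svm(x, y, class.weights = c(setosa = 1, versicolor = 2, virginica = 2))
```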



Source: www.cnblogs.com/lf6688/p/11260024.html