Support Vector Machine (II): Linear Inseparability and Kernel Functions

I. Contents

1. Contents
2. Background
3. Introducing the Kernel Function
4. Common Kernel Functions
5. SVM Classification Summary

   

II. Background

  The previous post, Support Vector Machine (I), introduced soft-margin maximization, which can only handle linear inseparability caused by outliers; it can do nothing for a dataset that is inherently nonlinear. Theory tells us that a problem that is not linearly separable in a low-dimensional space can generally be made linearly separable by mapping it into a higher-dimensional space, and we can apply this idea to the support vector machine.

III. Introducing the Kernel Function

   

Looking back at the optimization objective from the previous post (the soft-margin dual problem):

min over α:  ½ Σi Σj αi αj yi yj (xi ∙ xj) − Σi αi
subject to:  Σi αi yi = 0,  0 ≤ αi ≤ C

We only need to replace the inner product xi ∙ xj in this objective with φ(xi) ∙ φ(xj) to handle nonlinear problems. But this introduces a new difficulty: after mapping, the data dimension grows, and so does the cost of computing the inner product; the mapped space may even be infinite-dimensional, in which case the cost of solving the model increases dramatically.

So how do we deal with this problem? This is where the kernel function comes in.

IV. Common Kernel Functions

We observe that even after the high-dimensional mapping, the inner product φ(xi) ∙ φ(xj) is still just a number. Suppose, then, that there exists a function K such that

K(xi, xj) = φ(xi) ∙ φ(xj)

Theory tells us when such a function exists (Mercer's theorem gives the conditions), and we call it a kernel function. This leads to the third highlight of the support vector machine: the samples never need to be mapped explicitly into the high-dimensional space; the kernel function alone handles the nonlinear classification problem.
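As a minimal sketch of why this works, take the degree-2 polynomial kernel K(x, z) = (x ∙ z)² in two dimensions: its explicit feature map is φ(x) = (x1², √2 x1x2, x2²), and evaluating the kernel gives the same number as mapping first and then taking the inner product. (The names `phi` and `poly2_kernel` below are illustrative, not from the original post.)

```python
import numpy as np

def phi(x):
    """Explicit degree-2 feature map for a 2-D vector."""
    return np.array([x[0] ** 2, np.sqrt(2.0) * x[0] * x[1], x[1] ** 2])

def poly2_kernel(x, z):
    """Kernel evaluation: never leaves the original 2-D space."""
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])

print(poly2_kernel(x, z))      # (1*3 + 2*4)^2 = 121.0
print(np.dot(phi(x), phi(z)))  # same value via the explicit map
```

Only the kernel evaluation is ever needed during training and prediction, which is why the explicit map, even if infinite-dimensional, never has to be computed.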

The kernel function solves our problem, but of course not just any function can serve as a kernel. The functions proven to be valid kernels are not many, and the common ones (the ones built into scikit-learn) are the following few:

4.1 Linear kernel

The linear kernel (Linear Kernel) is really just the linearly separable SVM of the first two posts; its expression is: K(x, z) = x ∙ z

In other words, the linearly separable SVM and the linearly inseparable SVM can be treated as one class of model; the only difference is that the linearly separable SVM uses a linear kernel.
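As a quick sketch (assuming scikit-learn is installed and the toy data is made up): an SVC with the linear kernel on a linearly separable problem behaves exactly like the linearly separable SVM of the earlier posts.

```python
import numpy as np
from sklearn.svm import SVC

# Two well-separated clusters: class -1 near x1=0, class 1 near x1=2.
X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y)

pred = clf.predict([[0.2, 0.1], [2.1, 2.5]])
print(pred)  # one point from each side: [-1  1]
```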

4.2 Polynomial kernel

The polynomial kernel (Polynomial Kernel) is one of the common kernels for the linearly inseparable SVM; its expression is: K(x, z) = (a x ∙ z + c)^d

where a, c, and d are parameters we need to tune.
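As a check of the parameterisation (assuming scikit-learn is installed): scikit-learn writes the polynomial kernel as K(x, z) = (gamma ∙ x ∙ z + coef0)^degree, so a, c, d above correspond to gamma, coef0, degree.

```python
import numpy as np
from sklearn.metrics.pairwise import polynomial_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[3.0, 4.0]])
gamma, coef0, degree = 0.5, 1.0, 2  # a, c, d in the notation above

# Formula computed by hand vs scikit-learn's implementation.
by_hand = (gamma * np.dot(x[0], z[0]) + coef0) ** degree
by_sklearn = polynomial_kernel(x, z, degree=degree, gamma=gamma, coef0=coef0)[0, 0]
print(by_hand, by_sklearn)  # both (0.5*11 + 1)^2 = 42.25
```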

4.3 Radial basis function kernel

The radial basis function (RBF) kernel is also called the Gaussian kernel; its expression is: K(x, z) = exp(−‖x − z‖² / (2σ²))

It has few parameters; generally only σ needs to be set.
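As a sketch (assuming scikit-learn is installed): the Gaussian kernel K(x, z) = exp(−‖x − z‖² / (2σ²)) can be checked against scikit-learn's `rbf_kernel`, whose `gamma` parameter corresponds to 1/(2σ²).

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[3.0, 4.0]])
sigma = 2.0
gamma = 1.0 / (2.0 * sigma ** 2)  # scikit-learn's parameterisation

# ||x - z||^2 = 4 + 4 = 8, so K = exp(-8 / 8) = exp(-1).
by_hand = np.exp(-np.sum((x[0] - z[0]) ** 2) / (2.0 * sigma ** 2))
by_sklearn = rbf_kernel(x, z, gamma=gamma)[0, 0]
print(by_hand, by_sklearn)  # both exp(-1)
```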

4.4 Sigmoid kernel

The sigmoid kernel's expression is: K(x, z) = tanh(a x ∙ z + r), where a and r are parameters to tune; tanh is the hyperbolic tangent function, which is also commonly used as an activation function in neural networks.
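The same check works here (assuming scikit-learn is installed): scikit-learn's `sigmoid_kernel` computes tanh(gamma ∙ x ∙ z + coef0), so a and r correspond to `gamma` and `coef0`.

```python
import numpy as np
from sklearn.metrics.pairwise import sigmoid_kernel

x = np.array([[1.0, 2.0]])
z = np.array([[3.0, 4.0]])
gamma, coef0 = 0.1, -1.0  # a, r in the notation above

# x.z = 11, so K = tanh(0.1*11 - 1.0) = tanh(0.1).
by_hand = np.tanh(gamma * np.dot(x[0], z[0]) + coef0)
by_sklearn = sigmoid_kernel(x, z, gamma=gamma, coef0=coef0)[0, 0]
print(by_hand, by_sklearn)  # both tanh(0.1)
```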

   

By Taylor expansion, a function with derivatives of all orders can be represented as a polynomial. Both the RBF kernel and the sigmoid kernel are infinitely differentiable, so both can be expanded as polynomials of every order. For this reason the RBF and sigmoid kernels usually perform better than the polynomial kernel: they match the order automatically, whereas the polynomial kernel requires us to specify its degree and also has more parameters to tune. The RBF kernel is therefore usually the first choice, and it is the default of scikit-learn's SVC().
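The default mentioned above can be checked directly (assuming scikit-learn is installed): constructing an SVC without a kernel argument yields the RBF kernel.

```python
from sklearn.svm import SVC

clf = SVC()        # no kernel argument supplied
print(clf.kernel)  # "rbf": the radial basis function is the default
```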

V. SVM Classification Summary

With the kernel function introduced, our SVM algorithm is essentially complete; it no longer needs to distinguish the linearly separable case. The input is m samples (x1, y1), (x2, y2), ..., (xm, ym), where x is an n-dimensional feature vector and y is a binary label taking the value 1 or -1. The output is the parameters w* and b* of the separating hyperplane, together with the classification decision function.

The algorithm is as follows:

1) Choose a kernel function K(x, z) and a penalty coefficient C, and solve the dual problem:
   min over α: ½ Σi Σj αi αj yi yj K(xi, xj) − Σi αi, subject to Σi αi yi = 0, 0 ≤ αi ≤ C.
2) Compute w* = Σi αi* yi φ(xi) (kept implicit through the kernel unless the kernel is linear).
3) Pick a support vector (xs, ys) with 0 < αs* < C and compute b* = ys − Σi αi* yi K(xi, xs).
4) The classification decision function is f(x) = sign(Σi αi* yi K(x, xi) + b*).
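For the linear-kernel case, the whole pipeline can be sketched in a few lines (assuming scikit-learn is installed; the toy data is made up): fit the dual problem, recover w* and b*, and check that the decision function sign(w* ∙ x + b*) agrees with the model's own prediction.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0.0, 0.0], [0.0, 1.0], [2.0, 2.0], [2.0, 3.0]])
y = np.array([-1, -1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

w = clf.coef_[0]       # w* = sum_i alpha_i* y_i x_i (linear kernel only)
b = clf.intercept_[0]  # b*

x_new = np.array([2.5, 2.5])
manual = int(np.sign(np.dot(w, x_new) + b))  # f(x) = sign(w* . x + b*)
print(manual, clf.predict([x_new])[0])       # the two predictions agree
```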

   

In general, before ensemble methods and neural networks became popular, SVM was basically the best classification algorithm, and even today it still holds an important position.

The main advantages of SVM are:

  1) Maximum-margin separation is introduced, giving high classification accuracy.

  2) It classifies accurately even with a small sample size, and generalizes well.

  3) The kernel function makes nonlinear problems easy to handle: rather than working in the high-dimensional space directly, we use that space's inner product function (the kernel function), so the corresponding high-dimensional decision problem is solved by the same method as the linearly separable case, avoiding the complexity of the high-dimensional space.

  4) It handles classification and regression with high-dimensional features; even when the feature dimension exceeds the number of samples, it still performs well.

The main disadvantages of SVM are:

  1) When the sample size is very large, both computing the kernel inner products and solving for the Lagrange multipliers α scale with the number of samples, so solving the model becomes too expensive.

  2) There is usually no clear guidance on choosing the kernel, and it is sometimes hard to find an appropriate one; kernels such as the polynomial kernel also have many parameters to tune.

  3) SVM is sensitive to missing data (like many algorithms that are sensitive to missing values: either handle them in feature engineering, or use a tree model instead).

The main applications and research directions of support vector machines:

SVM is currently used mainly in pattern recognition: text recognition, Chinese text classification, face recognition, and so on; it is also applied in many areas of engineering technology and in information filtering.

Research focuses on optimizing the SVM algorithm itself, including solving its quadratic programming (QP) problem and scaling SVM to large datasets; on how to better construct SVM-based multi-class classifiers and improve SVM's generalization ability and classification speed; and on how to choose the kernel function for a practical problem, which is also an important research topic.


Origin: www.cnblogs.com/yifanrensheng/p/11871544.html