Some Knowledge About SVM

SVM: Support Vector Machine

Definition: the support vector machine (SVM) is a supervised learning model used mainly to solve classification problems. Its basic form is a linear classifier that finds the separating hyperplane with the maximum margin in feature space.

Categories of SVM

1- When the training samples are linearly separable, a linear classifier is learned by maximizing the hard margin; this is called a linearly separable support vector machine.

2- When the training data are approximately linearly separable, slack variables are introduced and a linear classifier is learned by maximizing the soft margin; this is called a linear support vector machine.

3- When the training data are linearly inseparable, the kernel trick and soft-margin maximization are used to learn a nonlinear support vector machine.
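The three regimes above can be sketched in a few lines; here scikit-learn is assumed purely for illustration (the original post names no library), with a large `C` standing in for the hard margin and an RBF kernel for the nonlinear case:

```python
# Sketch of the three SVM regimes (scikit-learn is an assumed library choice).
import numpy as np
from sklearn.svm import SVC

# 1) Linearly separable data: hard margin approximated by a very large C.
X_sep = np.array([[0, 0], [0, 1], [3, 3], [3, 4]], dtype=float)
y_sep = np.array([0, 0, 1, 1])
hard = SVC(kernel="linear", C=1e6).fit(X_sep, y_sep)

# 2) Approximately separable data: soft margin via a moderate C (slack allowed).
soft = SVC(kernel="linear", C=1.0).fit(X_sep, y_sep)

# 3) Linearly inseparable data (XOR): kernel trick + soft margin.
X_xor = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y_xor = np.array([0, 1, 1, 0])
nonlinear = SVC(kernel="rbf", gamma=5.0, C=10.0).fit(X_xor, y_xor)
```

No linear classifier can fit XOR, but the RBF-kernel model separates it in the implicit high-dimensional feature space.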

The main advantages of the SVM algorithm are:

     1) It is very effective for high-dimensional classification and regression problems, and still works well when the number of feature dimensions exceeds the number of samples.

     2) The hyperplane decision depends only on a subset of the training data (the support vectors), not on all of the data.

     3) A large number of kernel functions can be used, which makes it very flexible for solving nonlinear classification and regression problems.

     4) When the sample size is not huge, classification accuracy is high and generalization ability is strong.
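Advantage (2) is easy to see in practice: after training, only the points near the margin are retained as support vectors. A minimal sketch, again assuming scikit-learn:

```python
# Advantage (2): the decision function depends only on the support vectors,
# not on every training sample (scikit-learn assumed for illustration).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two well-separated clusters of 50 points each.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(4, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)

model = SVC(kernel="linear", C=1.0).fit(X, y)
# Only the handful of points near the margin become support vectors.
print(len(model.support_vectors_), "of", len(X), "samples are support vectors")
```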

The main drawbacks of the SVM algorithm are:

     1) If the number of feature dimensions is much larger than the number of samples, SVM performance is mediocre.

     2) When the sample size is very large, the kernel function maps into a very high-dimensional space and the computation becomes too expensive, so SVM is not suitable.

     3) For nonlinear problems there is no universal standard for choosing the kernel, so it is difficult to select an appropriate kernel function.

     4) SVM is sensitive to missing data.

The meaning of the SVM kernel

The SVM kernel function maps the input space into a high-dimensional feature space, and the optimal separating hyperplane is constructed in that feature space, so that data which cannot be separated linearly in the original space become separable. The real significance of the kernel is that it achieves the effect of the mapping without ever explicitly mapping into the high-dimensional space, which greatly reduces the amount of computation.
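This "mapping without mapping" can be verified directly with a degree-2 polynomial kernel, for which the explicit feature map is small enough to write out. The function names below are illustrative, not from any library:

```python
# The kernel trick: K(x, z) = (x . z)^2 equals an inner product in the mapped
# feature space phi(x) = (x1^2, x2^2, sqrt(2)*x1*x2), computed without ever
# building phi explicitly. Pure-NumPy sketch.
import numpy as np

def poly2_kernel(x, z):
    """Degree-2 polynomial kernel, computed directly in input space."""
    return float(np.dot(x, z)) ** 2

def phi(x):
    """Explicit degree-2 feature map for 2-D input (for comparison only)."""
    return np.array([x[0] ** 2, x[1] ** 2, np.sqrt(2) * x[0] * x[1]])

x = np.array([1.0, 2.0])
z = np.array([3.0, 4.0])
direct = poly2_kernel(x, z)             # O(d) work in input space
mapped = float(np.dot(phi(x), phi(z)))  # same value via explicit mapping
```

Both routes give the same number, but the kernel never materializes the higher-dimensional vectors; for the RBF kernel the implicit space is infinite-dimensional, so the explicit route is not even possible.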

How to choose the kernel function

1- If the number of features is large and comparable to the number of samples, use a linear-kernel SVM or LR (logistic regression).

2- If the number of features is small and the number of samples is moderate, use a Gaussian-kernel SVM.

3- If the number of features is small and the number of samples is very large, then since solving the optimization problem involves computing inner products between every pair of samples, a Gaussian kernel will be significantly more expensive than a linear kernel. In that case, manually add some features to make the data linearly separable, and then use LR or a linear-kernel SVM.

4- Try different kernel functions with cross-validation; the one with the smallest error is the best kernel function.

5- Use a mixed-kernel method, combining different kernel functions.
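Point (4) above can be sketched directly; scikit-learn's `cross_val_score` is assumed here as the cross-validation helper, and the circular decision boundary is a made-up example where the RBF kernel should win:

```python
# Pick a kernel by cross-validation (point 4): lowest CV error wins.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(42)
X = rng.normal(size=(60, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1.0).astype(int)  # circular boundary

scores = {}
for kernel in ("linear", "rbf"):
    cv = cross_val_score(SVC(kernel=kernel, C=1.0), X, y, cv=5)
    scores[kernel] = cv.mean()
print(scores)
```

On a boundary like this, the linear kernel can do little better than predicting the majority class, while the RBF kernel tracks the circle.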

Why convert the SVM primal problem into its dual problem

The dual problem gives a lower bound on the primal problem; when certain conditions are satisfied, the solutions of the primal and dual problems are equivalent, and the dual form naturally allows kernel functions to be introduced.
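For reference, the standard hard-margin formulations make the last point concrete:

```latex
% Primal (hard margin):
\min_{w,\,b} \; \frac{1}{2}\|w\|^2
\quad \text{s.t.} \quad y_i \,(w \cdot x_i + b) \ge 1, \quad i = 1, \dots, N

% Dual (via Lagrange multipliers \alpha_i \ge 0):
\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N}
    \alpha_i \alpha_j \, y_i y_j \,(x_i \cdot x_j)
\quad \text{s.t.} \quad \sum_{i=1}^{N} \alpha_i y_i = 0, \quad \alpha_i \ge 0
```

In the dual, the samples appear only through the inner products $x_i \cdot x_j$, which is exactly where a kernel $K(x_i, x_j)$ can be substituted.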

How SVM maximizes the margin

When the training data are linearly separable, there are infinitely many separating hyperplanes that can separate the two classes correctly. A perceptron or similar model uses a minimum-misclassification strategy to obtain a separating hyperplane, but in that case there are infinitely many solutions. A linearly separable SVM instead finds the optimal separating hyperplane by maximizing the margin, and that solution is unique. Moreover, the classification result produced by this separating hyperplane is the most robust and has the strongest generalization ability on unseen examples.

Why SVM is sensitive to noise and missing values

When too much noise is present, or when a noisy point happens to become a support vector, its influence on the SVM model is enormous.

Missing values here means data with some features missing, i.e. incomplete feature vectors. Because SVM has no built-in strategy for dealing with missing values (unlike decision trees), data with missing values are difficult to classify correctly along that feature dimension, which degrades the quality of the training result.
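Since SVM cannot handle missing features itself, they must be filled in before training. A minimal sketch, using simple per-column mean imputation in pure NumPy (one common strategy among many):

```python
# SVM has no built-in handling of missing features, so impute before training.
# Minimal sketch: replace each NaN with its column mean.
import numpy as np

X = np.array([[1.0,    2.0],
              [np.nan, 4.0],
              [3.0,    np.nan],
              [5.0,    6.0]])

col_means = np.nanmean(X, axis=0)   # per-feature mean, ignoring NaNs
rows, cols = np.where(np.isnan(X))
X[rows, cols] = col_means[cols]     # fill each gap with its column mean
```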

How SVM handles sample skew

Sample skew means that the numbers of positive and negative samples in the dataset are very uneven; for example, the positive class has 10,000 samples while the negative class has only 100. The separating hyperplane will then be pushed toward the negative class, because with so few negative samples their distribution is not broad enough, which hurts the accuracy of the result.

Solution: we can give the positive class a penalty factor C+ and the negative class a different penalty factor C-. When determining this ratio, a good way to measure how the two classes are distributed is to find a hypersphere in the high-dimensional space that contains all the negative-class samples, find another one for the positive class, and compare the radii of the two spheres: a large radius means a broad distribution and gets a small penalty factor.

In practice a simple choice works fine; for example, with libSVM you can directly use the ratio of sample counts. For the example above that would be C+ : C- = 1 : 100.
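The per-class penalty factors described above are exposed by libSVM through its per-class weight options; in scikit-learn (assumed here for the sketch) the equivalent is the `class_weight` argument:

```python
# Per-class penalty factors C+ / C- for skewed data, as described above.
# In scikit-learn this is the class_weight argument of SVC.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
# Skewed data: 200 positive-class samples, only 5 negative-class samples.
X = np.vstack([rng.normal(0, 1, (200, 2)), rng.normal(4, 1, (5, 2))])
y = np.array([1] * 200 + [-1] * 5)

# Penalize mistakes on the rare negative class 100x more heavily,
# mirroring the C+ : C- = 1 : 100 ratio from the text.
model = SVC(kernel="linear", C=1.0, class_weight={1: 1, -1: 100}).fit(X, y)
```

Without the weights, the few negative samples contribute so little total penalty that the hyperplane drifts toward them; the weights restore the balance.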


Origin www.cnblogs.com/1994tj/p/11808281.html