SVM Introduction
The support vector machine (SVM) is a binary classification model. Its basic form is the linear classifier with the maximum margin in the feature space, which is what distinguishes it from the perceptron; equipped with the kernel trick, the SVM becomes an essentially nonlinear classifier. The learning strategy of the SVM is margin maximization, which can be formalized as solving a convex quadratic programming problem and is also equivalent to minimizing a regularized hinge loss. The learning algorithm of the SVM is therefore an optimization algorithm for convex quadratic programming.
SVM algorithm principle
The basic idea of the SVM is to find the separating hyperplane that correctly divides the training data set and has the maximum geometric margin. For a linearly separable data set there are infinitely many separating hyperplanes (i.e., perceptrons), but the separating hyperplane with the maximum geometric margin is unique.
Before the derivation, we need some definitions. Assume we are given a training data set on the feature space

$$T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$$

where $x_i \in \mathbb{R}^n$, $y_i \in \{+1, -1\}$, $i = 1, 2, \ldots, N$; $x_i$ is the $i$-th feature vector and $y_i$ is its class label, equal to $+1$ for a positive example and $-1$ for a negative example. We further assume for now that the training data set is linearly separable.
Geometric margin: for a given data set $T$ and hyperplane $w \cdot x + b = 0$, the geometric margin of the hyperplane with respect to a sample point $(x_i, y_i)$ is defined as

$$\gamma_i = \frac{y_i (w \cdot x_i + b)}{\|w\|}$$
The geometric margin of the hyperplane with respect to all sample points is the minimum

$$\gamma = \min_{i = 1, \ldots, N} \gamma_i$$
In fact, this minimum is exactly the distance from the hyperplane to the so-called support vectors.
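As a quick numerical illustration (a minimal sketch with made-up values, not part of the original derivation), the per-sample geometric margins $\gamma_i$ and their minimum $\gamma$ can be computed directly in NumPy:

```python
import numpy as np

# Toy hyperplane w·x + b = 0 and a few labeled points (made-up values)
w = np.array([1.0, -1.0])
b = 0.0
X = np.array([[3.0, 1.0], [2.0, 3.0], [1.0, 4.0]])
y = np.array([+1, -1, -1])

# Geometric margin of each sample: gamma_i = y_i (w·x_i + b) / ||w||
gamma_i = y * (X @ w + b) / np.linalg.norm(w)
gamma = gamma_i.min()   # margin of the hyperplane = smallest per-sample margin
print(gamma_i, gamma)
```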
Based on the above definitions, finding the maximum-margin separating hyperplane of the SVM model can be expressed as the following constrained optimization problem:

$$\max_{w,b} \ \gamma \qquad \text{s.t.} \ \ \frac{y_i(w \cdot x_i + b)}{\|w\|} \ge \gamma, \quad i = 1, 2, \ldots, N$$
Dividing both sides of the constraint by $\gamma$ gives

$$\frac{y_i(w \cdot x_i + b)}{\|w\|\,\gamma} \ge 1$$
Because $\|w\|$ and $\gamma$ are both scalars, for brevity of notation we let

$$w := \frac{w}{\|w\|\,\gamma}, \qquad b := \frac{b}{\|w\|\,\gamma}$$
and get

$$y_i(w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, N$$

Under this rescaling, the geometric margin of the hyperplane is $\gamma = \frac{1}{\|w\|}$.
Because maximizing $\gamma$ is now equivalent to maximizing $\frac{1}{\|w\|}$, which in turn is equivalent to minimizing $\frac{1}{2}\|w\|^2$ (the factor $\frac{1}{2}$ only keeps the later derivation concise and does not affect the result), the problem of finding the maximum-margin separating hyperplane of the SVM model can be expressed as the following constrained optimization problem:

$$\min_{w,b} \ \frac{1}{2}\|w\|^2 \qquad \text{s.t.} \ \ y_i(w \cdot x_i + b) \ge 1, \quad i = 1, 2, \ldots, N$$
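To make this primal problem concrete, here is a minimal sketch (my own illustration with made-up data, using `scipy.optimize` as an assumed dependency) that solves the small quadratic program numerically on a toy 2-D data set:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data (made-up values)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0]])
y = np.array([+1.0, +1.0, -1.0])

# Decision variables packed as theta = [w_1, w_2, b]
def objective(theta):
    return 0.5 * np.dot(theta[:2], theta[:2])   # (1/2)||w||^2

# One inequality constraint per sample: y_i (w·x_i + b) - 1 >= 0
constraints = [
    {"type": "ineq",
     "fun": (lambda theta, xi=xi, yi=yi: yi * (xi @ theta[:2] + theta[2]) - 1.0)}
    for xi, yi in zip(X, y)
]

res = minimize(objective, x0=np.zeros(3), method="SLSQP", constraints=constraints)
w, b = res.x[:2], res.x[2]
print("w =", w, "b =", b)   # expected roughly w = (0.5, 0.5), b = -2
```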
The problem above is a convex quadratic programming problem with inequality constraints; we can obtain its dual problem by using Lagrange multipliers.
First, we convert the original constrained objective function into the new unconstrained Lagrangian objective function

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{N} \alpha_i \bigl(y_i(w \cdot x_i + b) - 1\bigr)$$
where $\alpha_i$ is a Lagrange multiplier with $\alpha_i \ge 0$. Now let

$$\theta(w) = \max_{\alpha_i \ge 0} L(w, b, \alpha)$$
When a sample point does not satisfy the constraint, i.e., it lies outside the feasible region:

$$y_i(w \cdot x_i + b) < 1$$

we can set $\alpha_i = +\infty$, and then $\theta(w)$ is infinite as well.
When the sample point satisfies the constraint, i.e., it lies inside the feasible region:

$$y_i(w \cdot x_i + b) \ge 1$$

$\theta(w)$ equals the original objective function itself. Combining the two cases gives our new objective function

$$\theta(w) = \begin{cases} \dfrac{1}{2}\|w\|^2, & \text{all constraints satisfied} \\ +\infty, & \text{otherwise} \end{cases}$$
So the original constrained problem is equivalent to

$$\min_{w,b} \theta(w) = \min_{w,b} \max_{\alpha_i \ge 0} L(w, b, \alpha) = p^*$$
Looking at our new objective function, we first take a maximum and then a minimum. In that order we would have to solve for the parameters $w$ and $b$ while the inequality-constrained $\alpha$ is still present, and this solution process is hard to carry out directly. So we use Lagrangian duality to exchange the positions of the minimum and the maximum, and the problem becomes

$$\max_{\alpha_i \ge 0} \min_{w,b} L(w, b, \alpha) = d^*$$
For the exchange to be valid, i.e., for $d^* = p^*$, two conditions need to be met:
① the optimization problem is a convex optimization problem;
② the KKT conditions are satisfied.
First, this optimization problem is clearly a convex optimization problem, so condition ① is satisfied. Meeting condition ② requires

$$\begin{cases} \alpha_i \ge 0 \\ y_i(w \cdot x_i + b) - 1 \ge 0 \\ \alpha_i \bigl(y_i(w \cdot x_i + b) - 1\bigr) = 0 \end{cases}$$
To obtain the specific form of the dual problem, we set the partial derivatives of $L$ with respect to $w$ and $b$ to 0, which yields

$$w = \sum_{i=1}^{N} \alpha_i y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i y_i = 0$$
Substituting these two equations into the Lagrangian objective function eliminates $w$ and $b$:

$$\begin{aligned} \min_{w,b} L(w, b, \alpha) &= \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i \\ &= -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i \end{aligned}$$
Maximizing this expression over $\alpha$ is exactly the dual problem

$$\max_{\alpha} \ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) + \sum_{i=1}^{N} \alpha_i \qquad \text{s.t.} \ \ \sum_{i=1}^{N} \alpha_i y_i = 0, \ \ \alpha_i \ge 0, \ i = 1, 2, \ldots, N$$
Adding a minus sign to the objective converts it into an equivalent minimization problem:

$$\min_{\alpha} \ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{N} \alpha_i \qquad \text{s.t.} \ \ \sum_{i=1}^{N} \alpha_i y_i = 0, \ \ \alpha_i \ge 0, \ i = 1, 2, \ldots, N$$
Our optimization problem has now taken the form above. For this problem we have an efficient optimization algorithm, namely the sequential minimal optimization (SMO) algorithm. The details of solving this optimization problem with SMO are not expanded here; they deserve a detailed derivation in a follow-up article.
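To give a feel for SMO without the full derivation, below is a minimal sketch of the *simplified* SMO variant often used in teaching material (the full Platt algorithm uses smarter heuristics for choosing the working pair; the function name, the random choice of the second multiplier, and the defaults here are my own simplifications):

```python
import numpy as np

def simplified_smo(X, y, C=1.0, tol=1e-3, max_passes=10, seed=0):
    """Minimal sketch of simplified SMO for the soft-margin dual problem.

    X: (N, n) data matrix; y: (N,) labels in {-1, +1}.
    Returns the multipliers alpha and the threshold b.
    """
    rng = np.random.default_rng(seed)
    N = X.shape[0]
    alpha, b = np.zeros(N), 0.0
    K = X @ X.T                                   # Gram matrix of inner products
    passes = 0
    while passes < max_passes:
        changed = 0
        for i in range(N):
            E_i = (alpha * y) @ K[:, i] + b - y[i]        # f(x_i) - y_i
            # Work on i only if it violates the KKT conditions within tol
            if (y[i] * E_i < -tol and alpha[i] < C) or (y[i] * E_i > tol and alpha[i] > 0):
                j = int((i + 1 + rng.integers(N - 1)) % N)  # random second index j != i
                E_j = (alpha * y) @ K[:, j] + b - y[j]
                a_i, a_j = alpha[i], alpha[j]
                # Box bounds from 0 <= alpha <= C plus the equality constraint
                if y[i] != y[j]:
                    L, H = max(0.0, a_j - a_i), min(C, C + a_j - a_i)
                else:
                    L, H = max(0.0, a_i + a_j - C), min(C, a_i + a_j)
                eta = 2 * K[i, j] - K[i, i] - K[j, j]       # curvature along the pair
                if L == H or eta >= 0:
                    continue
                alpha[j] = np.clip(a_j - y[j] * (E_i - E_j) / eta, L, H)
                if abs(alpha[j] - a_j) < 1e-5:
                    continue
                alpha[i] = a_i + y[i] * y[j] * (a_j - alpha[j])
                # Update the threshold b so the changed pair satisfies KKT
                b1 = b - E_i - y[i]*(alpha[i]-a_i)*K[i, i] - y[j]*(alpha[j]-a_j)*K[i, j]
                b2 = b - E_j - y[i]*(alpha[i]-a_i)*K[i, j] - y[j]*(alpha[j]-a_j)*K[j, j]
                b = b1 if 0 < alpha[i] < C else b2 if 0 < alpha[j] < C else (b1 + b2) / 2
                changed += 1
        passes = 0 if changed else passes + 1
    return alpha, b
```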
Through this optimization algorithm we can obtain $\alpha^*$, and from it we can solve for $w$ and $b$, thus achieving our original goal: finding the maximum-margin hyperplane, that is, the "decision plane".
The foregoing derivation assumed that the KKT conditions hold; they are as follows:

$$\begin{cases} \alpha_i^* \ge 0 \\ y_i(w^* \cdot x_i + b^*) - 1 \ge 0 \\ \alpha_i^* \bigl(y_i(w^* \cdot x_i + b^*) - 1\bigr) = 0 \end{cases} \quad i = 1, 2, \ldots, N$$
Further, according to the previous derivation, the following two formulas hold:

$$w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i, \qquad \sum_{i=1}^{N} \alpha_i^* y_i = 0$$
It can be seen that in $\alpha^*$ at least one component $\alpha_j^* > 0$ (provable by contradiction: if all components were zero, then $w^* = 0$, which is not a solution of the primal problem, a contradiction). For this $j$ we have

$$y_j(w^* \cdot x_j + b^*) - 1 = 0$$
so we can get

$$b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x_j)$$
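Continuing the sketch from before, once the multipliers are available (e.g., from the `simplified_smo` function sketched above), these two formulas for $w^*$ and $b^*$ translate into a couple of NumPy lines:

```python
import numpy as np

def recover_w_b(X, y, alpha, eps=1e-8):
    """Recover w* and b* from a dual solution alpha (hard-margin case)."""
    w = (alpha * y) @ X                      # w* = sum_i alpha_i y_i x_i
    j = int(np.argmax(alpha))                # any index j with alpha_j > eps works
    b = y[j] - (alpha * y) @ (X @ X[j])      # b* = y_j - sum_i alpha_i y_i (x_i·x_j)
    return w, b
```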
For any training sample $(x_i, y_i)$, either $\alpha_i = 0$ or $y_i(w \cdot x_i + b) = 1$. If $\alpha_i = 0$, the sample does not appear in the formula for the model parameters above. If $\alpha_i > 0$, then necessarily $y_i(w \cdot x_i + b) = 1$, and the corresponding sample point lies exactly on the maximum-margin boundary: it is a support vector. This reveals an important property of SVMs: after training, most of the training samples do not need to be retained, and the final model is related only to the support vectors.
The derivation so far is based on the assumption that the training data are linearly separable, but perfectly linearly separable data hardly ever exist in practice. To solve this problem, the concept of a "soft margin" is introduced, i.e., certain points are allowed to violate the constraint

$$y_i(w \cdot x_i + b) \ge 1$$
Using the hinge loss, the original optimization problem is rewritten as

$$\min_{w,b,\xi} \ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N} \xi_i \qquad \text{s.t.} \ \ y_i(w \cdot x_i + b) \ge 1 - \xi_i, \ \ \xi_i \ge 0, \ i = 1, 2, \ldots, N$$
Wherein the "slack variables" , i.e. a hinge loss function. Each sample has a corresponding slack variable characterizing the degree of the sample does not satisfy the constraint. It called the penalty parameter, the larger the value, the greater the punishment for classification. Consistent with the idea of solving linear separability, also here to get Lagrangian with Lagrange multipliers, and then seek its dual problem.
Based on the above discussion, we can state the linear support vector machine learning algorithm as follows:
Input: training data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^n$, $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, N$;
Output: separating hyperplane and classification decision function.
(1) Select a penalty parameter $C > 0$, then construct and solve the convex quadratic programming problem

$$\min_{\alpha} \ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j) - \sum_{i=1}^{N} \alpha_i \qquad \text{s.t.} \ \ \sum_{i=1}^{N} \alpha_i y_i = 0, \ \ 0 \le \alpha_i \le C, \ i = 1, 2, \ldots, N$$
to obtain the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_N^*)^T$.
(2) Compute

$$w^* = \sum_{i=1}^{N} \alpha_i^* y_i x_i$$
Then select a component $\alpha_j^*$ of $\alpha^*$ satisfying the condition $0 < \alpha_j^* < C$ and compute

$$b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i (x_i \cdot x_j)$$
(3) The separating hyperplane obtained is

$$w^* \cdot x + b^* = 0$$
and the classification decision function is

$$f(x) = \operatorname{sign}(w^* \cdot x + b^*)$$
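In practice this whole procedure is available off the shelf. For instance, with scikit-learn (an assumed dependency of this sketch, not something the derivation above requires), `SVC` with a linear kernel exposes the trained $w^*$, $b^*$, and the support vectors directly:

```python
import numpy as np
from sklearn.svm import SVC

# Toy data (made-up values)
X = np.array([[3.0, 3.0], [4.0, 3.0], [1.0, 1.0], [0.0, 2.0]])
y = np.array([+1, +1, -1, -1])

clf = SVC(kernel="linear", C=1.0)   # C is the penalty parameter above
clf.fit(X, y)

print("w* =", clf.coef_[0])              # available for the linear kernel
print("b* =", clf.intercept_[0])
print("support vectors:", clf.support_vectors_)
print(clf.predict([[2.0, 2.5]]))         # sign(w*·x + b*)
```

Note how only the support vectors are stored in the fitted model, matching the property discussed above.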
Nonlinear SVM algorithm principle
For nonlinear classification problems in the input space, a nonlinear transformation can turn them into linear classification problems in some higher-dimensional feature space, where a linear support vector machine is then learned. Because in the dual problem of linear SVM learning both the objective function and the classification decision function involve only inner products between instances, it is not necessary to specify the nonlinear transformation explicitly; instead, the inner product is replaced with a kernel function. A kernel function gives the inner product between two instances after they pass through a nonlinear transformation. Specifically, $K(x, z)$ is a kernel function, or positive definite kernel, if there exists a mapping $\phi(x)$ from the input space to the feature space such that for any $x, z$ in the input space,

$$K(x, z) = \phi(x) \cdot \phi(z)$$
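A quick sanity check of this definition (my own toy illustration): in 2-D, the degree-2 polynomial kernel $K(x, z) = (x \cdot z)^2$ corresponds to the explicit map $\phi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$:

```python
import numpy as np

def phi(x):
    """Explicit feature map for the degree-2 polynomial kernel in 2-D."""
    return np.array([x[0]**2, np.sqrt(2) * x[0] * x[1], x[1]**2])

x, z = np.array([1.0, 2.0]), np.array([3.0, 0.5])
lhs = (x @ z) ** 2           # kernel computed in the input space
rhs = phi(x) @ phi(z)        # inner product in the feature space
print(np.isclose(lhs, rhs))  # True: K(x, z) = phi(x)·phi(z)
```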
Replacing the inner product in the dual problem of linear support vector machine learning with a kernel function, what we solve for is a nonlinear support vector machine

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x) + b^*\right)$$
Based on the above discussion, we can state the nonlinear support vector machine learning algorithm as follows:
Input: training data set $T = \{(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)\}$, where $x_i \in \mathbb{R}^n$, $y_i \in \{-1, +1\}$, $i = 1, 2, \ldots, N$;
Output: classification decision function.
(1) Select a suitable kernel function $K(x, z)$ and a penalty parameter $C > 0$, then construct and solve the convex quadratic programming problem

$$\min_{\alpha} \ \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j K(x_i, x_j) - \sum_{i=1}^{N} \alpha_i \qquad \text{s.t.} \ \ \sum_{i=1}^{N} \alpha_i y_i = 0, \ \ 0 \le \alpha_i \le C, \ i = 1, 2, \ldots, N$$
to obtain the optimal solution $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_N^*)^T$.
(2) Select a component $\alpha_j^*$ of $\alpha^*$ satisfying the condition $0 < \alpha_j^* < C$ and compute

$$b^* = y_j - \sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x_j)$$
(3) The classification decision function is

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_i^* y_i K(x_i, x) + b^*\right)$$
Finally, let us introduce a commonly used kernel function, the Gaussian kernel:

$$K(x, z) = \exp\left(-\frac{\|x - z\|^2}{2\sigma^2}\right)$$
The SVM corresponding to the Gaussian kernel is a radial basis function (RBF) classifier, and in this case the classification decision function becomes

$$f(x) = \operatorname{sign}\left(\sum_{i=1}^{N} \alpha_i^* y_i \exp\left(-\frac{\|x - x_i\|^2}{2\sigma^2}\right) + b^*\right)$$
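For completeness, here is a short sketch of the Gaussian kernel in code (mine, with the caveat that scikit-learn parameterizes the RBF kernel as $\exp(-\gamma\|x - z\|^2)$, so $\gamma = 1/(2\sigma^2)$ in the notation above):

```python
import numpy as np
from sklearn.svm import SVC

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return np.exp(-np.sum((x - z) ** 2) / (2.0 * sigma ** 2))

# Toy nonlinearly separable data: an XOR-like pattern (made-up values)
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([+1, +1, -1, -1])

sigma = 1.0
clf = SVC(kernel="rbf", gamma=1.0 / (2 * sigma**2), C=10.0)
clf.fit(X, y)
print(clf.predict(X))   # the RBF kernel separates the XOR pattern
```

No linear hyperplane in the input space can classify this XOR pattern, which is exactly the kind of problem the kernel trick is meant to handle.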