"Machine Learning in Practice" Chapter 6 SVM Learning Summary

SVM (Support Vector Machine) is a binary classifier that maximizes the classification margin by solving a quadratic optimization problem. Both linear and nonlinear classification are supported. After training, a linear SVM classifies without keeping any sample data, while a nonlinear SVM must retain part of the sample data (the support vectors).


Separating hyperplane: the decision boundary that separates the dataset (a line in two dimensions, a plane or hyperplane in higher dimensions).

Support vectors: those points closest to the separating hyperplane.

The core idea of SVM: as in logistic regression, a set of weight coefficients is learned from a training set, and the test set is then classified by a linear function of those weights. That is, we first train a separating hyperplane to act as the decision boundary for classification, with the data on the two sides of the plane belonging to two different categories. What we need to do is find the points closest to the separating hyperplane (the support vectors) and then maximize the distance from those support vectors to the separating plane. SVM aims to find the optimal separating hyperplane.
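Behind "maximize the distance" is the standard margin derivation, left implicit in this summary (it uses the normalization f(x) = ±1 at the closest points, introduced just below); in LaTeX form:

    d(x) = \frac{|xw^{T} + b|}{\lVert w \rVert},
    \qquad \text{margin} = \frac{2}{\lVert w \rVert},

    \max_{w,b}\; \frac{2}{\lVert w \rVert}
    \;\Longleftrightarrow\;
    \min_{w,b}\; \tfrac{1}{2}\lVert w \rVert^{2}
    \quad \text{s.t. } y_i\,(x_i w^{T} + b) \ge 1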

The separating hyperplane can be expressed as:

    xwᵀ + b = 0

All we need to do is find w and b. Suppose that for the training dataset T, the data can be divided into two categories, C1 and C2.
For the function f(x) = xwᵀ + b:
For data of class C1, f(x) ⩾ 1, and there is at least one point xᵢ with f(xᵢ) = 1; this point is called a closest point.
For data of class C2, f(x) ⩽ −1, and there is at least one point xᵢ with f(xᵢ) = −1; this point is also called a closest point.
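A minimal Python sketch of this decision rule; the weights, bias, and sample point are made-up illustrative values, not values from the book:

    import numpy as np

    w = np.array([0.5, -1.0])   # assumed learned weight vector
    b = 0.2                     # assumed learned bias

    def f(x):
        # f(x) = xw^T + b; the sign of f(x) picks the side of the hyperplane
        return x @ w + b

    x = np.array([3.0, 1.0])
    print("C1" if f(x) >= 0 else "C2")   # classify by which side x falls on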

Solve using Lagrange multipliers and the KKT conditions. The optimization problem becomes its dual form:

    max over α:  Σᵢ αᵢ − ½ ΣᵢΣⱼ αᵢαⱼ yᵢyⱼ xᵢxⱼᵀ
    subject to:  αᵢ ⩾ 0  and  Σᵢ αᵢyᵢ = 0,

with w recovered as w = Σᵢ αᵢyᵢxᵢ.

Therefore, the problem of solving for w is transformed into solving for the coefficients alpha.
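Once the alphas are solved, w follows directly; a minimal sketch, assuming X is the (m, n) training matrix, y the (m,) label vector in {−1, +1}, and alphas the solved multipliers (names are hypothetical):

    import numpy as np

    def recover_w(X, y, alphas):
        # w = Σ_i alpha_i * y_i * x_i; only support vectors (alpha_i > 0) contribute
        return (alphas * y) @ X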

       In the book Machine Learning in Action, the SMO (Sequential Minimal Optimization) algorithm is used to solve this problem. The goal of SMO is to find a set of alphas and b. Its working principle: two alphas are selected for optimization in each loop; once a suitable pair is found, one alpha is increased while the other is decreased, so that the constraint Σᵢ αᵢyᵢ = 0 continues to hold. The criteria for judging whether a pair of alphas is "suitable" are that both values must lie outside their margin boundary and that neither alpha is already clamped or bounded.
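A minimal NumPy sketch in the spirit of the simplified SMO just described (random choice of the second alpha, linear kernel); all names are mine and the full Platt heuristics are omitted:

    import numpy as np

    def simple_smo(X, y, C=1.0, tol=1e-3, max_passes=5):
        """Simplified SMO sketch: optimize a pair of alphas per step."""
        m = X.shape[0]
        alphas, b = np.zeros(m), 0.0
        K = X @ X.T                                   # linear-kernel Gram matrix
        passes = 0
        while passes < max_passes:
            changed = 0
            for i in range(m):
                Ei = (alphas * y) @ K[:, i] + b - y[i]        # error on x_i
                # optimize i only if it violates the KKT conditions
                if (y[i]*Ei < -tol and alphas[i] < C) or (y[i]*Ei > tol and alphas[i] > 0):
                    j = np.random.choice([k for k in range(m) if k != i])
                    Ej = (alphas * y) @ K[:, j] + b - y[j]
                    ai_old, aj_old = alphas[i], alphas[j]
                    # bounds that keep 0 <= alpha <= C and sum(alpha*y) fixed
                    if y[i] != y[j]:
                        L, H = max(0, aj_old - ai_old), min(C, C + aj_old - ai_old)
                    else:
                        L, H = max(0, ai_old + aj_old - C), min(C, ai_old + aj_old)
                    eta = 2*K[i, j] - K[i, i] - K[j, j]       # curvature term
                    if L == H or eta >= 0:
                        continue
                    alphas[j] = np.clip(aj_old - y[j]*(Ei - Ej)/eta, L, H)
                    if abs(alphas[j] - aj_old) < 1e-5:
                        continue
                    # increase one alpha while decreasing the other
                    alphas[i] += y[i]*y[j]*(aj_old - alphas[j])
                    b1 = b - Ei - y[i]*(alphas[i]-ai_old)*K[i, i] - y[j]*(alphas[j]-aj_old)*K[i, j]
                    b2 = b - Ej - y[i]*(alphas[i]-ai_old)*K[i, j] - y[j]*(alphas[j]-aj_old)*K[j, j]
                    b = b1 if 0 < alphas[i] < C else b2 if 0 < alphas[j] < C else (b1 + b2)/2
                    changed += 1
            passes = passes + 1 if changed == 0 else 0
        return alphas, b

For prediction, f(x) = Σᵢ αᵢyᵢ⟨xᵢ, x⟩ + b; only points with αᵢ > 0 (the support vectors) contribute, which is why a trained nonlinear SVM has to keep them.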

       In practical applications, if the data is not linearly separable, the concept of a kernel function is introduced. A kernel function maps the low-dimensional feature space to a higher-dimensional feature space in which the data become linearly separable. The radial basis function (RBF) is a commonly used kernel in SVM: it takes vectors as inputs and outputs a scalar based on the distance between them.
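A minimal sketch of the radial basis function just described; sigma is an assumed bandwidth parameter, and exp(−‖x − z‖²/(2σ²)) is one common parameterization, not necessarily the book's:

    import numpy as np

    def rbf_kernel(x, z, sigma=1.0):
        # k(x, z) = exp(-||x - z||^2 / (2 * sigma^2)): vectors in, scalar out
        diff = x - z
        return np.exp(-(diff @ diff) / (2.0 * sigma ** 2))

    print(rbf_kernel(np.array([1.0, 0.0]), np.array([0.0, 1.0])))  # ≈ 0.368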


