SVM (Support Vector Machine)
Cost function and principle
As a binary classification model, how it differs from logistic regression:
-
Hyperplane and margin:
From the point-to-plane distance formula and the equation of a plane, it follows that the sign of wx+b indicates the predicted class, and its absolute value indicates the relative confidence of that prediction.
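As a small illustration of this point, the sketch below computes wx+b for a few points, reads the predicted class from its sign, and divides the absolute value by ‖w‖ to get the point-to-hyperplane distance. The weights, bias, and sample points are made-up values, and the helper names are hypothetical.

```python
import math

def decision_value(w, b, x):
    """Return w.x + b for weight vector w, bias b and point x."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def predict(w, b, x):
    """Class label (+1 or -1) taken from the sign of w.x + b."""
    return 1 if decision_value(w, b, x) >= 0 else -1

def distance_to_hyperplane(w, b, x):
    """Geometric distance |w.x + b| / ||w|| from x to the hyperplane."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(decision_value(w, b, x)) / norm_w

# Made-up hyperplane 3*x1 + 4*x2 - 5 = 0 and made-up sample points.
w, b = [3.0, 4.0], -5.0
for x in ([3.0, 4.0], [1.0, 1.0], [0.0, 0.0]):
    print(x, decision_value(w, b, x), predict(w, b, x),
          distance_to_hyperplane(w, b, x))
```

A point far from the hyperplane gets a large |wx+b| and so a more confident prediction; a point near it gets a value close to 0.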
SVM uses -1 and +1 as the class labels, mainly to simplify the calculation and to match the hyperplane formulation. (Any two distinct values would serve equally well as class labels.)
(Here the functional margin is set to 1, which is convenient for calculation.)
Only the support vectors matter when determining the separating hyperplane. Moving the other sample points around outside the margin boundary, or removing them, does not change the solution.
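With the functional margin fixed at 1, the maximum-margin problem is usually written in the standard hard-margin form below. The symbols w, b, x_i, y_i and the sample count m follow common textbook notation; they are assumed here, not recovered from the original figure.

```latex
\min_{w,\,b}\ \frac{1}{2}\lVert w \rVert^{2}
\quad \text{s.t.} \quad
y_i\,(w^{\top} x_i + b) \ge 1, \qquad i = 1, \dots, m
```

The points for which the constraint holds with equality, y_i(w·x_i + b) = 1, are the support vectors, and the width of the margin is 2/‖w‖.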
-
How to solve the above constrained optimization problem?
Answer: use the method of Lagrange multipliers, Lagrangian duality, and the KKT conditions.
(I won't expand on this here; as a beginner, my summary would not be good.)
-
Adding a soft margin (using slack variables in place of the 0/1 loss function):
(1. The hinge loss function is used here; 2. Note that being on the correct side is not enough: a point must also clear the margin, similar to a "safe distance".)
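To make the comparison concrete, here is a minimal sketch of the hinge loss next to the 0/1 loss, assuming labels in {-1, +1} and a score equal to wx+b. The function names and the scores are made up for illustration.

```python
def hinge_loss(y, score):
    """Hinge loss max(0, 1 - y*score) for a label y in {-1, +1}."""
    return max(0.0, 1.0 - y * score)

def zero_one_loss(y, score):
    """0/1 loss: 0 if the sign of score agrees with y, otherwise 1."""
    return 0.0 if y * score > 0 else 1.0

# Made-up scores w.x + b for positive examples (y = +1).
for score in (2.0, 1.0, 0.5, -0.5):
    print(score, zero_one_loss(1, score), hinge_loss(1, score))
```

A score of 0.5 is classified correctly (0/1 loss of 0) but still incurs a hinge loss of 0.5 because it sits inside the margin. The loss reaches 0 only once y·(wx+b) ≥ 1.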
Graphical understanding
Large margin classifier: robust.
Large margin classifier
(This can be understood through the projection of x(i) onto θ)
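The projection view can be checked numerically: for any vectors θ and x, the inner product θᵀx equals p·‖θ‖, where p is the signed length of the projection of x onto θ. The vectors below are made-up values and `projection_length` is a hypothetical helper.

```python
import math

def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(ai * bi for ai, bi in zip(a, b))

def projection_length(x, theta):
    """Signed length p of the projection of x onto theta."""
    return dot(x, theta) / math.sqrt(dot(theta, theta))

theta = [3.0, 4.0]   # made-up parameter vector with norm 5
x = [2.0, 1.0]       # made-up sample
p = projection_length(x, theta)
norm_theta = math.sqrt(dot(theta, theta))

# p * ||theta|| and theta.x should agree.
print(p, p * norm_theta, dot(theta, x))
```

Since the constraint requires p·‖θ‖ ≥ 1 for positive samples, keeping ‖θ‖ small forces the projections p to be large, which is where the large margin comes from.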
Kernel function
Besides using nonlinear (for example, polynomial) features to build complex decision boundaries, kernel functions are another feasible method.
-
Gaussian kernel:
σ is the bandwidth parameter of the Gaussian kernel function.
Example:
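A minimal sketch of the Gaussian kernel in its usual form, exp(-‖x - l‖² / (2σ²)), where l is a reference point. The name `landmark`, the helper name, and all numeric values are assumptions made for illustration.

```python
import math

def gaussian_kernel(x, landmark, sigma):
    """Similarity exp(-||x - landmark||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

landmark = [3.0, 5.0]  # made-up reference point
print(gaussian_kernel([3.0, 5.0], landmark, 1.0))  # x exactly on the landmark
print(gaussian_kernel([3.0, 6.0], landmark, 1.0))  # x close to the landmark
print(gaussian_kernel([9.0, 9.0], landmark, 1.0))  # x far from the landmark
print(gaussian_kernel([9.0, 9.0], landmark, 5.0))  # same x, larger sigma
```

The value is 1 when x coincides with the reference point and decays toward 0 as x moves away. A larger σ makes the decay slower, so distant points still count as fairly similar.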
-
Other kernel functions:
Polynomial kernel functions,
as well as string kernel functions, chi-square kernel functions, histogram intersection kernel functions, etc.
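For the polynomial kernel, a small sketch may help show what a kernel computes. It assumes the common form (x·z + c)^d and checks, for 2-D inputs with d = 2 and c = 0, that the kernel value equals an inner product in an explicit quadratic feature space. Parameter names and sample points are made up.

```python
import math

def polynomial_kernel(x, z, degree=2, coef0=0.0):
    """Polynomial kernel (x.z + coef0) ** degree."""
    return (sum(xi * zi for xi, zi in zip(x, z)) + coef0) ** degree

def explicit_quadratic_features(x):
    """Feature map matching the 2-D kernel with degree=2, coef0=0."""
    x1, x2 = x
    return [x1 * x1, math.sqrt(2.0) * x1 * x2, x2 * x2]

x, z = [1.0, 2.0], [3.0, 0.5]  # made-up points
k = polynomial_kernel(x, z)
phi_dot = sum(a * b for a, b in zip(explicit_quadratic_features(x),
                                    explicit_quadratic_features(z)))
# The two numbers agree up to floating-point rounding.
print(k, phi_dot)
```

The kernel gets the same number without ever building the higher-dimensional feature vectors, which is the practical appeal of kernel methods.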
-
Choosing landmarks and other issues
Example of the sample-based method (using the training samples as landmarks):
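A common way to do this is to place one landmark on every training sample and describe each input by its similarity to all landmarks. The sketch below assumes a Gaussian similarity and a leading constant feature of 1 for the bias term; the data and helper names are made up.

```python
import math

def gaussian_kernel(x, landmark, sigma):
    """Similarity exp(-||x - landmark||^2 / (2 * sigma^2))."""
    sq_dist = sum((xi - li) ** 2 for xi, li in zip(x, landmark))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def kernel_features(x, landmarks, sigma):
    """Map x to [1, f_1, ..., f_m] with f_i = similarity(x, landmark_i)."""
    return [1.0] + [gaussian_kernel(x, l, sigma) for l in landmarks]

# Made-up training samples; each one doubles as a landmark.
X_train = [[1.0, 1.0], [2.0, 3.0], [5.0, 4.0]]
landmarks = list(X_train)
features = [kernel_features(x, landmarks, sigma=1.0) for x in X_train]
for row in features:
    print([round(v, 4) for v in row])
```

With m training samples, every input becomes an (m+1)-dimensional feature vector, and the i-th sample has similarity exactly 1 to its own landmark.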
Next comes learning and training with the kernel function:
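One common way to write the training objective once each sample x(i) has been mapped to a kernel feature vector f(i) is shown below. It is a sketch under assumptions: labels y(i) in {-1, +1}, the hinge loss, m training samples, and an unregularized bias term θ_0. It is not necessarily the exact form used in the original figure.

```latex
\min_{\theta}\ C \sum_{i=1}^{m} \max\!\bigl(0,\ 1 - y^{(i)}\,\theta^{\top} f^{(i)}\bigr)
\;+\; \frac{1}{2} \sum_{j=1}^{m} \theta_j^{2}
```

Here C trades off fitting the training data against keeping θ small.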
Parameter issues:
Applied to two commonly used kernel functions (the linear kernel means no kernel is used, so it has no kernel parameters):
Note that the features should be scaled (normalized) before using the Gaussian kernel function!
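The reason is that the Gaussian kernel depends on ‖x - l‖², so a feature with a large numeric range dominates the distance. The sketch below shows this with made-up data and a simple standardization helper (hypothetical, and assuming no column is constant).

```python
import math

def standardize(rows):
    """Scale each column to zero mean and unit (population) std.

    Assumes no column is constant, so no std is zero.
    """
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c))
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)]
            for row in rows]

def squared_distance(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

# Made-up houses: [area in square feet, number of bedrooms].
X = [[1000.0, 1.0], [1010.0, 5.0], [2000.0, 1.0]]
raw_01, raw_02 = squared_distance(X[0], X[1]), squared_distance(X[0], X[2])
Xs = standardize(X)
scaled_01, scaled_02 = squared_distance(Xs[0], Xs[1]), squared_distance(Xs[0], Xs[2])
print(raw_01, raw_02)
print(scaled_01, scaled_02)
```

Before scaling, the area difference swamps the bedroom difference; after scaling, both features contribute on a comparable scale.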
Multi-class classification
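One common approach is one-vs-rest: train one binary SVM per class (that class against all others) and predict the class whose classifier gives the largest score. The sketch below covers only the prediction step; the parameter vectors are made-up stand-ins for trained models, and the helper names are hypothetical.

```python
def decision_value(theta, x):
    """theta[0] is the bias; the remaining entries are weights for x."""
    return theta[0] + sum(t * xi for t, xi in zip(theta[1:], x))

def predict_one_vs_rest(thetas, x):
    """Return the label whose binary classifier gives the largest score."""
    scores = {label: decision_value(theta, x) for label, theta in thetas.items()}
    return max(scores, key=scores.get)

# Made-up parameters standing in for three separately trained binary SVMs.
thetas = {
    "a": [0.0, 1.0, 0.0],    # scores high when x1 is large
    "b": [0.0, 0.0, 1.0],    # scores high when x2 is large
    "c": [1.0, -1.0, -1.0],  # scores high near the origin
}
print(predict_one_vs_rest(thetas, [3.0, 0.5]))
print(predict_one_vs_rest(thetas, [0.2, 2.0]))
print(predict_one_vs_rest(thetas, [0.1, 0.1]))
```

Many SVM packages already have multi-class support built in, in which case this step does not need to be written by hand.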
Model selection
Logistic regression versus an SVM with a linear kernel: when there are many features, the latter is worth considering.
The third case: computation with a Gaussian kernel will be slow.
A good software package will generally find the global optimum (the problem is a convex optimization), so there is no need to worry about local optima.