A Slacker's Machine Learning Notes --- SVM

Foreword:
This essay records the machine learning journey of a self-confessed slacker.

Body:
Support Vector Machines (SVM) are a set of supervised learning methods for classification, regression, and outlier detection.
In the classification problem, SVM looks for a decision boundary that is simultaneously as far as possible from every class, that is, it maximizes the margin (the margin is the distance between the two dotted lines in the figure). This idea of staying as far away as possible improves the generalization ability of the model.


The points lying on the dotted lines are the support vectors, and the solid line is the decision boundary. The figure shows a linearly separable case.
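For a concrete feel of these quantities, here is a minimal sketch (assuming scikit-learn and NumPy are installed; the toy points below are invented for the example) that fits a linear SVM and reads off the support vectors and the margin width $2/\|w\|$:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable blobs, labelled -1 and +1 as in the text.
X = np.array([[1.0, 1.0], [1.5, 2.0], [2.0, 1.5],
              [4.0, 4.0], [4.5, 5.0], [5.0, 4.5]])
y = np.array([-1, -1, -1, 1, 1, 1])

# A very large C approximates the hard-margin formulation used here.
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]           # normal vector of the decision boundary
# Note: scikit-learn writes the boundary as w.x + b = 0, i.e. the sign of
# the offset differs from the w.x - b = 0 convention used in this post.
print("support vectors:\n", clf.support_vectors_)
print("margin width 2/||w|| =", 2 / np.linalg.norm(w))
```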

Finding the maximum margin is equivalent to finding the maximum of d, the distance from a support vector to the decision boundary.
The decision boundary is $w^\top x - b = 0$. The distance from any point $x$ to the boundary is

$$\frac{|w^\top x - b|}{\|w\|}.$$

For the convenience of calculation, we label the two classes $+1$ and $-1$. Requiring every point to lie on the correct side of the boundary, at least $d$ away from it, gives the constraints

$$\frac{y_i\,(w^\top x_i - b)}{\|w\|} \ge d, \qquad i = 1, \dots, n.$$

Because $\|w\|\,d$ is a constant, we can rescale $w$ and $b$ (divide both by $\|w\|\,d$) so that the constraints simplify to

$$y_i\,(w^\top x_i - b) \ge 1.$$

For convenience we still write the rescaled vector as $w$; note that this $w$ is not the same as the original one.
Since the distance from a support vector to the boundary is now $\frac{1}{\|w\|}$, the maximization problem turns into minimizing $\|w\|$, which is rewritten as $\frac{1}{2}\|w\|^2$ to make the later calculation more convenient.
Finally we get an optimization problem with constraints:

$$\min_{w,\,b}\ \frac{1}{2}\|w\|^2 \qquad \text{s.t.}\quad y_i\,(w^\top x_i - b) \ge 1,\ \ i = 1, \dots, n.$$

So how do we solve an optimization problem with constraints?
Here we apply Lagrangian duality and obtain the solution of the primal problem from the dual problem. The Lagrangian is

$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i\left[y_i\,(w^\top x_i - b) - 1\right], \qquad \alpha_i \ge 0.$$

Solve the dual problem.
Take the partial derivatives of $L$ with respect to $w$ and $b$ and set them to zero:

$$\frac{\partial L}{\partial w} = w - \sum_{i} \alpha_i y_i x_i = 0 \;\Longrightarrow\; w = \sum_{i} \alpha_i y_i x_i,$$

$$\frac{\partial L}{\partial b} = \sum_{i} \alpha_i y_i = 0.$$

Substituting these back into $L$, we obtain the dual problem:

$$\max_{\alpha}\ \sum_{i} \alpha_i - \frac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j\, x_i^\top x_j \qquad \text{s.t.}\quad \alpha_i \ge 0,\ \ \sum_{i} \alpha_i y_i = 0.$$

The $\alpha_i$ are found by numerical calculation (the SMO algorithm is the usual choice). We then obtain the solution of the primal problem from the solution of the dual problem:

$$w = \sum_{i} \alpha_i y_i x_i, \qquad b = w^\top x_s - y_s \ \ \text{for any support vector } x_s.$$
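As a concrete illustration of this last step, here is a minimal sketch (assuming NumPy and SciPy are available; the toy data, the variable names, and the use of a generic constrained optimizer instead of SMO are all choices made for the example) that solves the dual numerically and then recovers $w$ and $b$ as above:

```python
import numpy as np
from scipy.optimize import minimize

# Toy linearly separable data, labels in {-1, +1}.
X = np.array([[1.0, 1.0], [2.0, 1.5], [4.0, 4.0], [5.0, 4.5]])
y = np.array([-1.0, -1.0, 1.0, 1.0])
n = len(y)
K = (X @ X.T) * np.outer(y, y)           # K[i, j] = y_i y_j x_i . x_j

def neg_dual(alpha):
    # Negative of the dual objective, because scipy minimizes.
    return 0.5 * alpha @ K @ alpha - alpha.sum()

constraints = {"type": "eq", "fun": lambda a: a @ y}   # sum_i alpha_i y_i = 0
bounds = [(0.0, None)] * n                             # alpha_i >= 0
alpha = minimize(neg_dual, np.zeros(n), bounds=bounds,
                 constraints=constraints).x

w = (alpha * y) @ X                      # w = sum_i alpha_i y_i x_i
sv = alpha > 1e-6                        # support vectors have alpha_i > 0
b = np.mean(X[sv] @ w - y[sv])           # b = w.x_s - y_s on support vectors
print("w =", w, " b =", b)
```

Real implementations (LIBSVM, and scikit-learn's SVC which wraps it) solve the same dual with the specialized SMO-style algorithm rather than a general-purpose optimizer.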

So far, we have the expression for the decision boundary in the linearly separable case.

But in real life, many situations are not linearly separable. How do we solve nonlinear problems? Here we introduce the idea of mapping the data into a higher dimension.

Obviously, we cannot find a linear decision boundary in the graph above. In this case we introduce a third dimension, namely $z = x^2 + y^2$, so that the two classes can be separated by a linear boundary in the new space.
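Here is a small sketch of exactly this trick (assuming scikit-learn and NumPy; the concentric-circles dataset is a stand-in for the figure described above): in the original 2D space a linear SVM fails, but after adding the coordinate $z = x^2 + y^2$ it separates the classes easily.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: not linearly separable in 2D.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# Lift to 3D: (x, y) -> (x, y, x^2 + y^2).
z = (X ** 2).sum(axis=1, keepdims=True)
X3 = np.hstack([X, z])

print(SVC(kernel="linear").fit(X, y).score(X, y))    # poor, around 0.5
print(SVC(kernel="linear").fit(X3, y).score(X3, y))  # ~1.0 after the lift
```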

However, mapping to a high-dimensional space can greatly increase the amount of calculation. To reduce this workload we introduce the kernel trick.
A kernel function is a function of two arguments: its inputs are the two vectors before the transformation, and its output equals the inner product of the two transformed vectors. This "coincidence" lets us skip the explicit mapping and compute the mapped inner product directly through the kernel function.
In other words, the problem is converted into one where we never need to compute anything in the high-dimensional space.
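To make this "coincidence" concrete, here is a tiny sketch (NumPy only) using the homogeneous degree-2 polynomial kernel $K(a, b) = (a^\top b)^2$ on 2D vectors, whose explicit feature map is $\varphi(x) = (x_1^2,\ \sqrt{2}\,x_1 x_2,\ x_2^2)$; the kernel value and the inner product of the mapped vectors agree exactly:

```python
import numpy as np

def phi(x):
    # Explicit mapping of a 2D vector into the 3D feature space.
    return np.array([x[0] ** 2, np.sqrt(2) * x[0] * x[1], x[1] ** 2])

def kernel(a, b):
    # The same quantity computed directly in the original 2D space.
    return float(a @ b) ** 2

a = np.array([1.0, 2.0])
b = np.array([3.0, 0.5])
print(phi(a) @ phi(b))   # 16.0 -- inner product after the mapping
print(kernel(a, b))      # 16.0 -- kernel value, no mapping needed
```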
The commonly used kernel functions are as follows:
Linear kernel: $K(x, x') = x^\top x'$
Polynomial kernel: $K(x, x') = (\gamma\, x^\top x' + c)^d$
Radial basis function (RBF) / Gaussian kernel: $K(x, x') = \exp\!\left(-\dfrac{\|x - x'\|^2}{2\sigma^2}\right)$
Laplace kernel: $K(x, x') = \exp\!\left(-\dfrac{\|x - x'\|}{\sigma}\right)$
Sigmoid kernel: $K(x, x') = \tanh(\gamma\, x^\top x' + c)$
As for how to choose a kernel function: in general, use the linear kernel when the data are (close to) linearly separable, and the RBF kernel when they are not.
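As a rough illustration of this rule of thumb (assuming scikit-learn; the circles dataset again stands in for "nonlinear" data), comparing a linear kernel with an RBF kernel:

```python
from sklearn.datasets import make_circles
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_circles(n_samples=300, factor=0.3, noise=0.1, random_state=0)

for kernel in ("linear", "rbf"):
    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
    print(kernel, round(score, 3))   # the RBF kernel wins on this data
```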

Conclusion:
These are the study notes of a self-confessed slacker. If there is any mistake, please correct me :)

Figure source: https://monkeylearn.com/blog/introduction-to-support-vector-machines-svm/
