031. SVM (Support Vector Machine)

Cost function and principle

As a two-class classification model, SVM differs from logistic regression in its cost function: the log loss is replaced by a piecewise-linear ("hinge-style") cost.

  • Hyperplane and margin:


    From the point-to-plane distance formula, the sign of wx+b indicates the predicted class, and its absolute value indicates the relative confidence of the prediction.
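As a minimal numerical sketch (NumPy; the hyperplane w, b and the points are made-up values): the sign of w·x+b gives the predicted class, and |w·x+b|/‖w‖ is the geometric distance to the hyperplane.

```python
import numpy as np

# Hypothetical hyperplane w.x + b = 0
w = np.array([3.0, 4.0])   # normal vector, ||w|| = 5
b = -5.0

def predict_and_distance(x):
    score = np.dot(w, x) + b               # functional value w.x + b
    label = 1 if score >= 0 else -1        # sign -> predicted class
    dist = abs(score) / np.linalg.norm(w)  # geometric distance to the plane
    return label, dist

print(predict_and_distance(np.array([3.0, 4.0])))  # far on the positive side
print(predict_and_distance(np.array([0.0, 0.0])))  # on the negative side
```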

    SVM uses the labels −1 and +1 mainly to simplify calculation and fit the hyperplane formulation (any two distinct labels would suffice to distinguish the classes).

    (The functional margin is set to 1 here, which is convenient for calculation.)
    Only the support vectors determine the classification hyperplane: moving instances outside the margin boundary, or removing such instances, does not change the solution.
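With the functional margin fixed at 1, the problem above becomes the standard hard-margin primal (a sketch of the usual formulation, for N training examples (x_i, y_i)):

```latex
\min_{w,\,b}\ \frac{1}{2}\|w\|^2
\quad\text{s.t.}\quad y_i\,(w^\top x_i + b) \ge 1,\qquad i = 1,\dots,N
```

Maximizing the geometric margin 1/‖w‖ is equivalent to minimizing ‖w‖²/2, which makes the problem convex.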


  • How to solve the above constrained optimization problem?

    Answer: use the Lagrange multiplier method, Lagrangian duality, and the KKT conditions.
    (The derivation is not expanded here; this beginner's summary does not cover it well.)
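For reference, applying Lagrangian duality to the hard-margin primal gives the standard dual problem (each α_i is a Lagrange multiplier; by the KKT conditions, α_i > 0 only for the support vectors):

```latex
\max_{\alpha}\ \sum_{i=1}^{N}\alpha_i
  - \frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j\, y_i y_j\, x_i^\top x_j
\quad\text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0,\qquad \alpha_i \ge 0
```

Since the training data enter only through inner products x_iᵀx_j, the dual form is what allows the kernel trick introduced later.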

  • Add a soft margin (using slack variables instead of the 0/1 loss function):


    (1. The hinge loss function is used here; 2. Note that it penalizes not just misclassification — an example must be correct with a "safe distance" beyond the margin for the loss to be zero.)
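A minimal sketch of the hinge loss in NumPy (the scores are made-up values): the loss is zero only when the example is classified correctly with functional margin at least 1 — the "safe distance" mentioned above.

```python
import numpy as np

def hinge_loss(y, score):
    """Hinge loss: zero only when y*score >= 1 (correct side AND beyond the margin)."""
    return np.maximum(0.0, 1.0 - y * score)

print(hinge_loss(1, 2.0))   # correctly classified beyond the margin -> 0.0
print(hinge_loss(1, 0.5))   # correct side, but inside the margin -> 0.5
print(hinge_loss(-1, 0.5))  # wrong side of the boundary -> 1.5
```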

Graphical understanding

A large-margin classifier is robust (less sensitive to individual examples).

Large-margin classifier

(θᵀx(i) can be viewed through the projection of x(i) onto θ.)

Kernel function

Besides trying to use nonlinear equations to build complex decision boundaries, kernel functions are another feasible method.


  • Gaussian kernel:
    K(x, l) = exp(−‖x − l‖² / (2σ²))
    σ is the parameter of the Gaussian kernel: it controls how quickly the similarity falls off with distance.

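A minimal sketch of the Gaussian (RBF) kernel as a similarity function (the landmark l and σ are made-up values):

```python
import numpy as np

def gaussian_kernel(x, l, sigma):
    """Similarity between x and landmark l: 1 when x == l, -> 0 as they move apart."""
    return np.exp(-np.sum((x - l) ** 2) / (2.0 * sigma ** 2))

l = np.array([1.0, 1.0])
print(gaussian_kernel(np.array([1.0, 1.0]), l, sigma=1.0))  # 1.0 (x equals the landmark)
print(gaussian_kernel(np.array([4.0, 5.0]), l, sigma=1.0))  # close to 0 (far away)
```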

    • Other kernel functions:

      Polynomial kernels,
      as well as string kernels, chi-square kernels, histogram intersection kernels, etc.
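A sketch of the polynomial kernel for comparison (the degree d and offset c here are hypothetical choices):

```python
import numpy as np

def polynomial_kernel(x, z, degree=2, c=1.0):
    """Polynomial kernel: (x.z + c)^degree."""
    return (np.dot(x, z) + c) ** degree

# (1*3 + 2*0.5 + 1)^2 = 5^2 = 25.0
print(polynomial_kernel(np.array([1.0, 2.0]), np.array([3.0, 0.5]), degree=2, c=1.0))
```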

Choosing landmarks, and other questions

A common way to choose landmarks is to place one at each training example; learning and training then proceed using the kernel function.
On the parameter problem: this applies to the two commonly used kernels — the linear kernel (the "no kernel" option, which has no kernel parameters) and the Gaussian kernel.
Note that the data should be normalized before using the Gaussian kernel!
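As a sketch of the training step (assuming scikit-learn; the dataset is synthetic), with a StandardScaler applied before the Gaussian-kernel SVM, matching the normalization advice above:

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.15, random_state=0)

# Normalize features, then fit an SVM with the Gaussian (RBF) kernel.
# C trades margin width against training errors; gamma plays the role of 1/(2*sigma^2).
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma=1.0))
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```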

Multi-class classification

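A common scheme is one-vs-rest: train one binary SVM per class and predict the class whose classifier gives the largest score. A sketch assuming scikit-learn (synthetic 3-class data):

```python
from sklearn.datasets import make_blobs
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = make_blobs(n_samples=150, centers=3, random_state=0)

# One binary SVM per class; prediction picks the class with the highest score.
clf = OneVsRestClassifier(SVC(kernel="linear", C=1.0)).fit(X, y)
print(len(clf.estimators_))  # one underlying binary SVM per class
```

Many packages (including scikit-learn's SVC itself) also have multi-class support built in.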

Model selection


Logistic regression vs. the linear-kernel SVM: if there are many features, you can consider the latter.

In the third case, computation with the Gaussian kernel will be slow.

A good software package can generally find the global optimum (the problem is convex optimization), so there is no need to worry about local optima.


Origin blog.csdn.net/u013598957/article/details/107752174