Machine Learning | Watermelon Book Study Notes Ch06: SVM

6.0 Background

  • SVM: strong expressive ability (can approximate any continuous function to arbitrary precision); solid mathematical theory; global optimal solution; little reliance on manual parameter tuning; (relatively) large computational overhead; domain support is difficult; mainly serves the scientific community.
  • Neural networks: strong expressive ability; theory is unclear (rooted in cognitive science); local optima; relies on manual parameter tuning; computational overhead may be large or small; domain support is everywhere; mainly serves industry.

6.1 Margin and Support Vectors

  • Margin: choose the separating hyperplane "in the middle" of the two classes; it tolerates perturbations best, is the most robust, and generalizes best

  • Generalization: the ability to make correct predictions on future (unseen) data

  • Support vectors: the few points (from both the positive and the negative class) that lie closest to the hyperplane

  • Maximum margin: the shortest distance from a closest point to the hyperplane is 1/||w||, so the margin between the two classes is 2/||w||. Maximizing it gives
    $$\arg\max_{w,b}\ \frac{2}{||w||}\qquad \text{s.t. } y_i(w^Tx_i+b)\geq 1,\ i=1,2,\dots,m,$$
    which is equivalent to
    $$\arg\min_{w,b}\ \frac{1}{2}||w||^2\qquad \text{s.t. } y_i(w^Tx_i+b)\geq 1,\ i=1,2,\dots,m.$$
    (A small numerical sketch follows this list.)

  • Why convexity matters: e.g. y = x^2 (its second derivative is positive) is convex; a convex optimization problem is guaranteed to have a global optimum
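A minimal sketch of the primal formulation above, assuming scikit-learn and NumPy are available; the toy data is made up for illustration. A linear SVC with a very large C approximates the hard-margin problem, and we can read off 2/||w|| and check the constraints:

```python
import numpy as np
from sklearn.svm import SVC

# Two linearly separable toy clusters (made up for illustration).
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=-2.0, size=(20, 2)),
               rng.normal(loc=+2.0, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

# A very large C approximates the hard-margin problem min 1/2 ||w||^2.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

w, b = clf.coef_[0], clf.intercept_[0]
print("margin 2/||w|| =", 2.0 / np.linalg.norm(w))

# Every training sample should satisfy y_i (w^T x_i + b) >= 1 (up to tolerance).
print("min y_i(w^T x_i + b) =", np.min(y * (X @ w + b)))
```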

6.2 The Dual Problem

  • Lagrange multiplier method: lifts the constrained problem into a higher-dimensional function, removing the explicit constraint conditions

  • Sparsity of the solution: follows from the KKT conditions

    • $$\begin{cases} \alpha_i \geq 0, \\ y_i f(x_i) \geq 1, \\ \alpha_i\,(y_i f(x_i) - 1) = 0. \end{cases}$$
      So for every training sample, necessarily $\alpha_i = 0$ or $y_i f(x_i) = 1$.

    • The final w is determined only by the support vectors (verified numerically in the sketch after this list)

  • Off-the-shelf QP solvers (e.g. the MOSEK toolkit) can solve the dual problem directly

  • SMO (Sequential Minimal Optimization): a dedicated, more efficient solver for the SVM dual problem
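A short sketch, again with scikit-learn on made-up toy data, illustrating the sparsity implied by the KKT conditions: only the support vectors carry non-zero multipliers, and for them y_i f(x_i) is (approximately) 1. scikit-learn's libsvm backend uses an SMO-style solver and exposes y_i·α_i through dual_coef_:

```python
import numpy as np
from sklearn.svm import SVC

# Separable toy data, as in the previous sketch.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=-2.0, size=(20, 2)),
               rng.normal(loc=+2.0, size=(20, 2))])
y = np.array([-1] * 20 + [+1] * 20)

clf = SVC(kernel="linear", C=1e6).fit(X, y)

# Sparsity: only a handful of samples have alpha_i > 0 (the support vectors).
print("support vector indices:", clf.support_)
print("alpha_i * y_i for support vectors:", clf.dual_coef_[0])

# Complementary slackness: for the margin support vectors, y_i f(x_i) ~ 1.
f_sv = clf.decision_function(X[clf.support_])
print("y_i f(x_i) at support vectors:", y[clf.support_] * f_sv)
```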

6.3 Kernel Functions

  • Linearly inseparable data: map the samples from the original l-dimensional space into a higher-dimensional feature space, where a linear classifier can be built
  • Mercer's theorem (sufficient but not necessary): as long as the kernel matrix corresponding to a symmetric function is positive semidefinite (all eigenvalues non-negative), that function can be used as a kernel
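A quick NumPy check of the Mercer condition for the Gaussian (RBF) kernel on some made-up points, as a sketch: the symmetric kernel matrix should be positive semidefinite, i.e. all its eigenvalues non-negative up to numerical noise.

```python
import numpy as np

def rbf_kernel(X, gamma=0.5):
    """Gaussian (RBF) kernel matrix K[i, j] = exp(-gamma * ||x_i - x_j||^2)."""
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    return np.exp(-gamma * sq_dists)

X = np.random.default_rng(2).normal(size=(30, 4))   # arbitrary sample points
K = rbf_kernel(X)

print("symmetric:", np.allclose(K, K.T))
eigvals = np.linalg.eigvalsh(K)                     # eigenvalues of the symmetric matrix
print("smallest eigenvalue:", eigvals.min())        # expected >= 0 (up to numerical error)
```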

6.4 Soft Margin and Regularization

  • 0/1 loss: the soft margin takes a compromise between maximizing the margin and the loss incurred on samples that violate the constraints
  • Problem: the 0/1 loss is non-convex and discontinuous, so it is hard to optimize directly
  • Surrogate losses: convex, continuous functions that upper-bound the 0/1 loss, e.g. hinge, exponential, and logistic loss (compared numerically in the sketch after this list)
  • Regularization view: minimize a structural-risk (regularization) term plus a scaled empirical loss
    • Replacing the hinge loss with the logistic (log) loss gives a model close to logistic regression
    • LASSO (least absolute shrinkage and selection operator): L1 regularization
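A small NumPy sketch comparing the common surrogate losses with the 0/1 loss on a grid of margins z = y·f(x); the grid and loss choices here are just illustrative. Each surrogate is convex, continuous, and upper-bounds the 0/1 loss:

```python
import numpy as np

z = np.linspace(-2, 2, 401)                # margin values z = y * f(x)

loss_01     = (z < 0).astype(float)        # 0/1 loss: 1 if misclassified, else 0
hinge       = np.maximum(0.0, 1.0 - z)     # hinge loss (used by the soft-margin SVM)
exponential = np.exp(-z)                   # exponential loss
logistic    = np.log2(1.0 + np.exp(-z))    # logistic loss (base 2, so it bounds 0/1)

# Each surrogate upper-bounds the 0/1 loss everywhere on the grid.
for name, loss in [("hinge", hinge), ("exp", exponential), ("logistic", logistic)]:
    print(name, "upper-bounds 0/1:", bool(np.all(loss >= loss_01 - 1e-12)))
```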

6.5 Support Vector Regression

  • ε-insensitive band: the wider the band, the fewer sample points fall outside it and count as errors

  • Loss function: the ε-insensitive loss, which ignores deviations smaller than ε (see the sketch after this list)

  • Quadratic programming (QP): the objective is a quadratic function and the constraints are linear functions
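A sketch of support vector regression on made-up 1-D data, assuming scikit-learn: the epsilon parameter sets the half-width of the insensitive band, and widening it typically leaves fewer samples outside the band, i.e. fewer support vectors.

```python
import numpy as np
from sklearn.svm import SVR

# Noisy 1-D regression data (made up for illustration).
rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 6, size=(80, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=80)

# A wider epsilon-insensitive band -> fewer points outside it -> fewer support vectors.
for eps in (0.01, 0.1, 0.3):
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")
```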

6.6 Kernel Methods

  • Kernelized SVM
  • Kernel PCA
  • Kernelized LDA (KLDA)
  • Reproducing Kernel Hilbert Space (RKHS)
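A minimal example of the kernel trick beyond SVM, assuming scikit-learn: kernel PCA maps data that is not linearly separable in the original space (here two concentric circles) into a feature space where the classes tend to separate along the leading components. The dataset and gamma value are chosen only for illustration.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import KernelPCA

# Two concentric circles: not linearly separable in the original 2-D space.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=4)

# Kernel PCA with an RBF kernel maps the data into a new feature space.
kpca = KernelPCA(n_components=2, kernel="rbf", gamma=10.0)
X_kpca = kpca.fit_transform(X)

# In the kernel PCA space the two classes typically occupy distinct ranges
# along the first component.
print("class 0, first component range:", X_kpca[y == 0, 0].min(), X_kpca[y == 0, 0].max())
print("class 1, first component range:", X_kpca[y == 1, 0].min(), X_kpca[y == 1, 0].max())
```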


Source: blog.csdn.net/Wonz5130/article/details/104118440