6.0 Background
|  | Support Vector Machines | Neural Networks |
| --- | --- | --- |
| Flexibility | Flexible | Flexible |
| Expressive power | Can approximate any continuous function to arbitrary precision | Strong approximation ability |
| Theory | Solid mathematical foundation | Theory less clear; inspired by cognition |
| Solution quality | Global optimum | Local optima |
| Parameter tuning | Little manual tuning required | Relies heavily on manual tuning |
| Computational cost | Relatively large | Can be large or small |
| Domain knowledge | Difficult to incorporate | Easy to incorporate |
| Main users | The scientific community | Industry |
6.1 Margin and Support Vectors
- Margin: choose the hyperplane "in the middle" between the two classes; it tolerates noise best, is the most robust, and generalizes best.
- Generalization: the ability to predict correctly on future (unseen) data.
- Support vectors: the few training points closest to the hyperplane (both positive and negative samples).
- Maximum margin: the shortest distance from the support vectors to the hyperplane is 1/||w||, so maximizing the margin amounts to minimizing ||w||.
- Convex function: e.g. y = x^2 (second derivative positive); a convex optimization problem is guaranteed a globally optimal solution.
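The margin relation above can be checked numerically. Below is a minimal sketch in pure Python with a hypothetical hyperplane and points: for a canonically scaled hyperplane (|w·x + b| = 1 at the closest points), the distance from each support vector to the hyperplane comes out to 1/||w||.

```python
import math

def distances_to_hyperplane(w, b, points):
    """Geometric distance from each point to the hyperplane w.x + b = 0.

    For a canonical max-margin hyperplane (|w.x + b| = 1 on the closest
    points), each support vector lies at distance 1 / ||w||.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return [abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
            for x in points]

# Hypothetical 2-D example: hyperplane x1 + x2 = 0,
# support vectors at (1, 0) and (-1, 0), so w.x + b = +/-1 on them.
w, b = (1.0, 1.0), 0.0
print(distances_to_hyperplane(w, b, [(1.0, 0.0), (-1.0, 0.0)]))
# both distances equal 1/||w|| = 1/sqrt(2)
```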
6.2 The Dual Problem
- Lagrange multiplier method: introduces one multiplier per constraint, turning the constrained optimization problem into its dual.
- Sparsity of the solution: follows from the KKT conditions.
    - To determine w, only the support vectors matter; all other samples have zero multipliers.
- Off-the-shelf QP solvers: e.g. MOSEK.
- SMO (Sequential Minimal Optimization): repeatedly optimizes two multipliers at a time while fixing the rest.
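The sparsity claim can be illustrated on a toy problem. Below is a sketch with a hand-computed hard-margin solution on hypothetical data: complementary slackness in the KKT conditions forces alpha_i = 0 for every sample whose margin constraint is inactive, so w = sum_i alpha_i y_i x_i depends only on the support vectors.

```python
def kkt_holds(alphas, X, y, w, b, tol=1e-9):
    """Check the KKT conditions for a hard-margin SVM solution:
    alpha_i >= 0, y_i (w.x_i + b) >= 1, and complementary
    slackness alpha_i * (y_i (w.x_i + b) - 1) = 0."""
    ok = True
    for a, x, yi in zip(alphas, X, y):
        slack = yi * (sum(wj * xj for wj, xj in zip(w, x)) + b) - 1
        ok = ok and a > -tol and slack > -tol and abs(a * slack) < tol
    return ok

# Hypothetical toy problem: support vectors (1, 1) and (-1, -1),
# plus an interior point (2, 2) that is far from the boundary.
X = [(1.0, 1.0), (-1.0, -1.0), (2.0, 2.0)]
y = [1.0, -1.0, 1.0]
w, b = (0.5, 0.5), 0.0        # hard-margin solution for these points
alphas = [0.25, 0.25, 0.0]    # interior point gets alpha = 0 (sparsity)
print(kkt_holds(alphas, X, y, w, b))  # True
```

Note that w = 0.25·(1, 1) + 0.25·(1, 1) = (0.5, 0.5) is built from the two support vectors alone; the third point contributes nothing.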
6.3 Kernel Functions
- Linear inseparability: map the samples from the original low-dimensional space into a higher-dimensional feature space where a linear classifier can be built.
- Mercer's theorem (a sufficient, not necessary, condition): any symmetric function whose kernel matrix is positive semidefinite (all eigenvalues non-negative) can be used as a kernel function.
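The Mercer condition is easy to verify on a finite sample: build the kernel matrix and check that it is symmetric and positive semidefinite. A minimal NumPy sketch with an RBF (Gaussian) kernel on hypothetical points and a hypothetical gamma:

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-10):
    """Mercer check on a finite sample: K must be symmetric and
    positive semidefinite (all eigenvalues non-negative)."""
    if not np.allclose(K, K.T):
        return False
    return bool(np.linalg.eigvalsh(K).min() >= -tol)

# RBF kernel matrix on a few hypothetical 1-D points.
x = np.array([0.0, 0.5, 1.3, 2.0])
gamma = 1.0
K = np.exp(-gamma * (x[:, None] - x[None, :]) ** 2)
print(is_valid_kernel_matrix(K))  # True: the RBF kernel matrix is PSD
```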
6.4 Soft Margin and Regularization
- 0/1 loss: the soft-margin objective trades off the margin against the 0/1 loss.
- Problem: the 0/1 loss is non-convex and discontinuous, and therefore hard to optimize.
- Surrogate losses: convex, continuous upper bounds of the 0/1 loss (e.g. hinge, exponential, logistic).
- Regularization
- Logistic regression: recovered by using the logistic (log-odds) loss as the surrogate.
- LASSO (least absolute shrinkage and selection operator): L1 regularization.
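The "upper bound" property of surrogate losses can be checked pointwise. A minimal sketch comparing the hinge loss with the 0/1 loss, both written as functions of the functional margin z = y · f(x):

```python
def zero_one_loss(z):
    # z = y * f(x); the sample is misclassified when z < 0
    return 1.0 if z < 0 else 0.0

def hinge_loss(z):
    # Convex, continuous surrogate used by the soft-margin SVM.
    return max(0.0, 1.0 - z)

# The hinge loss upper-bounds the 0/1 loss everywhere.
for z in [-2.0, -0.5, 0.0, 0.3, 1.0, 2.5]:
    assert hinge_loss(z) >= zero_one_loss(z)
```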
6.5 Support Vector Regression
- The wider the ε-insensitive band around the regression function, the fewer points are counted as errors.
- Loss function: the ε-insensitive loss ignores deviations smaller than ε.
- Quadratic programming: the objective is a quadratic function and the constraints are linear functions.
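The idea that points inside the band incur no penalty is exactly the ε-insensitive loss. A minimal sketch with a hypothetical tolerance eps:

```python
def epsilon_insensitive_loss(y_true, y_pred, eps=0.25):
    """SVR loss: deviations inside the eps-tube cost nothing;
    outside it, the cost grows linearly with the excess deviation."""
    return max(0.0, abs(y_true - y_pred) - eps)

print(epsilon_insensitive_loss(1.0, 1.1))  # 0.0 (inside the tube)
print(epsilon_insensitive_loss(1.0, 1.5))  # 0.25 (deviation 0.5 - eps)
```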
6.6 Kernel Methods
- Kernel SVM
- Kernel PCA
- Kernel LDA (KLDA)
- Reproducing Kernel Hilbert Space (RKHS)
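As one concrete instance of kernelizing a linear method, kernel PCA replaces the eigendecomposition of the covariance matrix with an eigendecomposition of the double-centered kernel matrix. A minimal NumPy sketch (assumed RBF kernel, hypothetical gamma and data):

```python
import numpy as np

def kernel_pca(X, n_components=2, gamma=1.0):
    """Minimal kernel PCA sketch.

    1. Build the RBF kernel matrix K on the samples.
    2. Double-center K (feature-space data must have zero mean).
    3. Eigendecompose and project onto the top eigenvectors; the
       standard training-set projection is sqrt(lambda) * v.
    """
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    K = np.exp(-gamma * sq)
    n = K.shape[0]
    one = np.full((n, n), 1.0 / n)
    Kc = K - one @ K - K @ one + one @ K @ one   # double centering
    vals, vecs = np.linalg.eigh(Kc)              # ascending eigenvalues
    idx = np.argsort(vals)[::-1][:n_components]  # top components
    return Kc @ vecs[:, idx] / np.sqrt(vals[idx])

# Hypothetical 2-D data: corners of the unit square.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
Z = kernel_pca(X, n_components=2)
print(Z.shape)  # (4, 2)
```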