These are my original personal notes; if you repost them, please attach a link back. The previous notes derived the linearly separable SVM with hard-margin maximization (handwritten notes); here we derive soft-margin maximization.
- "Hard margin": a separating hyperplane exists that completely separates the positive and negative samples.
- "Soft margin": the data are not exactly linearly separable, only approximately so.
Why introduce soft-margin maximization?
If a few noise points are added to the data, the decision plane has to shift to accommodate them. Even if the data remain linearly separable, the margin shrinks sharply: training accuracy may improve, but the generalization error goes up, so the trade is not worth it.
The resulting convex quadratic programming problem is solved via Lagrange duality. The soft-margin primal problem introduces a slack variable $\xi_i \ge 0$ for each sample, at cost $C\xi_i$ in the objective:

$$\min_{w,b,\xi}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i \quad \text{s.t.}\quad y_i(w\cdot x_i + b) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\ \ i=1,\dots,N.$$

Let us write down the Lagrangian, with multipliers $\alpha_i \ge 0$ and $\mu_i \ge 0$:

$$L(w,b,\xi,\alpha,\mu) = \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{N}\xi_i - \sum_{i=1}^{N}\alpha_i\big(y_i(w\cdot x_i + b) - 1 + \xi_i\big) - \sum_{i=1}^{N}\mu_i\xi_i.$$

The primal problem is $\min_{w,b,\xi}\max_{\alpha,\mu} L$, and the dual problem is $\max_{\alpha,\mu}\min_{w,b,\xi} L$.
Step 1: minimize $L$ over $w$, $b$, and $\xi$.

Take the partial derivatives with respect to $w$, $b$, and $\xi_i$ and set them to zero:

$$\nabla_w L = w - \sum_{i=1}^{N}\alpha_i y_i x_i = 0 \ \Rightarrow\ w = \sum_{i=1}^{N}\alpha_i y_i x_i,$$

$$\frac{\partial L}{\partial b} = -\sum_{i=1}^{N}\alpha_i y_i = 0 \ \Rightarrow\ \sum_{i=1}^{N}\alpha_i y_i = 0,$$

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \mu_i = 0.$$

These conditions become the constraints of the dual problem.
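The intermediate substitution is not shown above; as a sketch, plugging the three stationarity conditions back into $L$ eliminates $w$, $b$, and $\xi$ and leaves the dual objective:

```latex
% Substitute w = \sum_i \alpha_i y_i x_i; the b-term vanishes because
% \sum_i \alpha_i y_i = 0, and the \xi-terms vanish because C - \alpha_i - \mu_i = 0.
\begin{aligned}
\min_{w,b,\xi} L
 &= \frac{1}{2}\Big\|\sum_i \alpha_i y_i x_i\Big\|^2
    - \sum_i \alpha_i y_i \Big(\sum_j \alpha_j y_j x_j\Big)\cdot x_i
    + \sum_i \alpha_i
    + \sum_i (C - \alpha_i - \mu_i)\,\xi_i
    - b\sum_i \alpha_i y_i \\
 &= -\frac{1}{2}\sum_{i}\sum_{j} \alpha_i \alpha_j y_i y_j (x_i\cdot x_j)
    + \sum_i \alpha_i .
\end{aligned}
```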
Step 2: maximize $\min_{w,b,\xi} L$ over $\alpha$.

Eliminating $\mu_i$ via $C - \alpha_i - \mu_i = 0$ (together with $\alpha_i \ge 0$ and $\mu_i \ge 0$) gives $0 \le \alpha_i \le C$, so the dual problem is

$$\max_{\alpha}\ -\frac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha_i\alpha_j y_i y_j (x_i\cdot x_j) + \sum_{i=1}^{N}\alpha_i \quad \text{s.t.}\quad \sum_{i=1}^{N}\alpha_i y_i = 0,\ \ 0 \le \alpha_i \le C.$$

Then we solve this problem to get the optimal $\alpha^* = (\alpha_1^*,\dots,\alpha_N^*)^T$, and use it to express $w^*$ and $b^*$:

$$w^* = \sum_{i=1}^{N}\alpha_i^* y_i x_i, \qquad b^* = y_j - \sum_{i=1}^{N}\alpha_i^* y_i (x_i\cdot x_j)$$

for any component $j$ with $0 < \alpha_j^* < C$.
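As a concrete illustration (not from the book, which uses the SMO algorithm), the dual above can be solved numerically by projected gradient ascent on a tiny toy dataset; all names (`svm_dual`, `lr`, `iters`) and the data are my own illustrative choices:

```python
# Minimal sketch: soft-margin SVM dual solved by projected gradient ascent.
# Maximizes W(a) = sum(a) - 1/2 sum_ij a_i a_j y_i y_j (x_i . x_j)
# subject to 0 <= a_i <= C and sum_i a_i y_i = 0.

def dot(u, v):
    """Inner product of two plain-list vectors."""
    return sum(a * b for a, b in zip(u, v))

def svm_dual(X, y, C=10.0, lr=0.01, iters=5000):
    n = len(X)
    # Gram matrix K_ij = x_i . x_j
    K = [[dot(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(iters):
        # Gradient of the dual objective with respect to each alpha_i
        grad = [1.0 - y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
                for i in range(n)]
        alpha = [alpha[i] + lr * grad[i] for i in range(n)]
        # Project (approximately) onto sum_i a_i y_i = 0, then clip to [0, C]
        s = sum(alpha[i] * y[i] for i in range(n)) / n
        alpha = [min(C, max(0.0, alpha[i] - y[i] * s)) for i in range(n)]
    # Recover w* = sum_i a_i y_i x_i
    d = len(X[0])
    w = [sum(alpha[i] * y[i] * X[i][k] for i in range(n)) for k in range(d)]
    # Recover b* from a support vector with 0 < a_j < C
    j = max(range(n), key=lambda i: min(alpha[i], C - alpha[i]))
    b = y[j] - dot(w, X[j])
    return w, b, alpha

# Toy data: positives on the right, negatives mirrored on the left.
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [-3.0, -1.0]]
y = [1, 1, -1, -1]
w, b, alpha = svm_dual(X, y)
preds = [1 if dot(w, x) + b > 0 else -1 for x in X]
```

The alternating projection (equality shift, then box clipping) is only approximate, but it is enough to separate a toy set; real implementations use SMO or a QP solver.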
Reference: Li Hang, *Statistical Learning Methods* (统计学习方法).