Lin Xuantian's Notes on Machine Learning Techniques (2)

Lin Xuantian's Machine Learning Techniques Notes (1)
Lin Xuantian's Machine Learning Techniques Notes (3)

Dual Support Vector Machine


P6 2.1
Lecture 1 covered the linear support vector machine; Lecture 2 covers the dual support vector machine.
The previous lecture showed how to get a non-linear SVM: transform the data into the Z space and solve a QP with $\tilde{d}+1$ variables (and $N$ constraints). When $\tilde{d}$ is very large, or even infinite, this QP becomes intractable. The goal of this lecture: make the SVM not depend on $\tilde{d}$.

We can convert the original SVM into an equivalent SVM.
This is the dual problem:
Following the earlier idea from regularization, we introduce Lagrange multipliers $\lambda$ to convert the constrained problem into an unconstrained one. There are $N$ multipliers, one per constraint,
which define the Lagrangian function $L(b, w, \alpha) = \frac{1}{2}w^Tw + \sum_{n=1}^N \alpha_n\big(1 - y_n(w^Tz_n + b)\big)$. Related literature usually writes $\lambda$ as $\alpha$,
and the SVM is converted into the min-max form $\min_{b,w}\ \max_{\alpha_n \ge 0} L(b, w, \alpha)$.
If $(b, w)$ violates some constraint, then $1 - y_n(w^Tz_n + b)$ is positive for that $n$; the inner max can push $\alpha_n$ to infinity, so the value goes to infinity, and the outer min will then filter out any $(b, w)$ that violates a constraint.
If all constraints are satisfied, each $1 - y_n(w^Tz_n + b)$ is non-positive; since every $\alpha_n \ge 0$, the max of $\sum_n \alpha_n\big(1 - y_n(w^Tz_n + b)\big)$ is $0$ (each term is $\le 0$, so the sum is maximized by making every term $0$), and the value is just $\frac{1}{2}w^Tw$.
In this way the min-max form automatically screens out the $(b, w)$ that do not satisfy the constraints, and minimizes $\frac{1}{2}w^Tw$ over the ones that do — exactly the original SVM.
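As a quick numeric illustration of this filtering (toy data of my own; the true inner max is an unbounded supremum, so a large cap stands in for infinity here):

```python
import numpy as np

# Toy check (hypothetical data) that the inner max of the Lagrangian
# L(b, w, a) = 0.5 w'w + sum_n a_n (1 - y_n (w'z_n + b))
# equals 0.5 w'w for feasible (b, w) and blows up for infeasible (b, w).
Z = np.array([[2., 0.], [0., 2.]])
y = np.array([1., -1.])

def inner_max(b, w, big=1e6):
    slack = 1.0 - y * (Z @ w + b)          # constraint residuals
    a = np.where(slack > 0, big, 0.0)      # max: huge a_n if violated, else 0
    return 0.5 * w @ w + a @ slack

feasible = inner_max(0.0, np.array([1.0, -1.0]))    # all margins >= 1
infeasible = inner_max(0.0, np.array([0.1, 0.0]))   # some margins < 1
print(feasible, infeasible)
```

With the feasible $(b, w)$ the value is exactly $\frac{1}{2}w^Tw$; with the infeasible one it is enormous, so the outer min would never pick it.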


P7 2.2
The previous section transformed the SVM into a Lagrangian min-max. How do we find a lower bound for it? For any fixed $\alpha'$ with every $\alpha'_n \ge 0$, and for any $(b, w)$, we have $\max_{\alpha_n \ge 0} L(b,w,\alpha) \ge L(b,w,\alpha')$, so:
Taking the min over $(b, w)$ preserves the inequality, and because it is true for any such $\alpha'$, it still holds when we take the largest right-hand side:

$\min_{b,w}\ \max_{\alpha_n \ge 0} L(b,w,\alpha)\ \ge\ \max_{\alpha'_n \ge 0}\ \min_{b,w} L(b,w,\alpha')$
The right-hand side is the Lagrangian dual problem; solving it yields a lower bound on the SVM optimum (weak duality).

Because the three conditions in green are met — the problem is convex, a feasible solution exists (the data are separable), and the constraints are linear — strong duality holds (for the QP problem), so the two sides can be directly equated, and there exists a $(b, w, \alpha)$ that is optimal for both sides of the equation. The inner minimization now has no constraints, so
we can start solving it:
Because it is a min over $b$, the optimum requires $\frac{\partial L}{\partial b} = -\sum_{n=1}^N \alpha_n y_n = 0$.
So we can add $\sum_{n=1}^N \alpha_n y_n = 0$ as a restriction and simplify the formula:
The last term is $b \cdot \sum_n \alpha_n y_n = b \cdot 0$, so it drops out:

$\max_{\alpha_n \ge 0,\ \sum_n y_n\alpha_n = 0}\ \min_{w}\ \frac{1}{2}w^Tw + \sum_{n=1}^N \alpha_n - w^T\sum_{n=1}^N \alpha_n y_n z_n$
Similarly, because of the min, set the partial derivative of $L$ with respect to $w$ to $0$, which gives $w = \sum_{n=1}^N \alpha_n y_n z_n$, a fixed vector once $\alpha$ is given. Substituting this back, the min can be dropped: after the max is made subject to the conditions above, neither $b$ nor $w$ appears in the formula any more, and only $\alpha$ needs to be considered:

$\max_{\alpha_n \ge 0,\ \sum_n y_n\alpha_n = 0,\ w = \sum_n \alpha_n y_n z_n}\ -\frac{1}{2}\Big\|\sum_{n=1}^N \alpha_n y_n z_n\Big\|^2 + \sum_{n=1}^N \alpha_n$
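A quick numeric sanity check of the substitution (toy data and $\alpha$ values are my own): once $\sum_n \alpha_n y_n = 0$ holds and $w = \sum_n \alpha_n y_n z_n$, the Lagrangian collapses to the objective above, independent of $b$.

```python
import numpy as np

# Toy data (hypothetical). Pick any alpha >= 0 with sum_n alpha_n y_n = 0,
# set w = sum_n alpha_n y_n z_n, and compare L(b, w, alpha) with the
# dual objective -0.5 ||w||^2 + sum_n alpha_n.
Z = np.array([[1., 2.], [0., 1.], [3., 1.], [1., 0.]])
y = np.array([1., 1., -1., -1.])
alpha = np.array([0.3, 0.7, 0.4, 0.6])      # satisfies alpha @ y == 0

w = (alpha * y) @ Z
for b in (-2.0, 0.0, 5.0):                  # b no longer matters
    L = 0.5 * w @ w + alpha @ (1 - y * (Z @ w + b))
    dual = -0.5 * w @ w + alpha.sum()
    assert abs(L - dual) < 1e-9
```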
Finally, the conditions that the optimum must satisfy are the four KKT conditions. A note on the fourth point, complementary slackness $\alpha_n\big(1 - y_n(w^Tz_n + b)\big) = 0$ (like Harry Potter and Voldemort, at least one of the two factors must be zero): if $y_n(w^Tz_n + b) = 1$ (the point is exactly on the dividing line; such points with $\alpha_n > 0$ are the SVs), the product is naturally $0$; if $y_n(w^Tz_n + b) > 1$, then by the min-max argument in 2.1, $\alpha_n$ can only be $0$, so the product is again $0$.
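As a sketch, the four KKT conditions can be checked numerically; the toy dataset and hand-derived solution below are my own example, not from the notes.

```python
import numpy as np

# KKT check for the hard-margin dual on toy data (hypothetical example):
# primal feasibility, dual feasibility, the two gradient conditions,
# and complementary slackness.
Z = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha = np.array([0.5, 0.5, 1.0, 0.0])      # hand-derived dual solution
w, b = np.array([1., -1.]), -1.0            # corresponding primal solution

margins = y * (Z @ w + b)
assert np.all(margins >= 1 - 1e-9)           # primal feasibility
assert np.all(alpha >= 0)                    # dual feasibility
assert abs(alpha @ y) < 1e-9                 # optimal in b: sum y_n alpha_n = 0
assert np.allclose(w, (alpha * y) @ Z)       # optimal in w: w = sum a_n y_n z_n
assert np.allclose(alpha * (1 - margins), 0) # complementary slackness
```

Note how the fourth point (margin $2 > 1$) is forced to have $\alpha_4 = 0$ by the last condition.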
Finally there is a small fun-time exercise to consolidate, which is quite interesting. ② Look back at the definition of $L(b, w, \alpha)$, plug in the given $y_n$ and $z_n$, and $w = \sum_n \alpha_n y_n z_n$ comes out. ③ Each term of the sum must be $0$ (under KKT complementary slackness), so the whole sum is $0$. For the $\alpha_2(w-3)$ term, you can ignore the specific values of $w$, $y_n$ and $z_n$; the point is simply that the whole term must be $0$.


P8 2.3
Simplify the formula from the previous section: flip the max into a min (negate, so $-\frac{1}{2}\|\cdot\|^2 + \sum\alpha_n$ becomes $\frac{1}{2}\|\cdot\|^2 - \sum\alpha_n$) and expand the squared norm. The condition $w = \sum \alpha_n y_n z_n$ is not listed as a constraint because the focus now is on $\alpha_n$ alone. The result is a convex QP with $N$ variables ($\alpha_n$) and $N+1$ constraints ($N$ constraints $\alpha_n \ge 0$, plus $\sum_{n=1}^N y_n\alpha_n = 0$, $N+1$ in total). Now set it up as a QP:

$\min_\alpha\ \frac{1}{2}\sum_{n=1}^N\sum_{m=1}^N \alpha_n\alpha_m y_n y_m z_n^Tz_m - \sum_{n=1}^N \alpha_n \quad \text{s.t.}\ \ \sum_{n=1}^N y_n\alpha_n = 0,\ \alpha_n \ge 0$
Note: generally, when feeding the problem to a QP solver, you don't need to split the equality "=" into two inequalities — just pass it directly, and the bound $\alpha_n \ge 0$ can be given directly as a range bound.
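To make the setup concrete, here is a minimal sketch (toy data and the choice of `scipy.optimize.minimize` with SLSQP are my own; the course just assumes a generic QP solver) that builds the dual QP and recovers $(w, b)$ from KKT:

```python
import numpy as np
from scipy.optimize import minimize

# Toy separable data in z-space (hypothetical).
Z = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
N = len(y)

# Dual QP pieces: Q[n, m] = y_n y_m z_n^T z_m; objective 0.5 a'Qa - 1'a.
Q = (y[:, None] * y[None, :]) * (Z @ Z.T)

res = minimize(
    lambda a: 0.5 * a @ Q @ a - a.sum(),
    x0=np.full(N, 0.25),
    bounds=[(0, None)] * N,                              # alpha_n >= 0 as range bounds
    constraints={'type': 'eq', 'fun': lambda a: a @ y},  # sum y_n alpha_n = 0
)
alpha = res.x

# Recover the primal solution from KKT: w from alpha, b from any SV.
w = (alpha * y) @ Z
sv = int(np.argmax(alpha))                               # an index with alpha > 0
b = y[sv] - w @ Z[sv]
print(alpha.round(3), w.round(3), round(b, 3))
```

A general-purpose solver works here only because the problem is tiny; the next paragraph explains why real SVM packages don't do it this way.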
However, note that $Q$ (with entries $q_{n,m} = y_n y_m z_n^Tz_m$) is a dense matrix — most of its values are nonzero — so both computation and storage are expensive, and in practice solvers specially designed for SVM are used.
Through the 4 KKT conditions, we can recover $w$ and $b$. In particular, when $\alpha_n > 0$, complementary slackness forces $1 - y_n(w^Tz_n + b) = 0$, i.e. $y_n(w^Tz_n + b) = 1$, so $b = y_n - w^Tz_n$; and $= 1$ exactly means the point is on the fat boundary of the SVM. As for why, it probably takes another look at the hyperplane geometry.


P9 2.4
The previous section showed that when $\alpha_n > 0$, the point is on the boundary. However, a point on the boundary is not necessarily a support vector (it may have $\alpha_n = 0$), so we now call only the points with $\alpha_n > 0$ support vectors (SV) and study only these; this narrows the scope a bit.
Therefore, both $w$ and $b$ can be computed from the SVs alone: $w = \sum_{\text{SV}} \alpha_n y_n z_n$ and $b = y_n - w^Tz_n$ for any SV. Non-SV points have $\alpha_n = 0$ and contribute nothing.
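As a small sketch (the dataset and $\alpha$ values below are my own hand-derived toy example), $w$ and $b$ really do come out of the SVs alone, and every SV gives the same $b$:

```python
import numpy as np

# Toy dual solution (hypothetical, hand-derived): point 4 has alpha = 0,
# so it is not an SV and is not needed to reconstruct the hyperplane.
Z = np.array([[0., 0.], [2., 2.], [2., 0.], [3., 0.]])
y = np.array([-1., -1., 1., 1.])
alpha = np.array([0.5, 0.5, 1.0, 0.0])

sv = alpha > 1e-8
w = (alpha[sv] * y[sv]) @ Z[sv]        # sum over SVs only
b_each = y[sv] - Z[sv] @ w             # b computed from each SV
print(w, b_each)                       # every SV agrees on b
```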
The formulas for the SVM and PLA weights are very similar: both express $w$ as a linear combination of $y_nz_n$, i.e. $w$ is represented by the data. The SVM $w$ is represented only by the SVs, while the PLA $w$ is represented by the points where mistakes occurred. Philosophically, this tells us what the data uses to express our $w$.
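The PLA side of this claim can be sketched directly (toy data of my own): every update adds $y_nz_n$, so the final $w$ is exactly a linear combination of the mistake points.

```python
import numpy as np

# PLA on toy separable data (hypothetical), tracking c[n], the number of
# mistakes on point n. The final w equals sum_n c_n y_n z_n.
Z = np.array([[1., 3.], [2., 1.], [-1., -2.], [-3., 0.]])
y = np.array([1., 1., -1., -1.])
w, c = np.zeros(2), np.zeros(4)

changed = True
while changed:
    changed = False
    for n in range(len(y)):
        if y[n] * (w @ Z[n]) <= 0:     # mistake -> PLA update
            w += y[n] * Z[n]
            c[n] += 1
            changed = True

assert np.allclose(w, (c * y) @ Z)     # w represented by mistake points
```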

Comparing the two representations of SVM, primal and dual: hard-margin means the o's and x's must be strictly classified with no mistakes. Generally, the dual SVM is used.

Finally: although the dual SVM is said to depend only on $N$, $\tilde{d}$ is actually hidden inside $Q$ (each entry needs an inner product $z_n^Tz_m$ in $\tilde{d}$ dimensions). Next, we will explain how to avoid this $\tilde{d}$.

The final summary:

Origin blog.csdn.net/Only_Wolfy/article/details/89505475