Lin Xuantian's Notes on Machine Learning Techniques (1)

Finally arrived at Machine Learning Techniques. This time I'll try to finish each chapter's notes and post them right away. I never stuck with finishing the Foundations notes, and now when I look back I can't tell what I was writing; reading them, they feel like a mess, a total train wreck. I'll improve slowly.
I heard that Techniques is quite difficult, so let me link a master's blog post for good luck:
Red Stone: I think his write-ups sum it all up very well!!

Lin Xuantian's Machine Learning Techniques Notes (2)
Lin Xuantian's Machine Learning Techniques Notes (3)

1. Linear SVM

P1 1.1
This course introduces [techniques] built around three ways of using feature transforms:
1. How to exploit feature transforms while controlling their complexity: SVM (Support Vector Machine, it sounds quite difficult)
2. How to find predictive features and blend them together so the model performs better: AdaBoost (adaptive boosting)
3. How to find and learn hidden features to make the machine perform better: Deep Learning (deep learning!!!)


P2 1.2
In PLA, a single data set can actually be separated by many different lines. The three lines shown in the slide are all "correct": each classifies every point correctly, and according to the VC bound their $E_{out}$ guarantees are the same.
But to the human eye, the division in the rightmost picture is clearly better.
Why? Because the data carry some noise or measurement error, a future point won't land exactly on the observed o's and x's; it may reasonably fall anywhere in a gray area around them. In the left picture, an x close to the dividing line only needs a small perturbation to cross over into the o region and cause an error. So, to improve error tolerance (the legendary robustness?), we want to call out a "stronger" line: the line that, while classifying everything correctly, is farthest from the nearest point.
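A tiny numeric illustration of this noise argument (my own example, not from the course):

```python
# A point close to the boundary flips label under a small perturbation;
# a far point does not. Boundary and points are my own toy example.
import numpy as np

w, b = np.array([1.0, 1.0]), -1.0       # boundary: x1 + x2 - 1 = 0
x_near, x_far = np.array([0.6, 0.5]), np.array([2.0, 2.0])
noise = np.array([-0.1, -0.1])          # small measurement error

for x in (x_near, x_far):
    print(np.sign(w @ x + b), np.sign(w @ (x + noise) + b))
# near point: 1.0 then -1.0 (flipped); far point: 1.0 both times
```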
Of course, this can also be phrased as wanting a "fat" line rather than a thin one: the fatter the line, the stronger it is. Academically, this fatness is called the margin. Below, "the strongest line is the one that classifies everything correctly and is farthest from the nearest point" is expressed as a formula for the $w$ that maximizes the margin:
$$
\begin{aligned}
\max_{b,w}\quad & \text{margin}(b, w) \\
\text{s.t.}\quad & \text{every } y_n(w^T x_n + b) > 0 \\
& \text{margin}(b, w) = \min_{n=1,\dots,N} \text{distance}(x_n, b, w)
\end{aligned}
$$


P3 1.3
We start by finding $\text{distance}(x_n, b, w)$. Previously a $w_0$ was bundled in with $w_1 \sim w_d$ (with $x_0 = 1$), but because $w_0$ enters the computation differently from the other weights, it is pulled out on its own and renamed $b$, so the hypothesis becomes $h(x) = \text{sign}(w^T x + b)$. (This $w_0$, i.e. $b$, is the bias term; for why there is a bias term, read the watermelon book for details.)

Next, find $\text{distance}(x, b, w)$. Let $x'$ and $x''$ be points on the hyperplane, and let $x$ be a data point (not necessarily on the hyperplane). Since $w^T x' + b = 0$, we have $w^T x' = -b$; by the same reasoning, $w^T x'' = -b$.
There is one special step here: proving that $w$ is the normal vector of this hyperplane. (About hyperplanes, I read someone else's article, but it didn't seem to explain why $w$ is a normal vector either. Note that $w^T(x' - x'') = -b - (-b) = 0$, so $w$ is orthogonal to every vector lying inside the hyperplane, which is exactly what being a normal vector means.)
Knowing the normal vector: take a point $x'$ on the hyperplane; the distance from $x$ to the hyperplane is the length of the projection of the vector $x - x'$ onto $w$, so:
$$
\text{distance}(x, b, w) = \frac{|w^T(x - x')|}{\|w\|} = \frac{|w^T x + b|}{\|w\|}
$$
Because this is a Hard-Margin SVM, the line must classify every point correctly, so $y_n(w^T x_n + b) > 0$ for every $n$. And since $y_n = \pm 1$, the absolute value can be taken off:

$$
\text{distance}(x_n, b, w) = \frac{1}{\|w\|}\, y_n (w^T x_n + b)
$$
Next, for convenience in solving, we use the scaling freedom. Definition:

$$
\min_{n=1,\dots,N} y_n(w^T x_n + b) = 1
$$

Then there is:

$$
\text{margin}(b, w) = \frac{1}{\|w\|}
$$
As for why it is 1: in fact any constant would do. The bullet comments say this touches on functional margins vs. geometric margins?? Looking at Red Stone's explanation: scaling $w$ and $b$ by the same factor leaves the plane unchanged, so we are free to demand $y_n(w^T x_n + b) = 1$ at the closest point. (Oh, OK??)
At this point, since we want the largest margin (the widest line), the problem is to maximize $\frac{1}{\|w\|}$ (i.e. make $\|w\|$ small) subject to $\min_{n=1,\dots,N} y_n(w^T x_n + b) = 1$.
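Written out in display form (my restatement of the problem at this stage):

$$
\begin{aligned}
\max_{b,w}\quad & \frac{1}{\|w\|} \\
\text{s.t.}\quad & \min_{n=1,\dots,N} y_n(w^T x_n + b) = 1
\end{aligned}
$$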

But this is still hard to solve, so we relax the condition to $y_n(w^T x_n + b) \ge 1$ for all $n$, and then prove that after the relaxation the optimal solution still satisfies $\min_n y_n(w^T x_n + b) = 1$.
Suppose we found an optimal solution $(b_1, w_1)$ with $y_n(w_1^T x_n + b_1) > 1.126$ for all $n$. Then $(\frac{b_1}{1.126}, \frac{w_1}{1.126})$ is also feasible, and since $\text{margin} = \frac{1}{\|w\|}$ and $\|\frac{w_1}{1.126}\|$ is smaller, its margin is larger. So the supposedly optimal $(b_1, w_1)$ was not optimal after all: contradiction. Hence whenever a solution has $y_n(w^T x_n + b) > 1$ for every $n$, a strictly better one exists, so the optimal solution must satisfy $\min_n y_n(w^T x_n + b) = 1$.

Finally, since we have always solved min problems before, for uniformity we flip $\max \frac{1}{\|w\|}$ into $\min \|w\|$. Because $\|w\|$ carries a square root, we drop the root and minimize the square instead, which in matrix form is $w^T w$; finally a $\frac{1}{2}$ is tacked on (it feels like it was added just to make the later derivative cleaner??). The problem finally becomes:
$$
\begin{aligned}
\min_{b,w}\quad & \frac{1}{2} w^T w \\
\text{s.t.}\quad & y_n(w^T x_n + b) \ge 1, \quad n = 1, \dots, N
\end{aligned}
$$
Finally the Fun Time. Note that the coordinates $x_1, x_2$ correspond to the $x$ and $y$ in $y = kx + b$ respectively. Then, using the point-to-line distance formula $d = \frac{|Ax_1 + Bx_2 + C|}{\sqrt{A^2 + B^2}}$, rewrite $x_1 + x_2 = 1$ as $1 \cdot x_1 + 1 \cdot x_2 - 1 = 0$, so $A = 1$, $B = 1$, $C = -1$; then substitute the point's coordinates $x_1$ and $x_2$ into the formula.
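A tiny numeric check of this formula (the test point is my own example, since the quiz's point isn't shown here):

```python
# Distance from a point to the line 1*x1 + 1*x2 - 1 = 0 (A=1, B=1, C=-1).
import numpy as np

A, B, C = 1.0, 1.0, -1.0
x1, x2 = 2.0, 3.0                        # an arbitrary example point
d = abs(A * x1 + B * x2 + C) / np.sqrt(A**2 + B**2)
print(d)                                 # |2 + 3 - 1| / sqrt(2) ~= 2.828
```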


P4 1.4
Taking the slide's set $(X, Y)$ as an example, constraints (i)–(iv) can be written out; from them $w_1 \ge 1$ and $w_2 \le -1$, so $w_1^2 + w_2^2 \ge 2$ and therefore $\frac{1}{2} w^T w \ge 1$. Assigning $w_1$, $w_2$ and $b$ values that achieve this bound gives $g_{svm} = \text{sign}(x_1 - x_2 - 1)$.
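The slide image with the data is missing here, so as a sanity check here are four points reconstructed to be consistent with the constraints above (treat the exact coordinates as my assumption):

```python
# Quick check that g_svm(x) = sign(x1 - x2 - 1) fits the toy data; the four
# points are my reconstruction, chosen to match w1 >= 1, w2 <= -1 above.
import numpy as np

X = np.array([[0, 0], [2, 2], [2, 0], [3, 0]], dtype=float)
y = np.array([-1, -1, 1, 1])
w, b = np.array([1.0, -1.0]), -1.0       # values achieving (1/2) w^T w = 1

print(np.sign(X @ w + b))                # [-1. -1.  1.  1.] -> matches y
print(y * (X @ w + b))                   # [1. 1. 1. 2.] -> all >= 1, as required
```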
So, how to deal with the general case? Solve this problem:
It has two characteristics: the objective is a convex quadratic function of $(b, w)$, and the constraints are linear in $(b, w)$ (with $x_n, y_n$ as constants).
A problem like this is quadratic programming (a QP problem, a kind of convex optimization), for which off-the-shelf solvers already exist; we only need to map our problem onto the solver's standard form and substitute the pieces in. Finally:
$$
u = \begin{bmatrix} b \\ w \end{bmatrix}, \quad
Q = \begin{bmatrix} 0 & 0_d^T \\ 0_d & I_d \end{bmatrix}, \quad
p = 0_{d+1}, \quad
a_n^T = y_n \begin{bmatrix} 1 & x_n^T \end{bmatrix}, \quad
c_n = 1
$$
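As a concrete illustration, here is a minimal sketch of this substitution using the third-party cvxopt QP solver (my own code, not from the notes; it assumes the data are linearly separable):

```python
# Hard-margin linear SVM as a QP: variable u = [b, w], objective (1/2) u^T P u,
# constraints G u <= h. Requires the cvxopt package.
import numpy as np
from cvxopt import matrix, solvers

solvers.options["show_progress"] = False

def hard_margin_svm(X, y):
    """X: (N, d) inputs, y: (N,) labels in {-1, +1}. Returns (b, w)."""
    N, d = X.shape
    P = np.zeros((d + 1, d + 1))
    P[1:, 1:] = np.eye(d)                # penalize w only, not the bias b
    q = np.zeros(d + 1)
    # y_n (w^T x_n + b) >= 1  rewritten as  -y_n * [1, x_n] @ u <= -1
    G = -y[:, None] * np.hstack([np.ones((N, 1)), X])
    h = -np.ones(N)
    sol = solvers.qp(matrix(P), matrix(q), matrix(G), matrix(h))
    u = np.array(sol["x"]).ravel()
    return u[0], u[1:]

# usage on the four reconstructed toy points from above
X = np.array([[0, 0], [2, 2], [2, 0], [3, 0]], dtype=float)
y = np.array([-1, -1, 1, 1], dtype=float)
b, w = hard_margin_svm(X, y)
print(b, w)                              # close to b = -1, w = (1, -1)
```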
For non-linear problems, just apply the z-space feature transform from before first:
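To make the z-space idea concrete, here is a small sketch (my own illustration, not from the slide), using scikit-learn's SVC with a huge C as a stand-in for a hard-margin linear SVM:

```python
# Transform first, then run a linear SVM in z-space. The transform phi and the
# ring-shaped toy data are my own example; they are not separable in x-space.
import numpy as np
from sklearn.svm import SVC

def phi(X):
    """2nd-order transform: (x1, x2) -> (x1, x2, x1^2, x1*x2, x2^2)."""
    x1, x2 = X[:, 0], X[:, 1]
    return np.column_stack([x1, x2, x1**2, x1 * x2, x2**2])

X = np.array([[0, 1], [1, 0], [-1, 0], [0, 3], [3, 0], [-3, 0]], dtype=float)
y = np.array([-1, -1, -1, 1, 1, 1])      # inner ring vs outer ring

clf = SVC(kernel="linear", C=1e10).fit(phi(X), y)  # large C ~ hard margin
print(clf.predict(phi(X)))               # reproduces y
```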


P5 1.5
The difference and connection between SVM and the earlier regularization (z-space and all that):
|                | minimize  | constraint                                        |
| -------------- | --------- | ------------------------------------------------- |
| regularization | $E_{in}$  | $w^T w \le C$                                     |
| SVM            | $w^T w$   | $E_{in} = 0$ (i.e. every $y_n(w^T x_n + b) \ge 1$) |
You can see that the two essentially swap objective and constraint, so SVM can also be viewed as a kind of regularization, just one that insists on $E_{in} = 0$.
When the required margin $\rho$ is 0 (call this algorithm $A_0$), it is the same as PLA. When the margin must be at least, say, 1.126 ($A_{1.126}$), any line that fails the requirement is not allowed, so $A_{1.126}$ has fewer candidate lines than $A_0$: fewer possible dichotomies → smaller (pseudo) VC dimension → better generalization.
For points on this circle: with $\rho = 0$, 3 points can be shattered, so $d_{vc} = 3$. But if $\rho = \frac{\sqrt{3}}{2}$: three points on the unit circle are at pairwise distance at most $\sqrt{3}$ (the side of the inscribed equilateral triangle), so some pair cannot be put on opposite sides by a separator that keeps margin $\frac{\sqrt{3}}{2}$ to both, some dichotomy becomes unrealizable, and $d_{vc} < 3$ in that case.
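The general bound on the slide, as I recall it from the course (treat this as recalled rather than verified): if all inputs lie within radius $R$ of the origin, then

$$
d_{vc}(A_\rho) \le \min\left(\frac{R^2}{\rho^2},\, d\right) + 1 \;\le\; d + 1
$$

where $d + 1$ is the VC dimension of plain perceptrons.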
The next lesson will introduce non-linear SVM, which combines large-margin hyperplanes with feature transforms.


Origin blog.csdn.net/Only_Wolfy/article/details/89470194