Lin Xuantian's Notes on Machine Learning Techniques (3)

It feels good to write while taking notes hhh (my old routine of screenshotting the slides first, writing in a notepad, and pasting everything back later was really clumsy ⁄(⁄ ⁄•⁄ω⁄•⁄ ⁄)⁄)
Lin Xuantian Machine Learning Techniques Notes (1)
Lin Xuantian Machine Learning Techniques Notes (2)

Kernel Support Vector Machine


P10 3.1
Goal: in the last lecture, dual SVM almost removed the dependence on $\tilde{d}$; this lecture discusses how to remove it completely. The dependence on $\tilde{d}$ hides in the inner products $z_n^T z_m$.

Computing $z_n^T z_m = \Phi(x_n)^T \Phi(x_m)$ the hard way costs $O(2\tilde{d})$: one $O(\tilde{d})$ for the transform, and another $O(\tilde{d})$ for the inner product after it. Now consider merging the two calculations to reduce the complexity.

(For convenience of calculation, $\Phi_2(x)$ includes the constant 1 and both cross terms $x_1 x_2$ and $x_2 x_1$.) After a series of manipulations,

$$\Phi_2(x)^T \Phi_2(x') = 1 + x^T x' + (x^T x')^2,$$

so everything is computed in the x-space with $O(d)$ work instead of the $O(d^2)$ (that is, $O(\tilde{d})$) work in the z-space.
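As a quick sanity check, here is a minimal numpy sketch (my own illustration, not from the lecture); the helper names `phi2` and `k_phi2` are made up:

```python
import numpy as np

def phi2(x):
    """Explicit transform: (1, x_1..x_d, all d^2 ordered products x_i * x_j)."""
    return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

def k_phi2(x, xp):
    """Same inner product done directly in x-space: 1 + x.x' + (x.x')^2."""
    s = x @ xp
    return 1.0 + s + s ** 2

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)
print(phi2(x) @ phi2(xp))  # O(d~) = O(d^2) work in z-space
print(k_phi2(x, xp))       # O(d) work in x-space, same value
```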

Computing the transform and the inner product together in one step is what we call a kernel function:

$$K_\Phi(x, x') \equiv \Phi(x)^T \Phi(x'), \qquad K_{\Phi_2}(x, x') = 1 + x^T x' + (x^T x')^2.$$

The remaining $z_n$'s in $b$ and $g_{\mathrm{SVM}}$ can be replaced by the kernel as well:

$$b = y_s - \sum_{n=1}^{N} \alpha_n y_n K(x_n, x_s) \quad \text{for any support vector } (x_s, y_s),$$

$$g_{\mathrm{SVM}}(x) = \operatorname{sign}\left(\sum_{n=1}^{N} \alpha_n y_n K(x_n, x) + b\right).$$

In the end we get rid of the influence of $\tilde{d}$ completely, and only the support vectors (those with $\alpha_n > 0$) need to be looked at.
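A minimal sketch of this evaluation (assuming the $\alpha_n$ come from a dual QP solver that is not shown; `solve_b` and `g_svm` are hypothetical names, and only the support-vector rows are passed in):

```python
import numpy as np

def solve_b(X_sv, y_sv, alpha_sv, kernel):
    """b = y_s - sum_n alpha_n y_n K(x_n, x_s), using any one support vector."""
    xs, ys = X_sv[0], y_sv[0]
    return ys - sum(a * y * kernel(xn, xs)
                    for a, y, xn in zip(alpha_sv, y_sv, X_sv))

def g_svm(x, X_sv, y_sv, alpha_sv, b, kernel):
    """sign( sum_n alpha_n y_n K(x_n, x) + b ), summed over support vectors only."""
    score = sum(a * y * kernel(xn, x)
                for a, y, xn in zip(alpha_sv, y_sv, X_sv)) + b
    return np.sign(score)
```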


P11 3.2
There are other ways to write the quadratic transform:
The blue and green variants of $K_{\Phi_2}$ look slightly different, but they are all quadratic transforms corresponding to the same z-space: the scaling constants get absorbed ("eaten") by $\tilde{w}$, so the expressive power is the same. However, the different coefficients define different inner products, i.e., different notions of distance, and that does affect the SVM. So the same z-space can still give different margins and different boundaries.
Replacing the 1 with $\zeta$ (and adding a scale $\gamma$) gives the general polynomial kernel:

$$K_Q(x, x') = (\zeta + \gamma\, x^T x')^Q, \qquad \gamma > 0,\ \zeta \ge 0.$$

Even at high order, the complexity is just the inner product $x^T x'$ plus a little polynomial arithmetic, so high-order transforms stay fast (the calculation lives in the x-space, not the z-space). And with high-order transforms, the large margin still controls overfitting. So the two are often combined: the polynomial SVM (see the sketch below).
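A sketch of the polynomial SVM using scikit-learn's `SVC` (my choice of tool, not the course's): its `poly` kernel is exactly $(\text{coef0} + \text{gamma}\cdot x^T x')^{\text{degree}}$, and a huge `C` approximates the hard margin. The toy data is an assumption for illustration:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = np.sign(X[:, 0] * X[:, 1])             # XOR-like labels, not linearly separable

# Polynomial SVM: K(x, x') = (coef0 + gamma * x.T x')**degree
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1e10)
clf.fit(X, y)
print(clf.score(X, y), len(clf.support_))  # training accuracy and number of SVs
```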
Although we can go high-dimensional, $K_1$ (the linear kernel) is usually the first choice; as said before, linear often works well enough.


P12 3.3
For an infinite-dimensional $\Phi(x)$ (recall the previous section), can the kernel still help?
Suppose x has only one dimension, and consider $K(x, x') = \exp(-(x - x')^2)$:
Expanding the $\exp(2xx')$ factor with the Taylor series shows that this one-dimensional Gaussian hides an infinite-dimensional transform:

$$\Phi(x) = \exp(-x^2)\cdot\left(1,\ \sqrt{\tfrac{2}{1!}}\,x,\ \sqrt{\tfrac{2^2}{2!}}\,x^2,\ \dots\right).$$

When x has many dimensions, a scaling factor $\gamma$ is added, giving the Gaussian kernel $K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)$ with $\gamma > 0$; it likewise corresponds to infinitely many dimensions.
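A numerical check of this expansion (my own sketch; truncating the infinite transform at 30 terms is plenty here):

```python
import numpy as np
from scipy.special import factorial

def gaussian_kernel(x, xp, gamma=1.0):
    """K(x, x') = exp(-gamma * ||x - x'||^2)."""
    return np.exp(-gamma * np.sum((np.asarray(x) - np.asarray(xp)) ** 2))

def phi_truncated(x, n_terms=30):
    """First coordinates of Phi(x) = exp(-x^2) * (sqrt(2^n / n!) * x^n)_{n>=0}."""
    n = np.arange(n_terms)
    return np.exp(-x ** 2) * np.sqrt(2.0 ** n / factorial(n)) * x ** n

x, xp = 0.7, -0.3
print(gaussian_kernel(x, xp))                # exp(-(x - x')^2)
print(phi_truncated(x) @ phi_truncated(xp))  # converges to the same value
```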

After solving for $\alpha$ and $b$ with $K$, we get $g_{\mathrm{SVM}}$, which is actually a linear combination of Gaussian functions centered at the support vectors $x_n$. The Gaussian kernel is therefore also called the RBF (radial basis function) kernel: "radial" because, like a Gaussian, the value depends only on the distance from some center; "basis function" because these are the functions being linearly combined.
Summary: the large margin keeps the effective number of hypotheses from being too large; the kernel greatly reduces the computation of the dual SVM, so very complex boundaries become affordable (although lower dimensions can sometimes do well too); and the Gaussian kernel even lets the SVM compute an infinite-dimensional $g_{\mathrm{SVM}}$.

Finally, pay attention to the value of $\gamma$: if it is too large, the SVM will overfit, even with the large-margin protection.
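A small experiment showing this (data and parameter values are my own, for illustration): as $\gamma$ grows, each Gaussian bump narrows, the boundary wraps individual points, and the fit memorizes the label noise:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 2))
y = np.sign(X[:, 0] ** 2 + X[:, 1] ** 2 - 1.0)  # circular target
y[rng.random(100) < 0.1] *= -1                  # add 10% label noise

for gamma in (1, 10, 100, 1000):
    clf = SVC(kernel="rbf", gamma=gamma, C=10.0).fit(X, y)
    # training accuracy creeps toward 1.0 at large gamma: it is fitting the noise
    print(gamma, clf.score(X, y), len(clf.support_))
```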


P13 3.4
Let's compare the pros and cons of the three kernels:
Polynomial kernel: $Q$ can be set flexibly, but the $Q$-th power can become numerically very large or very small, so $Q$ should stay relatively small (numerically difficult), and there are three parameters $(\zeta, \gamma, Q)$ to choose.
The Gaussian kernel is the common choice, but it is hard to see exactly how the data gets separated (because the transform is infinite-dimensional).
One can also define a custom kernel, but it must be valid, i.e., satisfy Mercer's condition: ① symmetry; ② the kernel matrix $K$ (entries $K_{nm} = K(x_n, x_m)$, which equals $ZZ^T$) must be positive semi-definite.
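On a finite sample, this check is easy to code (a sketch; `is_valid_kernel_matrix` is a made-up name, and passing the eigenvalue test on one sample is necessary but not a full proof of validity):

```python
import numpy as np

def is_valid_kernel_matrix(K, tol=1e-10):
    """Mercer's condition on a sample: symmetric and positive semi-definite."""
    if not np.allclose(K, K.T):
        return False
    return np.linalg.eigvalsh(K).min() >= -tol   # tolerate rounding error

# Gram matrix of the Gaussian kernel (gamma = 1) on random points: valid.
X = np.random.default_rng(2).standard_normal((20, 3))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
print(is_valid_kernel_matrix(np.exp(-sq)))       # True
```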
Final summary: the kernel folds the feature transform into the inner product, and the linear, polynomial, Gaussian, and custom kernels each trade off expressive power, numerical behavior, and interpretability.

Origin blog.csdn.net/Only_Wolfy/article/details/89517676