It feels good to write while taking notes hhh (cutting the picture out first, writing on it in a notepad, and then pasting it back later is really silly ⁄(⁄ ⁄•⁄ω⁄•⁄ ⁄)⁄)

Lin Xuantian Machine Learning Techniques Notes (1)
Lin Xuantian Machine Learning Techniques Notes (2)
Kernel Support Vector Machine
P10 3.1
Goal: in the last class, the dual SVM almost removed the dependence on d̃; this class discusses how to remove it completely. The dependence on d̃ comes from z_n^T z_m: computing it the hard way costs O(2d̃) — one O(d̃) for the transform and another O(d̃) for the inner product afterwards. Now consider combining the two calculations to reduce the complexity.
(For convenience of calculation, φ_2(x) includes the constant 1 and both x_1 x_2 and x_2 x_1.) After a series of manipulations, the inner product can be computed entirely in x-space: φ_2(x)^T φ_2(x') = 1 + x^T x' + (x^T x')^2, which costs O(d) instead of the O(d^2) (that is, O(d̃)) of working in z-space.
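A quick numeric check of this trick (a sketch in Python/NumPy, not part of the lecture; the helper names phi2 and k2 are made up): the explicit second-order transform — constant 1, every x_i, and every ordered pair x_i x_j — has exactly the same inner product as the O(d) x-space formula.

```python
import numpy as np

def phi2(x):
    # explicit 2nd-order transform: 1, every x_i, and every ordered pair x_i * x_j
    return np.concatenate(([1.0], x, np.outer(x, x).ravel()))

def k2(x, xp):
    # the same inner product computed purely in x-space: O(d) instead of O(d^2)
    s = x @ xp
    return 1.0 + s + s * s

rng = np.random.default_rng(0)
x, xp = rng.standard_normal(5), rng.standard_normal(5)
print(np.isclose(phi2(x) @ phi2(xp), k2(x, xp)))  # prints True
```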
Doing the transform and the inner product together in one step is called a kernel function: K_φ(x, x') = φ(x)^T φ(x'). For the quadratic transform above, K_φ2(x, x') = 1 + x^T x' + (x^T x')^2. There are other z_n's hiding in b and in g_SVM, and the kernel replaces them too:

b = y_s − Σ_n α_n y_n K(x_n, x_s)   (for any support vector (x_s, y_s))
g_SVM(x) = sign( Σ_SV α_n y_n K(x_n, x) + b )

In the end we are rid of the influence of d̃ completely, and only need to look at the SVs.
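As a sketch of this SV-only form (using scikit-learn, which is an assumption — the lecture only derives the math; SVC stores the products α_n y_n for the SVs in dual_coef_), we can rebuild g_SVM's decision value by summing the kernel over the support vectors alone:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.datasets import make_blobs

# illustrative toy data, not from the lecture
X, y = make_blobs(n_samples=60, centers=2, random_state=1)
clf = SVC(kernel="poly", degree=2, gamma=1.0, coef0=1.0, C=1e3).fit(X, y)

def K(a, b):
    # quadratic kernel (1 + a^T b)^2, matching degree=2, gamma=1, coef0=1 above
    return (1.0 + a @ b) ** 2

x_test = np.array([0.0, 0.0])
# decision value = sum over SVs of (alpha_n * y_n) * K(x_n, x) + b
manual = sum(c * K(sv, x_test)
             for c, sv in zip(clf.dual_coef_[0], clf.support_vectors_)) + clf.intercept_[0]
print(np.isclose(manual, clf.decision_function([x_test])[0]))
```

Only the support vectors enter the sum — the rest of the training set is never touched at prediction time.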
P11 3.2
There are other kernel forms for the quadratic polynomial:
Although the blue and green K_φ2 look a bit different, they are all quadratic transforms corresponding to the same z-space — the different scalings get absorbed into w̃, and the expressive power is the same. However, the different coefficients define different inner products, i.e. different notions of distance, so they do affect the SVM: the same space may yield different margins and therefore different boundaries.
Replacing the 1 with ζ gives the general polynomial kernel: K_Q(x, x') = (ζ + γ x^T x')^Q, with γ > 0, ζ ≥ 0.
Even at high order, the cost is just the inner product x^T x' plus a little polynomial arithmetic, so high-order transforms can be used without being slow (the computation lives in x-space, not z-space). And in high-order transforms, the large margin still controls overfitting. So the two are often combined: (polynomial SVM)
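A minimal polynomial-SVM sketch, assuming scikit-learn (its coef0 plays the role of ζ and degree the role of Q; the dataset is illustrative): even with Q = 10 training is fast, because the kernel works in x-space and never materializes the huge z-space.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# toy non-linear data, chosen for illustration
X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

# (zeta + gamma * x^T x')^Q  with  zeta = coef0 = 1, gamma = 1, Q = degree = 10
poly_svm = SVC(kernel="poly", degree=10, gamma=1.0, coef0=1.0, C=1.0).fit(X, y)
print(poly_svm.score(X, y))  # training accuracy
```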
Although we can go high-dimensional, K_1 (the linear kernel, Q = 1) is usually the first choice — as said before, linear often works well.
P12 3.3
For an infinite-dimensional φ(x), looking back at the previous section: can it also be solved with a kernel?
Suppose x has only one dimension. The Gaussian kernel K(x, x') = exp(−(x − x')^2) turns out to hide an inner product of infinitely many features:

K(x, x') = exp(−x^2) exp(−x'^2) exp(2xx')

Expanding exp(2xx') with the Taylor series shows that the Gaussian on one-dimensional x hides an infinite-dimensional transform. When x has more dimensions, a scaling factor γ is added: K(x, x') = exp(−γ ‖x − x'‖^2), γ > 0 — again with infinitely many dimensions.
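A numeric sketch of this Taylor-series argument (helper names are made up): truncating the infinite feature map φ_k(x) = exp(−x^2) · sqrt(2^k / k!) · x^k at K terms already reproduces exp(−(x − x')^2) to high accuracy.

```python
import math
import numpy as np

def phi_truncated(x, K=30):
    # first K features of the infinite Gaussian feature map:
    # phi_k(x) = exp(-x^2) * sqrt(2^k / k!) * x^k,  k = 0..K-1
    return np.array([math.exp(-x * x) * math.sqrt(2.0 ** k / math.factorial(k)) * x ** k
                     for k in range(K)])

def gaussian_k(x, xp):
    # the closed-form 1-D Gaussian kernel
    return math.exp(-(x - xp) ** 2)

x, xp = 0.7, -0.3
approx = phi_truncated(x) @ phi_truncated(xp)
print(abs(approx - gaussian_k(x, xp)) < 1e-10)  # prints True
```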
After calculating α and b with this K, we get g_SVM: it is a linear combination of Gaussian functions centered at the support vectors x_n, also called an RBF (radial basis function) — R for Radial, a function that depends only on the distance from a center, like the Gaussian; BF for the basis functions being linearly combined.
Summary: large-margin keeps the effective number of hypotheses from being too large; the kernel greatly reduces the computation of the dual SVM, making very complex boundaries affordable (though lower dimensions can sometimes do well enough); and the Gaussian kernel even lets SVM compute an infinite-dimensional g_SVM.
Finally, pay attention to the value of γ: if it is too large, the model will overfit, even with large-margin protection.
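A small illustration of this warning, assuming scikit-learn and a toy dataset (the values are illustrative, not from the lecture): as γ grows, the Gaussians get narrower and the boundary wraps around individual points, so training accuracy climbs — which here is a sign of overfitting, not success.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons

# noisy toy data so that fitting it perfectly really is overfitting
X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# training accuracy of the RBF SVM for increasingly large gamma
acc = {g: SVC(kernel="rbf", gamma=g, C=1.0).fit(X, y).score(X, y)
       for g in (1, 100, 10000)}
print(acc)  # training accuracy tends to rise with gamma
```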
P13 3.4
Let’s compare the pros and cons of the three Kernels:
Polynomial kernel: Q can be set flexibly, but raising to the Q-th power can make values very large or very small (numerically difficult), so Q should be kept relatively small, and there are 3 parameters (ζ, γ, Q) to choose.
The Gaussian kernel is the most commonly used, but it is hard to interpret exactly how the data is being separated (since the feature space has infinite dimensions):
Of course, a custom kernel must satisfy Mercer's condition: ① symmetry ② the Gram matrix must be positive semi-definite (Z Z^T ⪰ 0).
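These two conditions can be checked numerically on any data sample (a sketch assuming NumPy; the helper name gram is made up): build the Gram matrix for a candidate kernel and test its symmetry and the sign of its eigenvalues.

```python
import numpy as np

def gram(kernel, X):
    # Gram matrix K_ij = kernel(x_i, x_j) over a data sample
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

# the Gaussian kernel with gamma = 1, as a candidate to validate
rbf = lambda a, b: np.exp(-1.0 * np.sum((a - b) ** 2))

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
K = gram(rbf, X)

symmetric = np.allclose(K, K.T)
psd = np.linalg.eigvalsh(K).min() >= -1e-9   # allow tiny numerical slack
print(symmetric and psd)
```

A kernel that fails either check does not correspond to any inner product φ(x)^T φ(x'), so it cannot be used as a valid SVM kernel.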
Final summary: