The idea behind the kernel function is fairly easy to understand,
but there were still a few points I didn't get at first.
For data that is not linearly separable, applying SVM directly in the current dimension cannot separate the classes.
In that case, you can lift the data from the current dimension into a higher dimension and do the computation there.
For example, suppose the original data has only 3 influencing factors $x_a, x_b, x_c$ (gender, age, appearance). Based on these three dimensions, the data can be expanded into a higher dimension (more unnamed influencing factors), such as $x_c, x_d, x_e, x_f, x_g$, and then substituted into the dual problem of the SVM for computation.
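As a toy sketch (the particular mapping and the sample values are made up for illustration), lifting 3 features into a higher dimension could look like keeping the original features and adding all pairwise products:

```python
import numpy as np

# Hypothetical feature lift: map 3 influencing factors into 9 dimensions
# by keeping the original features and appending all pairwise products.
def lift(x):
    xa, xb, xc = x
    return np.array([xa, xb, xc,
                     xa * xa, xa * xb, xa * xc,
                     xb * xb, xb * xc, xc * xc])

x = np.array([1.0, 25.0, 7.0])  # made-up sample: (gender, age, appearance)
print(lift(x).shape)            # (9,)
```

Any such lifted vector can then be fed into the SVM dual problem in place of the original 3-dimensional one.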
In fact, the book "Mathematical Foundations of Artificial Intelligence" describes this quite clearly.
It also gives a derivation when discussing the Gaussian kernel function.
The derivation confirms how the Gaussian kernel expands into a high-dimensional vector, and how the dot product of that high-dimensional vector can be computed using only the original low-dimensional data.
$$K(x,y) = e^{-\|x-y\|^2} = e^{-x^2}e^{-y^2}\left[1+\frac{(2x\cdot y)^1}{1!}+\frac{(2x\cdot y)^2}{2!}+\frac{(2x\cdot y)^3}{3!}+\cdots\right]$$
Here, the dot product of the high-dimensional vectors is reflected in the factor $e^{-x^2}e^{-y^2}\left[1+\frac{(2x\cdot y)^1}{1!}+\frac{(2x\cdot y)^2}{2!}+\frac{(2x\cdot y)^3}{3!}+\cdots\right]$.
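The identity behind this expansion can be checked numerically in the scalar case, where $x\cdot y$ is just $xy$ (truncating the infinite series at 30 terms, which is more than enough for convergence here):

```python
import math

# Check: e^{-(x-y)^2} = e^{-x^2} e^{-y^2} * sum_n (2xy)^n / n!
# (the series is exactly the Taylor expansion of e^{2xy})
x, y = 0.7, 0.3
lhs = math.exp(-(x - y) ** 2)
rhs = math.exp(-x ** 2) * math.exp(-y ** 2) * sum(
    (2 * x * y) ** n / math.factorial(n) for n in range(30))
print(abs(lhs - rhs) < 1e-12)  # True
```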
To solve the SVM dual problem, these dot products between data points must be computed.
Suppose the original data $x$ has 3 influencing factors $x_a, x_b, x_c$.
Then two data points are $(x_{a1}, x_{b1}, x_{c1})$ and $(x_{a2}, x_{b2}, x_{c2})$,
and their dot product is $x_{a1}x_{a2} + x_{b1}x_{b2} + x_{c1}x_{c2}$.
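In code, the dual problem typically needs this dot product for every pair of training points, collected into a Gram matrix; a minimal sketch with made-up 3-feature data:

```python
import numpy as np

# Two made-up samples, 3 influencing factors each.
X = np.array([[1.0, 25.0, 7.0],
              [0.0, 30.0, 5.0]])

# Gram matrix: G[i][j] = x_i . x_j, all pairwise dot products at once.
G = X @ X.T
print(G[0, 1])  # 1*0 + 25*30 + 7*5 = 785.0
```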
But now suppose the three influencing factors are lifted, through some mapping, into infinitely many influencing factors $x_e, x_f, x_g, x_h, x_k, \ldots$
The two data points become
$(x_{e1}, x_{f1}, x_{g1}, x_{h1}, x_{k1}, \ldots)$
$(x_{e2}, x_{f2}, x_{g2}, x_{h2}, x_{k2}, \ldots)$
Then their dot product is
$x_{e1}x_{e2} + x_{f1}x_{f2} + x_{g1}x_{g2} + x_{h1}x_{h2} + x_{k1}x_{k2} + \cdots$
However, computing that high-dimensional dot product directly is far too expensive. Instead, we look for a formula in the original low dimension whose result equals the high-dimensional dot product, and that formula is the kernel function!
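A finite-dimensional analogue makes this concrete. Using the simple polynomial kernel $K(x,y)=(x\cdot y)^2$ instead of the Gaussian kernel (chosen here only because its feature map is finite and easy to write out), the explicit high-dimensional dot product and the low-dimensional kernel formula give the same number:

```python
import numpy as np

# Explicit feature map for 2-D input: phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2).
# By construction, phi(x) . phi(y) = (x . y)^2.
def phi(x):
    x1, x2 = x
    return np.array([x1 * x1, np.sqrt(2) * x1 * x2, x2 * x2])

x = np.array([1.0, 2.0])
y = np.array([3.0, 4.0])

high_dim = phi(x) @ phi(y)  # dot product computed in the lifted space
kernel = (x @ y) ** 2       # same value, computed in the low dimension
print(high_dim, kernel)     # both 121.0
```

The Gaussian kernel plays exactly this game, except its lifted space is infinite-dimensional, so the low-dimensional shortcut is not just cheaper but the only practical option.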
In the Gaussian kernel, the factor $e^{-x^2}e^{-y^2}\left[1+\frac{(2x\cdot y)^1}{1!}+\frac{(2x\cdot y)^2}{2!}+\frac{(2x\cdot y)^3}{3!}+\cdots\right]$ is the dot product of two infinite-dimensional data points, and the corresponding low-dimensional formula is $e^{-\|x-y\|^2}$, which is exactly the kernel function.
The Gaussian kernel also has a parameter $\sigma$, called the kernel radius:
$e^{\frac{-\|x-y\|^2}{2\sigma^2}}$. When the two data points $x$ and $y$ are very close, the value of the kernel function is close to 1, and
when $x$ and $y$ differ greatly, the value of the kernel function is close to 0.
This is like a normal distribution (Gaussian distribution)
The role of $\sigma$ is to adjust this behavior:
in extreme terms, when $\sigma$ is very, very, very large, the kernel value stays close to 1 even when $x$ and $y$ differ greatly.
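A quick sketch of this behavior (scalar inputs for simplicity):

```python
import math

# Gaussian (RBF) kernel for scalar inputs: exp(-(x-y)^2 / (2*sigma^2)).
def rbf(x, y, sigma):
    return math.exp(-(x - y) ** 2 / (2 * sigma ** 2))

print(rbf(1.0, 1.01, sigma=1.0))   # close points  -> near 1
print(rbf(1.0, 9.0, sigma=1.0))    # far points    -> near 0
# A very large sigma flattens the kernel: even far points score near 1.
print(rbf(1.0, 9.0, sigma=100.0))
```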
Therefore, the kernel function is a low-dimensional formula that stands in for a high-dimensional dot product: it lets the SVM work with the linear relationships of the lifted space while computing only with the original low-dimensional data.
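As a usage sketch (assuming scikit-learn is available; the circle-shaped data below is made up for illustration), the Gaussian kernel is what lets an SVM handle data that no straight line can separate:

```python
import numpy as np
from sklearn.svm import SVC

# Made-up data that is not linearly separable in 2-D:
# label 1 inside a circle, label 0 outside it.
rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 < 1.5).astype(int)

linear = SVC(kernel="linear").fit(X, y)
gaussian = SVC(kernel="rbf").fit(X, y)  # sklearn's gamma plays the role of 1/(2*sigma^2)

# The RBF kernel should fit this data far better than the linear one.
print(linear.score(X, y), gaussian.score(X, y))
```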