Machine learning - SVM kernel function

The principle behind the kernel function is relatively simple to understand,

but there were still a few things I didn't fully understand at first.

For data that is not linearly separable, SVM cannot separate it directly in its current dimension.

Instead, you can lift the data from the current dimension into a higher dimension and do the computation there.
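As a minimal sketch of this idea (my own toy example, not from the book): 1-D points with the positive class at the extremes cannot be split by any single threshold, but after lifting each point with a hypothetical map $\varphi(x) = (x, x^2)$, the second coordinate alone separates the classes:

```python
# Toy example: 1-D data that no single threshold can separate:
# class +1 at x in {-2, 2}, class -1 at x in {-1, 0, 1}.
# After lifting with phi(x) = (x, x^2), the second coordinate
# alone separates the two classes with the line x2 = 2.
def phi(x):
    return (x, x * x)

pos = [-2, 2]      # class +1
neg = [-1, 0, 1]   # class -1

lifted_pos = [phi(x) for x in pos]  # second coordinates: 4, 4
lifted_neg = [phi(x) for x in neg]  # second coordinates: 1, 0, 1

# In the lifted 2-D space, x2 > 2 holds exactly for the +1 class.
assert all(x2 > 2 for _, x2 in lifted_pos)
assert all(x2 < 2 for _, x2 in lifted_neg)
```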

For example, suppose the original data has only 3 features $x_a, x_b, x_c$ (gender, age, appearance). Based on these three dimensions, it can be expanded into a higher dimension (more unnamed features), such as $x_c, x_d, x_e, x_f, x_g$, and then substituted into the dual problem of the SVM for computation.

Incidentally, the book *Mathematical Foundations of Artificial Intelligence* describes this quite clearly.

In addition, when discussing the Gaussian kernel function, the book also gives a derivation.


The point of the derivation is to confirm how the Gaussian kernel expands into a high-dimensional vector, and how the dot product of those high-dimensional vectors can be computed using only the original low-dimensional data.

$$K(x,y) = e^{-\|x-y\|^2} = e^{-x^2}e^{-y^2}\left[1+\frac{(2x\cdot y)^1}{1!}+\frac{(2x\cdot y)^2}{2!}+\frac{(2x\cdot y)^3}{3!}+\cdots\right]$$

Here, the dot product of the high-dimensional vectors is reflected in the factor $e^{-x^2}e^{-y^2}\left[1+\frac{(2x\cdot y)^1}{1!}+\frac{(2x\cdot y)^2}{2!}+\frac{(2x\cdot y)^3}{3!}+\cdots\right]$.
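The identity above can be checked numerically in the scalar case: $e^{-(x-y)^2} = e^{-x^2}e^{-y^2}e^{2xy}$, where $e^{2xy}$ is replaced by its Taylor series. A small sketch (function names are my own):

```python
import math

def gaussian_kernel(x, y):
    # Left-hand side: K(x, y) = exp(-(x - y)^2), scalar case
    return math.exp(-(x - y) ** 2)

def series_form(x, y, terms=30):
    # Right-hand side: exp(-x^2) * exp(-y^2) * sum_n (2xy)^n / n!
    s = sum((2 * x * y) ** n / math.factorial(n) for n in range(terms))
    return math.exp(-x * x) * math.exp(-y * y) * s

# The truncated series agrees with the closed form to machine precision.
x, y = 0.7, -0.3
assert abs(gaussian_kernel(x, y) - series_form(x, y)) < 1e-12
```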

Solving the SVM dual problem requires computing dot products between data points.

Suppose the original data $x$ has 3 features $x_a, x_b, x_c$.

Then two data points are $(x_{a1}, x_{b1}, x_{c1})$ and $(x_{a2}, x_{b2}, x_{c2})$.

Their dot product is $x_{a1}x_{a2}+x_{b1}x_{b2}+x_{c1}x_{c2}$.

But now suppose the three features are lifted, through some mapping, to infinitely many features $x_e, x_f, x_g, x_h, x_k, \ldots$

These two data points become
$(x_{e1}, x_{f1}, x_{g1}, x_{h1}, x_{k1}, \ldots)$
$(x_{e2}, x_{f2}, x_{g2}, x_{h2}, x_{k2}, \ldots)$

Then their dot product is
$x_{e1}x_{e2}+x_{f1}x_{f2}+x_{g1}x_{g2}+x_{h1}x_{h2}+x_{k1}x_{k2}+\cdots$

However, computing this dot product in high dimensions is far too expensive. So instead we look for a formula in the original low-dimensional space whose result equals the high-dimensional dot product, and that formula is the kernel function!
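The same shortcut is easiest to see with a finite example. For the degree-2 polynomial kernel in 2-D (my own illustrative choice, not the Gaussian case), the explicit feature map $\varphi(x) = (x_1^2, \sqrt{2}\,x_1 x_2, x_2^2)$ lands in 3-D, yet its dot product equals the cheap low-dimensional formula $(x \cdot y)^2$:

```python
import math

def phi(v):
    # Explicit feature map for the degree-2 polynomial kernel in 2-D:
    # phi(x) = (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = v
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def poly_kernel(u, v):
    # Kernel shortcut: K(x, y) = (x . y)^2, computed entirely in 2-D
    return dot(u, v) ** 2

# Dot product in the lifted 3-D space == kernel value in the original 2-D space
x, y = (1.0, 2.0), (3.0, -1.0)
assert abs(dot(phi(x), phi(y)) - poly_kernel(x, y)) < 1e-12
```

The Gaussian kernel plays the same trick, except its feature map is infinite-dimensional, so the shortcut is not just cheaper but the only practical option.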

In the Gaussian kernel, $e^{-x^2}e^{-y^2}\left[1+\frac{(2x\cdot y)^1}{1!}+\frac{(2x\cdot y)^2}{2!}+\frac{(2x\cdot y)^3}{3!}+\cdots\right]$ is the dot product of two infinite-dimensional vectors, and the corresponding low-dimensional formula is $e^{-\|x-y\|^2}$, which is exactly the kernel function.

The Gaussian kernel also has a parameter $\sigma$, known as the kernel radius (or bandwidth).

$e^{-\frac{\|x-y\|^2}{2\sigma^2}}$: when the two data points $x$ and $y$ are very close, the value of the kernel function is close to 1, and
when $x$ and $y$ differ greatly, the value of the kernel function is close to 0.
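A quick sketch of this behavior for scalar inputs (with $\sigma = 1$):

```python
import math

def rbf(x, y, sigma=1.0):
    # Gaussian (RBF) kernel, scalar case: exp(-(x - y)^2 / (2 * sigma^2))
    return math.exp(-((x - y) ** 2) / (2 * sigma ** 2))

# Nearly identical points: kernel value is close to 1
assert rbf(1.00, 1.01) > 0.99

# Far-apart points: kernel value is close to 0
assert rbf(0.0, 10.0) < 1e-6
```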

This is like a normal distribution (Gaussian distribution)

The role of $\sigma$ is to adjust this behavior:

To put it in extreme terms, when $\sigma$ is very, very large, even if the gap between $x$ and $y$ is relatively large, the value of the kernel function is still close to 1.

In short, the kernel function is a low-dimensional formula that stands in for the high-dimensional dot product, letting the SVM solve a linear separation problem in high dimensions using only low-dimensional computation.

Origin blog.csdn.net/weixin_50348308/article/details/132136012