"A brief description of the commonly used kernel functions of SVM support vector machines and their meaning | CSDN creation punch card"

        In a previous blog post I used a support vector machine to solve a simple binary classification problem; the full program can be found there. However, using a support vector machine only for linear classification is, as the saying goes, like using a sledgehammer to crack a nut. The kernel function can be said to be the soul of the support vector machine, because in practical problems there may be no hyperplane (decision boundary) in the original sample space that correctly separates the two classes of samples. For a classification problem like the one shown in the figure below, the method introduced in my previous article clearly cannot classify it successfully.

For such a problem, the watermelon book (Zhou Zhihua's *Machine Learning*) tells us that the samples can be mapped from the original space into a higher-dimensional feature space in which they become linearly separable. Take the problem above as an example: the two classes clearly cannot be separated by a straight line, but if the samples are mapped into a suitable three-dimensional space, a suitable separating hyperplane can be found. A conclusion worth remembering: if the original space is finite-dimensional, that is, the number of attributes is finite, then there must exist a higher-dimensional feature space in which the samples are separable.
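To make this concrete, here is a minimal sketch (my own example, not from the original post) of such a mapping: two-dimensional samples that form an inner disc and an outer ring cannot be separated by a straight line, but after adding x1² + x2² as a third coordinate they can be split by a plane.

```python
import numpy as np

# Illustrative example: class 0 is an inner disc, class 1 is an outer ring.
rng = np.random.default_rng(0)
angles = rng.uniform(0, 2 * np.pi, 200)
radii = np.concatenate([rng.uniform(0.0, 1.0, 100),   # class 0: inner disc
                        rng.uniform(1.5, 2.5, 100)])  # class 1: outer ring
X = np.column_stack([radii * np.cos(angles), radii * np.sin(angles)])
y = np.concatenate([np.zeros(100), np.ones(100)])

# Map each 2D sample (x1, x2) to 3D as (x1, x2, x1^2 + x2^2).
Z = np.column_stack([X, (X ** 2).sum(axis=1)])

# In the new space the plane z = 1.25^2 separates the two classes perfectly.
pred = (Z[:, 2] > 1.25 ** 2).astype(float)
print("accuracy of the separating plane:", (pred == y).mean())  # 1.0
```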

        There is a theorem in the theory of support vector machines that characterises valid kernels: a symmetric function k(·,·) is a kernel function if and only if, for any data set D = {x_1, x_2, ..., x_m}, the kernel matrix K with entries K_{ij} = k(x_i, x_j) is positive semi-definite.

In other words, as long as the kernel matrix corresponding to a symmetric function is positive semi-definite, that function can be used as a kernel function.
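As a quick illustration of this condition, the following sketch (an assumed example, not part of the original post) builds the kernel matrix of a Gaussian kernel on random data and checks that it is symmetric with no negative eigenvalues.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 4))
sigma = 1.0

# Pairwise squared Euclidean distances, then the Gaussian kernel matrix.
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K = np.exp(-sq_dists / (2 * sigma ** 2))

print("symmetric:", np.allclose(K, K.T))
eigvals = np.linalg.eigvalsh(K)
print("smallest eigenvalue:", eigvals.min())  # >= 0 up to floating-point error
```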

        The following is a brief introduction to several commonly used kernel functions (a NumPy sketch of each is given after the list):

Linear kernel: $k(x_i, x_j) = x_i^{T} x_j$

Polynomial kernel: $k(x_i, x_j) = (x_i^{T} x_j)^{d}$, where $d \geq 1$ is the degree of the polynomial

Gaussian kernel: $k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|^{2}}{2\sigma^{2}}\right)$, where $\sigma > 0$ is the bandwidth (width) of the Gaussian kernel

Laplace kernel: $k(x_i, x_j) = \exp\left(-\frac{\|x_i - x_j\|}{\sigma}\right)$, where $\sigma > 0$

Sigmoid kernel: $k(x_i, x_j) = \tanh(\beta x_i^{T} x_j + \theta)$, where $\tanh$ is the hyperbolic tangent function, $\beta > 0$ and $\theta < 0$
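For reference, the five kernels above can be written directly in NumPy. This is only an illustrative sketch; the function names and default parameter values are my own choices.

```python
import numpy as np

def linear_kernel(xi, xj):
    return xi @ xj

def polynomial_kernel(xi, xj, d=2):                # d >= 1
    return (xi @ xj) ** d

def gaussian_kernel(xi, xj, sigma=1.0):            # sigma > 0
    return np.exp(-np.sum((xi - xj) ** 2) / (2 * sigma ** 2))

def laplace_kernel(xi, xj, sigma=1.0):             # sigma > 0
    return np.exp(-np.sqrt(np.sum((xi - xj) ** 2)) / sigma)

def sigmoid_kernel(xi, xj, beta=1.0, theta=-1.0):  # beta > 0, theta < 0
    return np.tanh(beta * (xi @ xj) + theta)

xi = np.array([1.0, 2.0])
xj = np.array([0.5, -1.0])
for k in (linear_kernel, polynomial_kernel, gaussian_kernel,
          laplace_kernel, sigmoid_kernel):
    print(k.__name__, k(xi, xj))
```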

At the same time, kernel functions have a number of useful closure properties (a small numerical check is sketched after this list):

If $k_1$ and $k_2$ are kernel functions, then for any positive numbers $\gamma_1$ and $\gamma_2$, the linear combination $\gamma_1 k_1 + \gamma_2 k_2$ is also a kernel function.

If $k_1$ and $k_2$ are kernel functions, then their direct product $k_1 \otimes k_2(x, z) = k_1(x, z)\,k_2(x, z)$ is also a kernel function.

If $k$ is a kernel function, then for any function $g(x)$, $k'(x, z) = g(x)\,k(x, z)\,g(z)$ is also a kernel function.
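The following numerical sanity check (again my own example, not from the original post) builds small Gram matrices and confirms that the three constructions above all remain positive semi-definite, up to floating-point error.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 3))

def gram(kernel):
    n = len(X)
    return np.array([[kernel(X[i], X[j]) for j in range(n)] for i in range(n)])

k1 = lambda a, b: a @ b                               # linear kernel
k2 = lambda a, b: np.exp(-np.sum((a - b) ** 2) / 2.0) # Gaussian kernel
g = lambda a: 1.0 + np.sum(a ** 2)                    # an arbitrary function g(x)

K1, K2 = gram(k1), gram(k2)
Gvec = np.array([g(x) for x in X])

for name, K in [("linear combination", 0.3 * K1 + 0.7 * K2),
                ("direct product", K1 * K2),
                ("g(x) k(x,z) g(z)", np.outer(Gvec, Gvec) * K2)]:
    print(name, "min eigenvalue:", np.linalg.eigvalsh(K).min())  # all >= 0
```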

I will introduce examples of kernel function applications in a follow-up blog.
