Parameters of the SVM RBF kernel

This article is translated from "RBF SVM parameters".

This example illustrates the influence of the parameters gamma and C when the radial basis function (RBF) is used as the kernel of an SVM.

Intuitively, the parameter gamma defines how far the influence of a single training sample reaches: the smaller the value, the farther the influence reaches; the larger the value, the closer it is confined. The parameter gamma can be seen as the inverse of the radius of influence of the samples the model selects as support vectors.

The parameter C trades off misclassification of training samples against simplicity of the decision surface. A low C value makes the decision surface smooth, while a high C value aims to classify all training samples correctly by giving the model the freedom to select more samples as support vectors.
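To make the role of C concrete, the sketch below fits two RBF SVMs that differ only in C and compares their training accuracy. This assumes scikit-learn, and the `make_moons` dataset is an illustrative choice not taken from the article:

```python
# Sketch: effect of C on an RBF SVM (assumes scikit-learn; the
# make_moons dataset is an illustrative choice, not from the article).
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

# Low C: smooth decision surface, tolerates misclassified training samples.
smooth = SVC(kernel="rbf", C=0.01, gamma="scale").fit(X, y)
# High C: tries hard to classify every training sample correctly.
strict = SVC(kernel="rbf", C=1000.0, gamma="scale").fit(X, y)

print("train accuracy, low C :", smooth.score(X, y))
print("train accuracy, high C:", strict.score(X, y))
```

On a noisy dataset like this, the high-C model fits the training set much more closely, which is exactly the trade-off described above.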

Figure 1 is a visualization of the decision function, for several parameter settings, on a simple classification problem with only two input features and two possible target classes (binary classification). Note that this kind of plot cannot be drawn when there are more features or more target classes.

Figure 1

Figure 2 is a heatmap of the classifier's cross-validation accuracy as a function of C and gamma. For demonstration purposes, this example explores a relatively large parameter range. In practice, a logarithmic grid from 10^-3 to 10^3 is usually sufficient. If the best parameter lies on the boundary of the grid, the search can be continued by extending the grid in that direction.
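The grid search behind such a heatmap can be sketched as follows. This assumes scikit-learn; the iris dataset is a stand-in, since the article does not name one:

```python
# Sketch: cross-validated grid search over C and gamma on a log grid
# (assumes scikit-learn; iris is an illustrative dataset choice).
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)  # RBF kernels are scale-sensitive

param_grid = {
    "C": np.logspace(-3, 3, 7),      # the 10^-3 to 10^3 range from the text
    "gamma": np.logspace(-3, 3, 7),
}
grid = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
grid.fit(X, y)

# Mean CV accuracy reshaped into a (C, gamma) matrix, ready for a heatmap.
scores = grid.cv_results_["mean_test_score"].reshape(7, 7)
print("best params:", grid.best_params_)
print("best CV accuracy:", round(grid.best_score_, 3))
```

The `scores` matrix is exactly what gets plotted: one row per C value, one column per gamma value.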

Figure 2

Notice that the heatmap uses a special color bar whose midpoint is set close to the score of the best-performing models, so that they can be spotted at a glance.

The behavior of the model is very sensitive to the parameter gamma. If gamma is too large, the radius of influence of each support vector becomes so small that it covers only the support vector itself. In this regime, no amount of adjustment of C can prevent overfitting.
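This overfitting regime can be checked directly: with a huge gamma the training accuracy is near-perfect while the cross-validated accuracy collapses. A sketch, again assuming scikit-learn and a synthetic dataset not taken from the article:

```python
# Sketch: a very large gamma memorizes the training set but generalizes
# poorly, regardless of C (assumes scikit-learn; make_moons is illustrative).
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.3, random_state=0)

clf = SVC(kernel="rbf", C=1000.0, gamma=1000.0)
train_acc = clf.fit(X, y).score(X, y)  # near-perfect: each support vector
                                       # only influences itself
cv_acc = cross_val_score(clf, X, y, cv=5).mean()  # far worse on held-out data

print("train accuracy:", train_acc)
print("cross-val accuracy:", round(cv_acc, 3))
```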

When the parameter gamma is very small, the model is too constrained to capture the complexity or "shape" of the data. The region of influence of any selected support vector then covers the entire training set, and the resulting model behaves much like a linear model, with a set of hyperplanes separating the high-density centers of each pair of classes.

As for intermediate values, Figure 2 shows that good models are found along a diagonal of C and gamma. A smoother model (smaller gamma) can be made more complex by allowing a larger number of support vectors (larger C), which is why good models lie on this diagonal.

Finally, we also observe that for some intermediate values of gamma, very large values of C still yield well-behaved models: it is not necessary to regularize by limiting the number of support vectors, because the radius of the RBF kernel alone acts as a good structural regularizer. In practice it may still be worthwhile to limit the number of support vectors with a smaller C, so that the model uses less memory and predicts faster.

We should also point out that cross-validation with random splits can produce slightly different scores. These small fluctuations can be smoothed out by increasing the number of CV iterations n_splits, at the cost of computation time. Likewise, decreasing the step size of the C and gamma grids would increase the resolution of the heatmap, again at extra cost.
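The effect of n_splits can be sketched with scikit-learn's StratifiedShuffleSplit. The splitter, dataset, and split counts here are illustrative choices, since the article does not specify them:

```python
# Sketch: averaging over more random CV splits smooths out score noise
# (assumes scikit-learn; splitter and dataset are illustrative choices).
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedShuffleSplit, cross_val_score
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
clf = SVC(kernel="rbf", C=1.0, gamma="scale")

# Few iterations: the mean score still carries noticeable random noise.
few = StratifiedShuffleSplit(n_splits=5, test_size=0.2, random_state=0)
# More iterations (larger n_splits): a steadier estimate, at higher cost.
many = StratifiedShuffleSplit(n_splits=50, test_size=0.2, random_state=0)

score_few = cross_val_score(clf, X, y, cv=few).mean()
score_many = cross_val_score(clf, X, y, cv=many).mean()
print("mean accuracy over 5 splits :", round(score_few, 3))
print("mean accuracy over 50 splits:", round(score_many, 3))
```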
