CHAPTER 5: Support Vector Machines

SVMs can be used for linear or nonlinear classification, regression, and even anomaly detection.

SVMs are especially well suited to complex classification problems on small to medium-sized datasets.

5.1 Linear SVM Classification

SVMs are sensitive to feature scales. In the figure referenced below, the vertical axis range in the left plot is much larger than the horizontal axis range, so the decision boundary is skewed toward the horizontal axis. After feature scaling (right plot), the decision boundary looks much better.

5.2 Soft Margin Classification

Scikit-Learn's SVM classes have a C hyperparameter. A smaller C results in a wider margin but more margin violations. In the figure referenced below, the plot on the right uses a smaller C. The smaller C is, the better the model tends to generalize; if your SVM is overfitting, try reducing C.

 

Scikit-Learn provides the LinearSVC and SVC classes, but the latter is much slower and is not recommended for large training sets. Alternatively, you can use SGDClassifier(loss="hinge", alpha=1/(m*C)), which trains a linear SVM classifier with stochastic gradient descent. It does not converge as fast as LinearSVC, but it can handle huge datasets or online classification tasks.

The LinearSVC class regularizes the bias term, so you should center the training set first by subtracting its mean; StandardScaler does this automatically. Also set the loss hyperparameter to "hinge" (the default is the squared hinge loss). For better performance, set the dual hyperparameter to False, unless there are more features than training instances (note that loss="hinge" requires dual=True in Scikit-Learn).
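The points above can be sketched as follows. This is a minimal example, not the author's exact code; the iris dataset and the petal-feature/virginica setup are assumed purely for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = load_iris()
X = iris.data[:, 2:]                     # petal length and petal width
y = (iris.target == 2).astype(np.int64)  # is it Iris virginica?

# StandardScaler centers the data (the bias term is regularized, so this
# matters) and scales the features. Smaller C -> wider margin, more margin
# violations, better generalization.
svm_clf = make_pipeline(StandardScaler(),
                        LinearSVC(C=1, loss="hinge", dual=True))
svm_clf.fit(X, y)
print(svm_clf.predict([[5.5, 1.7]]))
```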

5.3 Nonlinear SVM Classification

Some datasets are not linearly separable. One solution is to add more features, such as polynomial features, and then train a linear SVM on the augmented dataset. This is similar to polynomial regression in Section 4.3.
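A quick sketch of this idea: generate polynomial features, scale them, and feed them to a linear SVM. The moons dataset and the specific hyperparameters here are assumptions for illustration.

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# A nonlinearly separable toy dataset: two interleaving half circles.
X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

polynomial_svm_clf = make_pipeline(
    PolynomialFeatures(degree=3),   # expand (x1, x2) into degree-3 terms
    StandardScaler(),
    LinearSVC(C=10, loss="hinge", max_iter=10_000))
polynomial_svm_clf.fit(X, y)
print(polynomial_svm_clf.score(X, y))
```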

5.3.1 Polynomial Kernel

Adding polynomial features is simple, but a low degree cannot fit complex functions, while a high degree creates a huge number of features and makes the model slow.

Fortunately, SVMs can use a mathematical technique called the kernel trick. It produces the same result as adding many polynomial features, without actually adding them.
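With the kernel trick, the SVC class does this implicitly. A hedged sketch (moons dataset and hyperparameter values are illustrative assumptions):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# degree sets the implicit polynomial order; coef0 controls how much the
# model is influenced by high-degree terms versus low-degree terms.
poly_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="poly", degree=3, coef0=1, C=5))
poly_kernel_svm_clf.fit(X, y)
print(poly_kernel_svm_clf.score(X, y))
```

No polynomial features are ever materialized, yet the decision boundary is equivalent to training on a degree-3 expansion.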

5.3.2 Adding Similarity Features

Another way to tackle nonlinear problems is to add features computed with a similarity function, which measures how much each instance resembles a particular landmark. For example, we can define a Gaussian Radial Basis Function (RBF) with $\gamma = 0.3$ as the similarity function.

Gaussian RBF:

$\phi_\gamma(\mathbf{x}, \ell) = \exp\left(-\gamma \left\| \mathbf{x} - \ell \right\|^2\right)$

As for how to choose the landmarks, a simple approach is to create a landmark at every instance in the training set, which makes the transformed training set as likely to be linearly separable as possible. The downside is that a large training set then produces an equally large number of features.
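The transformation can be computed explicitly with NumPy. The helper function and the two sample points below are hypothetical, just to show what the similarity features look like for $\gamma = 0.3$:

```python
import numpy as np

def rbf_features(X, landmarks, gamma=0.3):
    # phi_gamma(x, l) = exp(-gamma * ||x - l||^2); one output column per landmark
    sq_dists = ((X[:, np.newaxis, :] - landmarks[np.newaxis, :, :]) ** 2).sum(axis=2)
    return np.exp(-gamma * sq_dists)

X = np.array([[-2.0], [1.0]])
landmarks = X.copy()   # use every training instance as a landmark
features = rbf_features(X, landmarks, gamma=0.3)
print(features)        # each point has similarity 1.0 to itself
```

Each instance gains one feature per landmark, which is why using all m training instances as landmarks turns an m-instance dataset into an m-feature one.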

5.3.3 Gaussian RBF Kernel

Just as the polynomial kernel avoids explicitly adding polynomial features, the Gaussian RBF kernel lets us get the effect of the similarity features without actually computing them.
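A minimal sketch using the SVC class (dataset and hyperparameter values are illustrative assumptions, not prescriptions):

```python
from sklearn.datasets import make_moons
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_moons(n_samples=100, noise=0.15, random_state=42)

# gamma acts like a regularization knob: increase it if the model is
# underfitting, decrease it if overfitting (same direction as C).
rbf_kernel_svm_clf = make_pipeline(
    StandardScaler(),
    SVC(kernel="rbf", gamma=5, C=0.001))
rbf_kernel_svm_clf.fit(X, y)
```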

There are other kernels, but they are used much more rarely. Some are specialized for particular data structures: string kernels (such as the string subsequence kernel or kernels based on the Levenshtein distance) can be used to classify text documents or DNA sequences.

How should you choose a kernel? As a rule of thumb, try a linear classifier first, especially if the training set is large or has many features. If the training set is not too large, also try the Gaussian RBF kernel, which works well in most cases.

5.3.4 Computational Complexity

LinearSVC is based on liblinear, which implements an optimized algorithm for linear SVMs. It does not support the kernel trick, and its training complexity is roughly $O(m \times n)$.

SVC is based on libsvm, which implements an algorithm that supports the kernel trick. Its training complexity is between $O(m^2 \times n)$ and $O(m^3 \times n)$.

5.4 SVM Regression

Unlike SVM classification, which tries to fit the widest possible margin between two classes, SVM regression tries to fit as many instances as possible inside the margin. The width of the margin is controlled by the hyperparameter $\varepsilon$, as shown in the following figure:
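This can be sketched with Scikit-Learn's LinearSVR class. The synthetic linear data and the epsilon value below are assumptions for illustration:

```python
import numpy as np
from sklearn.svm import LinearSVR

# Synthetic data: y = 4 + 3x plus Gaussian noise.
np.random.seed(42)
X = 2 * np.random.rand(50, 1)
y = (4 + 3 * X + np.random.randn(50, 1)).ravel()

# epsilon sets the width of the margin ("street"): instances inside it add
# no loss, so adding more training points inside the margin does not change
# the model (it is epsilon-insensitive).
svm_reg = LinearSVR(epsilon=1.5, random_state=42)
svm_reg.fit(X, y)
print(svm_reg.predict([[1.0]]))
```

For kernelized (nonlinear) regression, SVR(kernel="poly", ...) plays the same role that SVC plays for classification.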

 

5.5 Under the Hood

For the theoretical background on SVMs, please refer to the support vector machine article.

5.5.6 Online SVMs

Let's find out more about this sometime.

 
