3. Support Vector Machine Algorithm (SVC, Support Vector Classification) (supervised learning)

Support Vector Machine (SVM) refers to a family of machine learning algorithms. Depending on the problem being solved, it is divided into SVC (classification) and SVR (regression).

SVC, Support Vector Classification, is essentially a support vector machine used only for classification tasks.
SVR, Support Vector Regression, is also essentially a support vector machine, used only for regression tasks.
This column is mainly a summary of classification algorithms, so it focuses on SVC.

C-Support Vector Classification is implemented on top of libsvm. The fit time scales at least quadratically with the number of samples, which may be impractical beyond tens of thousands of samples. For large data sets, consider using LinearSVC or SGDClassifier instead, possibly after a Nystroem transformer or another kernel approximation, as sketched below.
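A sketch of that large-data alternative, built from standard scikit-learn pieces (the gamma and n_components values here are illustrative, not tuned):

from sklearn.pipeline import make_pipeline
from sklearn.kernel_approximation import Nystroem
from sklearn.svm import LinearSVC

# Approximate an RBF kernel with Nystroem, then fit a linear SVM on the transformed features
clf = make_pipeline(Nystroem(gamma=0.2, n_components=100), LinearSVC())
# clf.fit(X, y) then scales much better with the number of samples than SVC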

1. Algorithm idea

Take a binary classification task as an example: there are two different classes of sample data, and a newly received sample must be classified into one of them. Here are the core concepts, described in plain language.

Find a line that can separate the sample data of the two classes. This line is called the decision boundary
Take the decision boundary as the center and translate it toward both sides until it touches a sample point. The gap between the two resulting lines is called the margin; the sample points touched are called support vectors
The problem ultimately becomes one of finding the largest margin

Decision boundary: a line (in two dimensions), a plane (in three dimensions), or more generally a hyperplane that separates the two classes of data
Margin: translate the decision boundary toward both sides until it hits the nearest sample points; the distance between these two lines is the margin
Support vectors: the sample points that these two translated lines touch
Hard margin: a rigid margin, obtained by strictly following the solution idea above
Soft margin: a flexible margin; if there are outliers or noise points, the constraints can be relaxed via regularization to obtain a more robust boundary

The blue point here is noise: it should belong to the pentagon class, but it is labeled as a five-pointed star in the data set. Such a sample interferes with normal classification and needs to be handled; this is where the soft margin comes in, to reduce the impact of the noise.
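One way to see the hard/soft margin trade-off in code (a sketch; the C values are only illustrative): a very large C allows almost no margin violations and approximates a hard margin, while a small C tolerates noise points like the one above.

from sklearn.svm import SVC

hard_like = SVC(kernel='linear', C=1e6) # large C: almost no violations allowed, close to a hard margin
soft = SVC(kernel='linear', C=0.1) # small C: wider, softer margin that tolerates outliers and noise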

2. Official website API


class sklearn.svm.SVC(*, C=1.0, kernel='rbf', degree=3, gamma='scale', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', break_ties=False, random_state=None)

There are quite a lot of parameters here. For the specific usage of each, study the demos provided on the official website and experiment; some commonly used parameters are explained below.
Import: from sklearn.svm import SVC

①Regularization parameter C

The strength of the regularization is inversely proportional to C, which must be a strictly positive float; the penalty is a squared L2 penalty.

Full details are in the official documentation.

Usage

SVC(C=2.0)

②Kernel function kernel

'linear': linear kernel function; fast, but it can only handle linearly separable data
'poly': polynomial kernel function; it can raise the dimensionality of the samples, mapping them from a low-dimensional space to a high-dimensional one; many parameters and a large amount of computation
'rbf': Gaussian kernel function; like the polynomial kernel, it can raise the dimensionality of the samples, but it has fewer parameters; the default value
'sigmoid': sigmoid kernel function; with a sigmoid kernel, the SVM behaves like a multi-layer neural network
'precomputed': kernel matrix; use a kernel (Gram) matrix of shape (n, n) supplied by the user
You can also define your own kernel function and pass it in; see the sketch after the usage example below

Full details are in the official documentation.

Usage

SVC(kernel='sigmoid')
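A minimal sketch of the custom-kernel option mentioned above (my_kernel is a name chosen here for illustration; SVC accepts any callable that returns the kernel matrix):

import numpy as np
from sklearn.svm import SVC

def my_kernel(X, Y):
    # Gram matrix of a plain linear kernel: the dot product of each pair of samples
    return np.dot(X, Y.T)

svc = SVC(kernel=my_kernel)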

③The degree of the polynomial kernel function

The degree of the polynomial kernel function; this parameter is only meaningful for the polynomial kernel (poly). For other kernel functions, it is automatically ignored.

Full details are in the official documentation.

Usage

SVC(kernel='poly',degree=2)

④Kernel coefficient gamma

The kernel coefficient for the 'rbf', 'poly' and 'sigmoid' kernel functions; this parameter only applies to these three kernels. Note:
'scale': the default value; computed as 1 / (n_features * X.var())
'auto': computed as 1 / n_features
A floating point number can also be passed directly.

Full details are in the official documentation.

Usage

SVC(gamma='auto')
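For reference, the two string options correspond to simple formulas that can be computed by hand (a sketch, assuming X_train is the training feature matrix from the example later in this section):

import numpy as np

n_features = X_train.shape[1]
gamma_scale = 1.0 / (n_features * np.asarray(X_train).var()) # what gamma='scale' uses
gamma_auto = 1.0 / n_features                                # what gamma='auto' uses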

⑤Random seed random_state

If you need to control variables for comparison, it is best to set the random seed here to the same integer.

Full details are in the official documentation.

Usage

SVC(random_state=42)

⑥Finally build the model

SVC(C=3.0,kernel='sigmoid',gamma='auto',random_state=42)

3. Code implementation

①Guide package

Here we need to evaluate, train, save and load the model. Below are the necessary packages; if an error is reported during import, just install the missing package with pip.

import numpy as np
import pandas as pd 
import matplotlib.pyplot as plt
import joblib
%matplotlib inline
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import confusion_matrix, classification_report, accuracy_score

②Load the data set

The data set can simply be created yourself in CSV format. The one used here has 6 independent variables X and 1 dependent variable Y.

fiber = pd.read_csv("./fiber.csv")
fiber.head(5) # show the first 5 rows of data


③Divide the data set

The first six columns are the independent variable X, and the last column is the dependent variable Y

Official API of the commonly used split function: train_test_split
test_size: Proportion of test set data
train_size: Proportion of training set data
random_state: Random seed
shuffle: Whether to disrupt the data
My data set here has 48 samples in total; with train_size=0.75 and test_size=0.25, that gives 36 training samples and 12 test samples.

X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']

X_train, X_test, y_train, y_test = train_test_split(X,Y,train_size=0.75,test_size=0.25,random_state=42,shuffle=True)

print(X_train.shape) #(36,6)
print(y_train.shape) #(36,)
print(X_test.shape) #(12,6)
print(y_test.shape) #(12,)

④Build SVC model

You can try setting and adjusting the parameters yourself.

svc = SVC(C=3.0,kernel='sigmoid',gamma='auto',random_state=42)

⑤Model training

It's that simple: a single call to fit trains the model.

svc.fit(X_train,y_train)

⑥Model evaluation

Feed the test set into the model to get the predicted results.

y_pred = svc.predict(X_test)

Check whether the predictions match the actual test set labels: a match counts as 1, otherwise 0; the mean of these values is the accuracy.

accuracy = np.mean(y_pred==y_test)
print(accuracy)

Evaluation can also be done with score. The calculation and the idea are the same: both measure the proportion of samples the model predicts correctly, but score is already encapsulated as a method on the model. Of course, the parameters passed in are different; the standalone function requires an import: from sklearn.metrics import accuracy_score

score = svc.score(X_test,y_test) # score
print(score)
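The imports at the top also bring in accuracy_score, confusion_matrix and classification_report; a short sketch of using them on the same predictions:

print(accuracy_score(y_test, y_pred))        # the same proportion-correct value as above
print(confusion_matrix(y_test, y_pred))      # counts for each (true label, predicted label) pair
print(classification_report(y_test, y_pred)) # precision, recall and F1 per class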

⑦Model testing

Take a single piece of data and evaluate it with the trained model. The six independent variable values below are chosen arbitrarily; feed them into the model with predict, then print the result to see whether it matches the expected class.

test = np.array([[16,18312.5,6614.5,2842.31,25.23,1147430.19]])
prediction = svc.predict(test)
print(prediction) #[2]

⑧Save the model

svc is the model object; the name must match the variable trained above.
The second argument is the path where the model will be saved.

joblib.dump(svc, './svc.model') # save the model

⑨Load and use the model

svc_yy = joblib.load('./svc.model')

test = np.array([[11,99498,5369,9045.27,28.47,3827588.56]]) # an arbitrary piece of data
prediction = svc_yy.predict(test) # feed in the data and predict
print(prediction) #[4]
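A quick sanity check (a sketch) that the reloaded model behaves exactly like the original:

assert (svc.predict(X_test) == svc_yy.predict(X_test)).all() # identical predictions before and after reload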

Complete code

Model training and evaluation; steps ⑧ and ⑨ are not included.

from sklearn.svm import SVC
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split

fiber = pd.read_csv("./fiber.csv")
# split into independent variables X and dependent variable Y
X = fiber.drop(['Grade'], axis=1)
Y = fiber['Grade']
# split the data set (same split as above: 36 training and 12 test samples)
X_train, X_test, y_train, y_test = train_test_split(X, Y, train_size=0.75, test_size=0.25, random_state=42, shuffle=True)

svc = SVC(C=3.0,kernel='sigmoid',gamma='auto',random_state=42)
svc.fit(X_train,y_train) # fit the model
y_pred = svc.predict(X_test) # model predictions
accuracy = np.mean(y_pred==y_test) # accuracy
score = svc.score(X_test,y_test) # score
print(accuracy)
print(score)

test = np.array([[23,97215.5,22795.5,2613.09,29.72,1786141.62]]) # an arbitrary piece of data
prediction = svc.predict(test) # feed in the data and predict
print(prediction)

Origin: blog.csdn.net/qq_41264055/article/details/133063422