[Machine Learning] Basic Understanding of Support Vector Machines and Detailed Practical Cases


Foreword

This article walks through the principle of support vector machines (SVM) in Python, explains the main parameters, shows how to use the relevant modules, and works through several examples.
The principle is only covered briefly; there are many detailed theory tutorials online that you can search for yourself. This article focuses on hands-on practice.

1. What is a support vector machine?

Support Vector Machine (SVM) is a commonly used classification and regression algorithm, widely applied in data mining, image recognition, natural language processing, and other fields. Python is one of the most popular programming languages today, and its powerful scientific computing and machine learning libraries make it an excellent language for implementing SVMs.

2. Principle

The basic idea of SVM is to find an optimal hyperplane that separates data of different categories. The hyperplane can live in a space of any dimension; in two-dimensional space it is a straight line, and it can be expressed by the following formula:

w^T x + b = 0

where w is the normal vector, x is the data point, and b is the offset (bias).

In a binary classification problem, suppose there are two classes: 1 and -1. We want to find an optimal hyperplane such that all data points belonging to category 1 are on one side of the hyperplane and all data points belonging to category -1 are on the other side of the hyperplane. The distance between the hyperplane and the two closest data points is called the margin. The optimal hyperplane is the hyperplane with the largest margin.

The distance from a sample point to the hyperplane, which determines the margin, is computed as:

d_i = |w^T x_i + b| / ||w||

where x_i is the data point and ||w|| is the norm (length) of the normal vector w.
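As a quick numeric illustration (a minimal sketch with made-up values for w, x and b), this distance can be computed directly with NumPy:

import numpy as np

# hypothetical hyperplane parameters: normal vector w and offset b
w = np.array([2.0, 1.0])
b = -1.0

# a sample data point
x_i = np.array([3.0, 4.0])

# distance from x_i to the hyperplane w^T x + b = 0:  d = |w^T x_i + b| / ||w||
distance = abs(w @ x_i + b) / np.linalg.norm(w)
print(distance)  # |2*3 + 1*4 - 1| / sqrt(5) ≈ 4.025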

3. Parameter explanation

SVM has multiple parameters to tune; the most common ones are listed below:

C: penalty parameter, used to balance the model's accuracy against the risk of overfitting. The larger C is, the higher the model's training accuracy, but also the greater the risk of overfitting. The default value is 1.

kernel: kernel function, used to map the data from a low-dimensional space to a higher-dimensional space. Common choices include the linear kernel, the polynomial kernel, and the Gaussian (RBF) kernel.

gamma: coefficient of the kernel function, used to control the distribution of the samples after mapping. The larger gamma is, the more concentrated the mapped distribution and the more complex the decision boundary.

degree: order of the polynomial kernel, used to control the dimensionality after mapping. The higher the degree, the higher the mapped dimension and the more complex the model.

coef0: constant term of the kernel function, used to adjust the shape of the kernel.

Here are some commonly used attributes and methods of a fitted SVM model; a brief usage sketch covering both the parameters above and these attributes follows the list:

support_vectors_: returns the array of support vectors. Support vectors are the most critical data points during training; they determine the model's decision boundary.

coef_: returns the weight vector of a linear SVM model. The weight vector can be used to analyse which features in the dataset are most relevant and how strongly they influence the model.

intercept_: returns the intercept term of the SVM model. The intercept is the offset of the decision boundary and plays a key role when computing predictions.

dual_coef_: returns the Lagrange multipliers (dual coefficients) of a nonlinear SVM model. They are important parameters of the kernelized SVM algorithm and are used to weight the kernel function.

n_support_: returns the number of support vectors for each class. This value can help us assess the complexity and generalization ability of the model.

decision_function(X): returns the decision function values, i.e. the model's raw scores for the input dataset X. These values can be used to evaluate model performance and to generate ROC curves and AUC values.

predict(X): returns the predicted class labels for the input dataset X. This method is used to classify new data in practical applications.

score(X, y): returns the accuracy of the SVM model on the dataset (X, y). This method is used to evaluate performance on training and test sets and to help choose the best parameter settings.
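As a minimal usage sketch (on a toy dataset generated with make_blobs, an assumption for illustration only), the parameters above can be passed to the constructor and the attributes and methods can then be read from the fitted model:

from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# a small, linearly separable toy dataset (for illustration only)
X, y = make_blobs(n_samples=100, centers=2, random_state=42)

# the constructor takes the parameters discussed above
# (degree and coef0 only matter for the poly / sigmoid kernels; a linear
#  kernel is used here so that coef_ is available)
clf = SVC(C=1.0, kernel='linear', gamma='scale', degree=3, coef0=0.0)
clf.fit(X, y)

print(clf.support_vectors_)          # the support vectors themselves
print(clf.coef_)                     # weight vector w (linear kernel only)
print(clf.intercept_)                # intercept b of the decision boundary
print(clf.dual_coef_)                # dual coefficients (signed Lagrange multipliers)
print(clf.n_support_)                # number of support vectors per class
print(clf.decision_function(X[:5]))  # raw decision scores for the first 5 samples
print(clf.predict(X[:5]))            # predicted class labels
print(clf.score(X, y))               # mean accuracy on (X, y)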

4. Practical cases

1. Maximum-margin separating hyperplane

In support vector machines, the maximum-margin separating hyperplane is a plane or hyperplane that correctly separates data points of different categories while maximizing the margin between them. This hyperplane plays a key role in classification: it is used to make predictions for new data points, and it also helps us better understand the structure of the data.
On a two-dimensional plane, the maximum-margin separating hyperplane is a straight line; in higher-dimensional spaces it is a hyperplane. For a binary classification problem, we can define the hyperplane equation as:

  • w^T x + b = 0

where w is a vector, x is the input feature vector, and b is the bias (also known as the intercept). All points satisfying this equation lie on the hyperplane, and w is the normal vector, which determines the direction of the hyperplane. The bias b determines the distance between the hyperplane and the origin.
During training, we want to find a hyperplane that correctly classifies the training data and maximizes the distance between the two classes. This distance is the margin, defined by the hyperplane and the data points closest to it. These closest points are called support vectors because they play a decisive role in building the model and in making predictions.
Because SVM is essentially an optimization problem, we need to define an objective function that minimizes the error while maximizing the margin. Usually we use soft-margin SVM to handle data that is not perfectly linearly separable; the objective function then includes a penalty term that allows some data points to lie inside the margin. The SVM optimization problem can be solved by quadratic programming.
Ultimately, the SVM finds a maximum-margin separating hyperplane and determines the values of w and b from the training data. This hyperplane can then be used to classify new data points and helps us better understand the structure of the data.

  • Generate data
    Here we generate a simple two-class dataset
from sklearn.datasets import make_blobs
from sklearn.svm import SVC
from sklearn.inspection import DecisionBoundaryDisplay
import matplotlib.pyplot as plt
import mplcyberpunk  # optional plotting style
plt.style.use('cyberpunk')

# two well-separated clusters for binary classification
X, y = make_blobs(n_samples=100, centers=2, random_state=42)
plt.scatter(X[:, 0], X[:, 1], c=y)

[Figure: scatter plot of the two generated clusters]

  • Model training, constructing the maximum-margin separating hyperplane
    Here we train the model with 4 different (C, gamma) parameter pairs and observe how they affect the model
params = [
    (1000, 0.01), (1, 0.01), (1000, 0.1), (1, 0.1)
]

n = 1
for c, gamma in params:
    ax = plt.subplot(220 + n)
    n += 1
    clf = SVC(C=c, gamma=gamma, kernel='linear', random_state=42)
    clf.fit(X, y)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    # draw the decision boundary and the two margin lines
    DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        plot_method="contour",
        colors="k",
        levels=[-1, 0, 1],
        alpha=0.5,
        linestyles=["--", "-", "--"],
        ax=ax,
    )
    # highlight the support vectors
    plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
                s=100,
                linewidth=1,
                facecolors="none",
                edgecolors="r")
    plt.title(f'C:{c}, gamma:{gamma}')

plt.show()

[Figure: decision boundaries and support vectors for the four (C, gamma) settings]
We can see that the maximum-margin separating hyperplane constructed from the support vectors splits the data into two classes. Because this dataset is quite simple, however, it is hard to observe the effect of the different parameters on the model. Let's use the iris dataset instead; to keep the computation simple we select only two of its classes.

from sklearn.datasets import load_iris

# use only petal length and petal width, and keep classes 0 and 1
iris = load_iris()
X = iris.data[:, 2:4]
y = iris.target
mask = (y == 0) | (y == 1)
X = X[mask]
y = y[mask]



params = [
    (1000, 0.01), (1, 0.01), (1000, 0.1), (1, 0.1)
]

n = 1
for c, gamma in params:
    ax = plt.subplot(220 + n)
    n += 1
    clf = SVC(C=c, gamma=gamma, kernel='linear', random_state=42)
    clf.fit(X, y)
    plt.scatter(X[:, 0], X[:, 1], c=y)
    DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        plot_method="contour",
        colors="k",
        levels=[-1, 0, 1],
        alpha=0.5,
        linestyles=["--", "-", "--"],
        ax=ax,
    )
    plt.scatter(clf.support_vectors_[:, 0], clf.support_vectors_[:, 1],
                s=100,
                linewidth=1,
                facecolors="none",
                edgecolors="r")
    plt.title(f'C:{c}, gamma:{gamma}')

plt.show()

[Figure: decision boundaries on the two-class iris subset for the four (C, gamma) settings]
Comparing the plots, we can see that when C is relatively large, the margin of the maximum-margin separating hyperplane becomes smaller. A larger C can improve the model's training accuracy, but it also increases the risk of overfitting.
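To make this concrete, here is a small sketch (my own addition, not from the original walkthrough) that prints the geometric margin width 2/||w|| of a linear SVC for several values of C on the same two-class iris subset; on separable data the margin typically stops shrinking once C is large enough:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.svm import SVC

# rebuild the two-class iris subset used above (petal features, classes 0 and 1)
iris = load_iris()
mask = iris.target != 2
X2, y2 = iris.data[mask][:, 2:4], iris.target[mask]

for c in (0.01, 1, 1000):
    clf = SVC(kernel='linear', C=c).fit(X2, y2)
    width = 2 / np.linalg.norm(clf.coef_)  # geometric margin width = 2 / ||w||
    print(f"C={c}: margin width = {width:.3f}")

This trade-off between a wide margin and training accuracy is exactly what the soft margin, introduced next, is designed to manage.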

2. Soft margin

Soft Margin is an improved method in SVM which allows some noise or outliers to be tolerated to a certain extent. In traditional hard-margin SVM, if the dataset is not linearly separable, then the model will not be able to find a perfect decision boundary. Soft-margin SVM, on the other hand, allows some data points to cross to the other side of the decision boundary, thus better adapting to non-linearly separable datasets.

Soft-margin SVM achieves this by introducing slack variables ξ_i. The objective function no longer requires the decision boundary to separate the positive and negative samples perfectly; instead, it looks for an optimal decision boundary while tolerating a certain amount of error. Specifically, the optimization problem can be written as:

min_{w, b, ξ}  (1/2) ||w||^2 + C Σ_{i=1}^m ξ_i
subject to  y_i (w^T x_i + b) ≥ 1 − ξ_i  and  ξ_i ≥ 0

where ξ_i measures how far the i-th sample violates its margin (ξ_i = 0 for points classified correctly and outside the margin), and C is a penalty factor that balances margin maximization against error minimization. When C is large, the model pays more attention to minimizing the error and produces a more complex decision boundary; when C is small, it pays more attention to maximizing the margin, which yields a simpler boundary. The slack variables allow some samples to be misclassified or to lie inside the margin, but their contribution to the objective is penalized through the term C Σ_{i=1}^m ξ_i.

Soft-margin SVM therefore controls the complexity and generalization ability of the model through the penalty factor C. A larger C may improve training accuracy at the cost of generalization; a smaller C usually generalizes better but may reduce accuracy. In practice, the value of C should be tuned for the specific application, for example via cross-validation.
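To connect the formula to code, here is a rough sketch (my own, assuming a linear kernel and an overlapping two-blob dataset) that recovers the slack values ξ_i = max(0, 1 − y_i(w^T x_i + b)) from a fitted soft-margin SVC and evaluates the penalty term:

import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# overlapping blobs so that some slack is actually needed (illustrative assumption)
X, y = make_blobs(n_samples=100, centers=2, cluster_std=3.0, random_state=0)
y_signed = np.where(y == 1, 1, -1)   # sklearn maps class 1 to the positive side

clf = SVC(kernel='linear', C=1.0).fit(X, y)
w, b = clf.coef_.ravel(), clf.intercept_[0]

# slack variable of each sample: xi_i = max(0, 1 - y_i * (w^T x_i + b))
xi = np.maximum(0, 1 - y_signed * (X @ w + b))
print("samples with nonzero slack:", int(np.sum(xi > 1e-8)))
print("penalty term C * sum(xi):", clf.C * xi.sum())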

In summary, soft-margin SVM is a powerful machine learning algorithm that performs well on datasets that are not perfectly linearly separable. By introducing slack variables and a penalty term, it balances margin maximization against error minimization, resulting in more robust decision boundaries.

Put simply, the soft margin lets the model tolerate some training errors instead of chasing perfect accuracy, which would otherwise force an overly complex decision boundary. Under normal circumstances we therefore choose a relatively small penalty term C.

3. Nonlinear classification problems and kernel functions

When faced with nonlinear classification problems, traditional support vector machine classifiers cannot handle them effectively. At this time, the kernel function can be used to map the nonlinear classification problem into a high-dimensional space, thereby performing linear classification in a high-dimensional space. Specifically, the kernel function maps the data points in the original space to a higher-dimensional feature space, so that the nonlinear problem in the original space becomes a linearly separable problem in the feature space. In the feature space, a linear classifier, such as an SVM classifier, can be used for classification.

The commonly used kernel functions are listed below (a short numerical check follows the list):

  • Linear kernel: K(x_i, x_j) = x_i^T x_j
  • Polynomial kernel: K(x_i, x_j) = (x_i^T x_j + r)^d, where r is a constant and d is the order of the polynomial.
  • RBF (Gaussian) kernel: K(x_i, x_j) = exp(-γ ||x_i − x_j||^2), where γ is a positive constant and ||x_i − x_j|| is the Euclidean distance.
  • Sigmoid kernel: K(x_i, x_j) = tanh(α x_i^T x_j + c), where α and c are constants.
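These formulas can be checked numerically against scikit-learn's pairwise kernel helpers; the sketch below uses arbitrary random vectors and my own choices of gamma, r and d (note that sklearn's polynomial and sigmoid kernels also include a gamma factor inside the bracket):

import numpy as np
from sklearn.metrics.pairwise import (linear_kernel, polynomial_kernel,
                                      rbf_kernel, sigmoid_kernel)

X = np.random.RandomState(0).randn(5, 3)   # 5 arbitrary sample vectors
gamma, r, d = 0.5, 1.0, 3

# RBF: exp(-gamma * ||x_i - x_j||^2)
manual_rbf = np.exp(-gamma * np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1))
print(np.allclose(manual_rbf, rbf_kernel(X, gamma=gamma)))                  # True

# linear: x_i^T x_j
print(np.allclose(X @ X.T, linear_kernel(X)))                               # True

# polynomial: (gamma * x_i^T x_j + r)^d
print(np.allclose((gamma * X @ X.T + r) ** d,
                  polynomial_kernel(X, gamma=gamma, coef0=r, degree=d)))    # True

# sigmoid: tanh(gamma * x_i^T x_j + r)
print(np.allclose(np.tanh(gamma * X @ X.T + r),
                  sigmoid_kernel(X, gamma=gamma, coef0=r)))                 # True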

These kernel functions can be used in the SVM classifier in the sklearn library. The constructor of the SVM classifier has a kernel parameter that can be used to specify the type of kernel function to use. For example, the RBF kernel function can be used to construct a nonlinear classifier:

from sklearn.svm import SVC

clf = SVC(kernel='rbf')

It should be noted that using the kernel function will increase the computational complexity, because the dimensionality of the data points will become very high after being mapped to the high-dimensional feature space. Therefore, in practical applications, it is necessary to select appropriate kernel functions and parameters according to specific situations to balance computational complexity and classification accuracy.
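In practice, the kernel and its parameters are usually tuned together; one common workflow is a cross-validated grid search, sketched below on the iris dataset (the parameter ranges are my own illustrative choices, not prescribed values):

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# candidate kernels and parameter ranges (illustrative values)
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3]},
]

search = GridSearchCV(SVC(), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)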

Next, an example using the Gaussian (RBF) kernel:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.svm import SVC

# a 500x500 grid used to visualize the decision function
xx, yy = np.meshgrid(np.linspace(-3, 3, 500), np.linspace(-3, 3, 500))
np.random.seed(0)
X = np.random.randn(300, 2)
Y = np.logical_xor(X[:, 0] > 0, X[:, 1] > 0)   # XOR labels: not linearly separable

# fit the model
clf = SVC(gamma="auto", kernel='rbf')
clf.fit(X, Y)

# plot the decision function for each datapoint on the grid
Z = clf.decision_function(np.c_[xx.ravel(), yy.ravel()])
Z = Z.reshape(xx.shape)

plt.imshow(
    Z,
    interpolation="nearest",
    extent=(xx.min(), xx.max(), yy.min(), yy.max()),
    aspect="auto",
    origin="lower",
    cmap=plt.cm.PuOr_r,
)
contours = plt.contour(xx, yy, Z, levels=[0], linewidths=2, linestyles="dashed")
plt.scatter(X[:, 0], X[:, 1], s=30, c=Y, cmap=plt.cm.Paired, edgecolors="k")
plt.xticks(())
plt.yticks(())
plt.axis([-3, 3, -3, 3])
plt.show()

[Figure: RBF-kernel decision function on the XOR dataset]
Here a nonlinear SVC with the RBF kernel performs binary classification; the target to predict is the XOR of the signs of the two input features. Colour depth encodes the decision score: the further a region lies from the decision boundary, the darker its colour.
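As a quick sanity check (reusing the X, Y and clf defined in the code above), we can compare the training accuracy of the RBF model with that of a linear kernel, which cannot separate the XOR pattern and should score close to chance level:

# accuracy of the RBF model on the XOR training data
print("RBF kernel:", clf.score(X, Y))

# a linear kernel cannot separate XOR, so its accuracy stays low
lin = SVC(kernel="linear").fit(X, Y)
print("linear kernel:", lin.score(X, Y))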

4. Effects of different kernel functions

  • Dataset loading
from sklearn import svm
from sklearn.datasets import load_iris
from sklearn.inspection import DecisionBoundaryDisplay

# use all three iris classes, with petal length and petal width only
iris = load_iris()
X = iris.data[:, 2:4]
y = iris.target

  • Model training and construction
    Here we build models with different kernel functions:
  • svm.SVC(kernel="linear", C=C): linear SVM model using the linear kernel.
  • svm.LinearSVC(C=C, max_iter=10000): linear SVM model, also using a linear kernel. Unlike svm.SVC, LinearSVC is implemented on top of the liblinear library, which uses a different optimization algorithm and converges faster. max_iter is the maximum number of iterations, i.e. the stopping condition of the algorithm.
  • svm.SVC(kernel="rbf", gamma=0.7, C=C): nonlinear SVM model using the radial basis function (RBF) kernel. gamma is the kernel coefficient and controls the shape of the decision boundary.
  • svm.SVC(kernel="poly", degree=3, gamma="auto", C=C): nonlinear SVM model using the polynomial kernel. degree is the order of the polynomial, gamma is the kernel coefficient, and C is the penalty parameter.
C = 1.0  # regularization parameter, must be defined before constructing the models
models = (
    svm.SVC(kernel="linear", C=C),
    svm.LinearSVC(C=C, max_iter=10000),
    svm.SVC(kernel="rbf", gamma=0.7, C=C),
    svm.SVC(kernel="poly", degree=3, gamma="auto", C=C),
)
models = (clf.fit(X, y) for clf in models)
  • Construct Decision Boundary Visualization

# title for the plots
titles = (
    "SVC with linear kernel",
    "LinearSVC (linear kernel)",
    "SVC with RBF kernel",
    "SVC with polynomial (degree 3) kernel",
)

# Set-up 2x2 grid for plotting.
fig, sub = plt.subplots(2, 2)
plt.subplots_adjust(wspace=0.4, hspace=0.4)

X0, X1 = X[:, 0], X[:, 1]


for clf, title, ax in zip(models, titles, sub.flatten()):
    disp = DecisionBoundaryDisplay.from_estimator(
        clf,
        X,
        response_method="predict",
        cmap=plt.cm.coolwarm,
        alpha=0.8,
        ax=ax,
        xlabel=iris.feature_names[2],  # petal length, since X = iris.data[:, 2:4]
        ylabel=iris.feature_names[3],  # petal width
    )
    ax.scatter(X0, X1, c=y, cmap=plt.cm.coolwarm, s=20, edgecolors="k")
    ax.set_xticks(())
    ax.set_yticks(())
    ax.set_title(title)

plt.show()

[Figure: decision regions of the four SVM models on the iris petal features]
We can see that the decision boundaries produced by the linear kernels are (piecewise) straight lines, which suits linearly separable problems, while the RBF and polynomial kernels produce smoother curved boundaries that are better suited to nonlinear data.
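To put rough numbers on this comparison, the four models can also be scored with cross-validation; this is a sketch of my own, and the exact scores will depend on the data splits:

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn import svm

iris = load_iris()
X, y = iris.data[:, 2:4], iris.target   # same two petal features as above
C = 1.0

candidates = {
    "SVC (linear kernel)": svm.SVC(kernel="linear", C=C),
    "LinearSVC": svm.LinearSVC(C=C, max_iter=10000),
    "SVC (RBF kernel)": svm.SVC(kernel="rbf", gamma=0.7, C=C),
    "SVC (poly kernel)": svm.SVC(kernel="poly", degree=3, gamma="auto", C=C),
}
for name, model in candidates.items():
    scores = cross_val_score(model, X, y, cv=5)
    print(f"{name}: mean accuracy = {scores.mean():.3f}")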

5. Regression problem

In addition to classification, support vector machines can also be applied to regression problems. In regression we wish to predict a continuous output value from a set of input features. Unlike classification, regression predicts continuous values, so the model has to be constructed differently.

In support vector regression, we first determine a boundary region (often called a tube) bounded by two parallel hyperplanes. This region is called the ε-tube, where ε controls the width of the tube. Data points that fall inside the tube are considered to be predicted well enough, while points outside the tube count as errors; the points on or outside the tube boundary become the support vectors.

Similar to the classification problem, our goal is to minimize the error while maintaining the largest margin. Error is defined as the difference between the actual output value and the value predicted by the model. In regression problems, we need to find a function f(x) that maps an input x to an output y.

Unlike classification problems, we can use different loss functions in regression problems such as squared error, absolute error or Huber loss. These loss functions can be chosen according to the specific problem.

In support vector regression we also use kernel functions to extend the model to handle nonlinear regression problems; RBF and polynomial kernels can be used here as well. In scikit-learn, the SVR (Support Vector Regression) class handles the modelling and prediction (a small sketch of the epsilon parameter follows).
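Before building the dataset for the full example below, here is a small sketch (my own, on synthetic sine data) of how the epsilon parameter controls the ε-tube: the wider the tube, the more points fall inside it and the fewer support vectors remain.

import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X_demo = np.sort(5 * rng.rand(100, 1), axis=0)
y_demo = np.sin(X_demo).ravel() + 0.1 * rng.randn(100)

# points strictly inside the epsilon-tube do not become support vectors
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=1.0, epsilon=eps).fit(X_demo, y_demo)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors")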

  • Build the dataset
import numpy as np
import matplotlib.pyplot as plt
import mplcyberpunk  # optional plotting style
plt.style.use('cyberpunk')

# 100 points sampled along a sine curve
X = np.sort(5 * np.random.rand(100, 1), axis=0)
y = np.sin(X).ravel()

# add some noise to every 10th target value
y[::10] += 3 * (0.5 - np.random.rand(10))

plt.scatter(X, y, alpha=0.8)

[Figure: noisy sine data used for the regression example]

  • Model construction
    Here we again build models with different kernel functions to observe the effect:
    an RBF, a linear, and a polynomial kernel SVR, respectively.
from sklearn.svm import SVR
svr_rbf=SVR(kernel='rbf',C=1,gamma=0.1,epsilon=0.1)
svr_lin = SVR(kernel="linear", C=1, gamma="auto")
svr_poly = SVR(kernel="poly", C=1, gamma="auto", degree=3, epsilon=0.1, coef0=1)
  • Model Training Visualization Comparison
lw = 2

svrs = [svr_rbf, svr_lin, svr_poly]
kernel_label = ["RBF", "Linear", "Polynomial"]
model_color = ["m", "c", "g"]

fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(8, 6), sharey=True)
for ix, svr in enumerate(svrs):
    axes[ix].plot(
        X,
        svr.fit(X, y).predict(X),
        color=model_color[ix],
        lw=lw,
        label="{} model".format(kernel_label[ix]),
    )
    axes[ix].scatter(
        X[svr.support_],
        y[svr.support_],
        facecolor="none",
        edgecolor=model_color[ix],
        s=50,
        label="{} support vectors".format(kernel_label[ix]),
    )
    axes[ix].scatter(
        X[np.setdiff1d(np.arange(len(X)), svr.support_)],
        y[np.setdiff1d(np.arange(len(X)), svr.support_)],
        facecolor="none",
        edgecolor="k",
        s=50,
        label="other training data",
    )
    axes[ix].legend(
        loc="upper center",
        bbox_to_anchor=(0.5, 1.1),
        ncol=1,
        fancybox=True,
        shadow=True,
    )

fig.text(0.5, 0.04, "data", ha="center", va="center")
fig.text(0.06, 0.5, "target", ha="center", va="center", rotation="vertical")
fig.suptitle("Support Vector Regression", fontsize=14)
plt.show()

[Figure: SVR fits and support vectors for the RBF, linear, and polynomial kernels]
We can see that for this regression problem the RBF and polynomial kernels fit the nonlinear function well, while the linear kernel only has an advantage when the underlying relationship is linear.
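To complement the visual comparison, the fit quality can also be scored numerically (reusing X, y and the fitted svr_rbf, svr_lin and svr_poly from the code above); SVR's score method returns the R² coefficient of determination:

# R^2 of each fitted model on the training data (higher is better, 1.0 is a perfect fit)
for name, svr in zip(kernel_label, svrs):
    print(f"{name}: R^2 = {svr.score(X, y):.3f}")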

Summary

This article introduced the principle of SVM, explained its parameters, and demonstrated the use of the relevant Python modules with examples. SVM is a commonly used supervised learning algorithm that can handle both classification and regression problems. By tuning the parameters, the accuracy and generalization ability of the model can be optimized. In Python, the SVM algorithm is available through the sklearn.svm module, which is built on top of the libsvm and liblinear libraries.

  • Corrections to any mistakes above are welcome.
    I hope you found this helpful; more interesting content will be shared in the future.


Origin blog.csdn.net/qq_61260911/article/details/130565104