Data Mining Learning - Support Vector Machine (SVM)

Table of contents

1. Introduction

(1) Linearly separable support vector machine

   1. Original problem

   2. SVM

   3. Classification prediction reliability

   4. Classification margin

   5. Constraints

   6. Learning algorithm of the linearly separable support vector machine (maximum margin method)

   7. Dual algorithm

(2) Linearly non-separable support vector machine

Algorithm process

(3) Nonlinear support vector machine

1. Dual problem

2. Algorithm

2. Hands-on practice (RBF kernel + gamma for iris classification)


1. Introduction

SVM is a classification model: a linear classifier defined by the largest margin (distance) between classes in the feature space.

Basic idea: SVM represents the training samples as points in the feature space and separates the training data of each category with a hyperplane. At prediction time, a new test point is fed in; the side of the hyperplane on which the test point falls determines its predicted category.

There are three types of SVMs:

Linearly separable support vector machines (hard margin maximization)

Linearly non-separable support vector machines (soft margin maximization)

Nonlinear support vector machines (kernel trick and soft margin maximization)

(1) Linearly separable support vector machine

1. Original problem:

The optimization problem solved by the linearly separable support vector machine is referred to as the primal (original) optimization problem.

(SVM is usually used for binary classification problems, with -1 and +1 representing the two categories: when yi = -1, the sample point xi is called a negative example; when yi = +1, the sample point xi is called a positive example.)
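For reference, the standard form of this primal problem in the hard-margin (linearly separable) case is the convex quadratic program:

    min over w, b:   (1/2) ||w||^2
    s.t.             yi (w·xi + b) ≥ 1,   i = 1, 2, ..., N

Its solution (w*, b*) defines the maximum-margin separating hyperplane used in the rest of this section.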

   2. SVM

When the training data set is linearly separable, the SVM algorithm seeks a separating hyperplane in the feature space of the sample distribution such that all samples (positive and negative examples) lie on the side of the hyperplane corresponding to their category.

The classification decision function of the linearly separable support vector machine is f(x) = sign(w*·x + b*).

The separating hyperplane learned by margin maximization, or equivalently by solving the corresponding convex quadratic programming problem, is w*·x + b* = 0.

  3. Classification prediction reliability

When using an SVM to classify training samples, the classification prediction confidence is used to evaluate how reliable the prediction for each sample is.

The closer a data point is to the separating hyperplane, the less reliable its classification; conversely, the farther away it is, the more reliable its classification.

If the sign of w·xi + b is consistent with the sign of the class label yi of the sample point, the classification is correct; otherwise, the classification is wrong.
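As a small illustration of these two points, the sketch below uses made-up hyperplane parameters w and b (they are not learned from any data): the sign of w·x + b gives the predicted class, and the distance |w·x + b| / ||w|| can be read as a confidence measure.

import numpy as np

# hypothetical hyperplane parameters, chosen only for illustration
w = np.array([2.0, -1.0])
b = -1.0

x = np.array([3.0, 1.0])   # a sample point
y = 1                      # its class label (+1 or -1)

decision_value = np.dot(w, x) + b                      # w·x + b
predicted_label = 1 if decision_value > 0 else -1      # the sign decides the class
distance = abs(decision_value) / np.linalg.norm(w)     # distance to the hyperplane

print(predicted_label == y)   # True: the sign matches the label, so the classification is correct
print(distance)               # the larger the distance, the more reliable the classification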

  4. Classification margin

The geometric margin of the training data set T with respect to the hyperplane is defined as the smallest of the geometric margins of the individual sample points, i.e. γ = min over i of yi(w·xi + b) / ||w||.

The goal of the support vector machine is to find a separating hyperplane that correctly divides the training data set and has the largest geometric margin.

(In the original figure, the dashed lines pass through the support vectors, and the perpendicular distance from a support vector to the solid separating hyperplane is the classification margin γ.)

5. Constraints

Two questions:

(1) How can we judge whether the separating hyperplane classifies the sample points correctly?

(2) Computing the classification margin requires finding the support vectors first, so how can the support vectors be identified among the many training samples?

These two questions amount to specifying the constraints of the margin-maximization problem, that is, the conditions that the optimal separating hyperplane must satisfy.
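As a worked note (this restates the usual hard-margin formulation rather than anything specific to this article): requiring yi(w·xi + b) ≥ 1 for every training point settles question (1), because it forces each point onto the correct side of the hyperplane with at least a unit functional margin; the points for which equality holds, yi(w·xi + b) = 1, are exactly the support vectors asked about in question (2), since they lie on the margin boundaries.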

6. Learning algorithm of the linearly separable support vector machine (maximum margin method)

Input: the linearly separable training data set T.

Output: the maximum-margin separating hyperplane and the classification decision function.

Algorithm process:

(1) Construct and solve the constrained optimization problem from section 1: minimize (1/2)||w||^2 over w and b, subject to yi(w·xi + b) ≥ 1, i = 1, 2, ..., N.

(2) Find the optimal solution (w*, b*)

(3) Obtain the optimal separating hyperplane w*·x + b* = 0 and the classification decision function f(x) = sign(w*·x + b*).
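The quadratic program above is normally handed to a dedicated solver. As a rough, hedged sketch, scikit-learn's SVC with a linear kernel and a very large penalty C (an assumption made here to approximate hard-margin behaviour, since the library has no explicit hard-margin mode) recovers w*, b*, and the support vectors on a tiny separable toy data set:

import numpy as np
from sklearn import svm

# a tiny, linearly separable toy data set (made up for illustration)
X = np.array([[3, 3], [4, 3], [1, 1], [0, 2]], dtype=float)
y = np.array([1, 1, -1, -1])

clf = svm.SVC(kernel='linear', C=1e6)   # very large C approximates hard-margin maximization
clf.fit(X, y)

w = clf.coef_[0]        # the learned normal vector w*
b = clf.intercept_[0]   # the learned bias b*
print(w, b)                       # separating hyperplane: w*·x + b* = 0
print(clf.support_vectors_)       # the points lying on the margin boundaries
print(2 / np.linalg.norm(w))      # margin width 2 / ||w*||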

7. Dual algorithm

The main purpose of introducing the Lagrangian function is to fold the constraints into the objective function, thereby transforming the constrained optimization problem into an unconstrained optimization problem over a new objective function.

Through Lagrangian duality, the primal problem is transformed into an equivalent max-min problem, the dual problem.
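For reference, carrying out the inner minimization over w and b and flipping the sign of the objective gives the standard hard-margin dual problem (the ai are the Lagrange multipliers):

    min over a:   (1/2) Σi Σj ai aj yi yj (xi·xj) - Σi ai
    s.t.          Σi ai yi = 0,   ai ≥ 0,   i = 1, 2, ..., N

From its solution a*, the primal solution is recovered as w* = Σi ai* yi xi, and b* is computed from any support vector, i.e. any point whose ai* > 0.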

(2) Linearly non-separable support vector machine

Algorithm process:

Input: the training data set T and the penalty parameter C.

Output: the soft-margin-maximizing separating hyperplane and the classification decision function.

(1) Solve the constrained optimization problem: minimize (1/2) Σi Σj ai aj yi yj (xi·xj) - Σi ai over a, subject to Σi ai yi = 0 and 0 ≤ ai ≤ C (i = 1, 2, ..., N), obtaining the optimal solution a* = (a1*, a2*, ..., aN*).

(2) Compute w* = Σi ai* yi xi.

At the same time, select a component aj* of a* satisfying 0 < aj* < C, and compute b* = yj - Σi ai* yi (xi·xj).

(3) Obtain the soft-margin-maximizing separating hyperplane w*·x + b* = 0 and the classification decision function f(x) = sign(w*·x + b*).
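A minimal sketch of how the penalty parameter C controls the soft margin (the overlapping blob data set below is generated only for illustration): a small C tolerates more margin violations and usually keeps many support vectors, while a large C penalizes violations heavily.

from sklearn import svm
from sklearn.datasets import make_blobs

# two overlapping clusters, so the data is not linearly separable
X, y = make_blobs(n_samples=200, centers=2, cluster_std=2.5, random_state=0)

for C in (0.01, 1.0, 100.0):
    clf = svm.SVC(kernel='linear', C=C)
    clf.fit(X, y)
    print(C, clf.n_support_)   # number of support vectors per class for this C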

(3) Nonlinear support vector machine

1. Dual problem

The kernel function maps any two vectors x and z in the input space to the inner product of their images in the feature space. Replacing the explicit inner product with a kernel function in this way is called the kernel trick.

In nonlinear support vector machines, the commonly used kernel functions are:

(1) Polynomial kernel function: K(x, z) = (x·z + 1)^p

(2) Gaussian kernel function: K(x, z) = exp(-||x - z||^2 / (2σ^2))

(3) Sigmoid kernel function: K(x, z) = tanh(β(x·z) + θ)
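A small sketch of the kernel trick with the degree-2 polynomial kernel K(x, z) = (x·z + 1)^2 (the vectors x and z below are arbitrary examples): computing the kernel value directly in the input space gives the same number as explicitly mapping both vectors into the 6-dimensional feature space and taking the inner product there, without ever building the feature vectors inside the SVM.

import numpy as np

def phi(v):
    # explicit feature map whose inner product equals the degree-2 polynomial kernel (x·z + 1)^2
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, np.sqrt(2) * x1 * x2, x2 ** 2])

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

kernel_value = (np.dot(x, z) + 1) ** 2     # kernel trick: stays in the input space
explicit_value = np.dot(phi(x), phi(z))    # same quantity via the explicit feature map

print(kernel_value, explicit_value)        # the two values agree (up to floating-point rounding)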

2. Algorithm

Input: the training data set T and the penalty parameter C > 0.

Output: the classification decision function.

Algorithm process:

(1) Select an appropriate kernel function K(x, z) and solve the constrained optimization problem: minimize (1/2) Σi Σj ai aj yi yj K(xi, xj) - Σi ai over a, subject to Σi ai yi = 0 and 0 ≤ ai ≤ C (i = 1, 2, ..., N), obtaining the optimal solution a* = (a1*, a2*, ..., aN*).

(2) Compute b*: select a component aj* of a* satisfying 0 < aj* < C, and calculate b* = yj - Σi ai* yi K(xi, xj).

(3) The classification decision function is f(x) = sign(Σi ai* yi K(x, xi) + b*).
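A hedged sketch of step (3) using scikit-learn (the make_moons data set, the fixed gamma = 0.5, and C = 1.0 are assumptions made only for this illustration): for a fitted binary SVC, dual_coef_ stores ai*·yi for the support vectors, support_vectors_ stores the xi, and intercept_ stores b*, so the decision value Σi ai* yi K(x, xi) + b* can be recomputed by hand with the RBF kernel and compared against decision_function.

import numpy as np
from sklearn import svm
from sklearn.datasets import make_moons

# a small nonlinearly separable binary data set
X, y = make_moons(n_samples=100, noise=0.2, random_state=0)

gamma = 0.5
clf = svm.SVC(kernel='rbf', C=1.0, gamma=gamma)
clf.fit(X, y)

x_new = X[:5]   # a few points to score

# K(x, xi) = exp(-gamma * ||x - xi||^2) for every support vector xi
diffs = x_new[:, None, :] - clf.support_vectors_[None, :, :]
K = np.exp(-gamma * np.sum(diffs ** 2, axis=2))

# sum over i of (ai* yi) K(x, xi) + b*
manual = K @ clf.dual_coef_[0] + clf.intercept_[0]

print(np.allclose(manual, clf.decision_function(x_new)))   # True: the two computations agree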

2. Hands-on practice (RBF kernel + gamma for iris classification)

Code:

from sklearn import datasets
import numpy as np
from sklearn.model_selection import train_test_split
# split the data set into a training set and a test set
from sklearn import svm  # import the SVM module

iris = datasets.load_iris()
# load the iris data set
data_train, data_test, target_train, target_test = train_test_split(iris.data, iris.target, test_size=0.3)
# the test set takes 30% of the whole data set
svm_classifier = svm.SVC(C=1.0, kernel='rbf',
                         decision_function_shape='ovr', gamma=0.01)
# create an SVM classifier object
svm_classifier.fit(data_train, target_train)  # train the model

score = svm_classifier.score(data_test, target_test)
# pass in the test data to get the model's accuracy score
predict = svm_classifier.predict([[0.1, 0.2, 0.3, 0.4]])
# predict the label for the given sample

print(score)
print(predict)

Parameter explanation:

The larger the parameter C is, the smaller the error on the training set, but the easier it is to overfit.

A small coef0 helps prevent overfitting, while a large coef0 helps prevent underfitting (coef0 is only used by the 'poly' and 'sigmoid' kernels).

The larger gamma is, the smaller the region influenced by each support vector, the higher the model complexity, and the easier it is to overfit; the smaller gamma is, the smoother the decision boundary, the lower the model complexity, and the easier it is to underfit.

Kernel parameter (kernel): 'linear' - linear kernel function

                           'poly' - polynomial kernel function

                           'rbf' - radial basis function (Gaussian) kernel

                           'sigmoid' - sigmoid kernel function

Running result: the script prints the test-set accuracy score and the predicted label for the given sample (the exact numbers vary with the random train/test split).
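Because the best C and gamma depend on the data, a common follow-up (a hedged sketch, not part of the original script; the candidate values below are arbitrary) is to search over both with cross-validation:

from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
data_train, data_test, target_train, target_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0)

# candidate values for the penalty parameter C and the RBF kernel width gamma
param_grid = {'C': [0.1, 1, 10, 100], 'gamma': [0.001, 0.01, 0.1, 1]}

search = GridSearchCV(svm.SVC(kernel='rbf'), param_grid, cv=5)
search.fit(data_train, target_train)

print(search.best_params_)                    # best C and gamma found by cross-validation
print(search.score(data_test, target_test))   # accuracy of the refit best model on the test set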


Source: blog.csdn.net/weixin_52135595/article/details/126918276