Traditional Machine Learning (7): Support Vector Machines (Part 2) - SVM in sklearn

2 SVM in sklearn

2.1 Detailed explanation of LinearSVC and SVC parameters

2.1.1 SVC parameters

class sklearn.svm.SVC(*,
                      C=1.0, 
                      kernel='rbf', 
                      degree=3, 
                      gamma='scale', 
                      coef0=0.0, 
                      shrinking=True, 
                      probability=False, 
                      tol=0.001, 
                      cache_size=200, 
                      class_weight=None, 
                      verbose=False, 
                      max_iter=-1, 
                      decision_function_shape='ovr', 
                      break_ties=False, 
                      random_state=None
                     )

SVC is used for classification and is implemented on top of libsvm. The parameters are as follows (a short usage sketch follows the list):

  • C : penalty (regularization) parameter, default 1.0. The larger C is, the less tolerance there is for misclassified samples (narrower margin); the smaller C is, the more tolerance (wider margin)
  • kernel : type of kernel function, one of:
    • "linear" : linear kernel
    • "poly" : polynomial kernel
    • "rbf" : Gaussian (RBF) kernel (default)
    • "sigmoid" : sigmoid kernel
    • "precomputed" : the kernel matrix is supplied directly, i.e. it has been computed in advance
  • degree : degree of the polynomial kernel, default 3; only used by the polynomial kernel
  • gamma : kernel coefficient, float or {'scale', 'auto'}, default 'scale'. Only used by 'rbf', 'poly' and 'sigmoid'. 'scale' means 1 / (n_features * X.var()); 'auto' means 1 / n_features
  • coef0 : independent term in the kernel function, float, default 0.0. Only used by the 'poly' and 'sigmoid' kernels
  • probability : whether to enable probability estimates, bool, default False. Must be set before calling fit(), and it slows down fit()
  • tol : tolerance for the stopping criterion, float, default 1e-3
  • cache_size : size of the kernel cache in MB, float, default 200
  • class_weight : class weights, dict or str, default None. Sets the penalty parameter of class i to class_weight[i] * C. If None, all classes get weight one. If 'balanced', weights are set inversely proportional to class frequencies in the input data
  • max_iter : maximum number of iterations, int, default -1 (no limit)
  • decision_function_shape : decision function type for multi-class problems, 'ovo' (one vs one) or 'ovr' (one vs rest), default 'ovr'
  • random_state : seed used to shuffle the data for probability estimates, int, default None; ignored when probability=False
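
A minimal usage sketch tying a few of these parameters together (the values below are illustrative, not tuned):

from sklearn.svm import SVC
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# RBF kernel with the default gamma='scale', balanced class weights and
# probability estimates enabled (note: probability=True slows down fit())
clf = SVC(C=10, kernel='rbf', gamma='scale', class_weight='balanced',
          probability=True, random_state=0)
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
print(clf.predict_proba(X_test[:3]))  # available only because probability=True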

2.1.2 LinearSVC parameters

class sklearn.svm.LinearSVC(
    penalty='l2', 
    loss='squared_hinge', 
    *, 
    dual=True, 
    tol=0.0001, 
    C=1.0, 
    multi_class='ovr', 
    fit_intercept=True, 
    intercept_scaling=1, 
    class_weight=None, 
    verbose=0, 
    random_state=None, 
    max_iter=1000
)

LinearSVC (Linear Support Vector Classification) is a linear support vector machine: the kernel is always linear, and it is implemented with liblinear rather than libsvm.
Parameters (a short usage sketch follows the list):

  • C : penalty coefficient of the objective function, default C = 1.0
  • loss : loss function, 'hinge' or 'squared_hinge' (default 'squared_hinge')
  • penalty : penalization norm, str, 'l1' or 'l2' (default 'l2')
  • dual : whether to solve the dual or the primal optimization problem; prefer dual=False when n_samples > n_features
  • tol : tolerance for the stopping criterion, default 1e-4
  • multi_class : multi-class strategy when y contains more than two classes. 'ovr' (default) trains one-vs-rest classifiers; 'crammer_singer' optimizes a joint objective over all classes. If 'crammer_singer' is chosen, loss, penalty and dual are ignored
  • max_iter : maximum number of iterations, int, default 1000
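
A minimal LinearSVC sketch in the same spirit (illustrative values); since the iris data has many more samples than features, dual=False is used here:

from sklearn.svm import LinearSVC
from sklearn.datasets import load_iris

X, y = load_iris(return_X_y=True)

# 150 samples > 4 features, so solve the primal problem (dual=False)
lin_clf = LinearSVC(C=1.0, penalty='l2', loss='squared_hinge',
                    dual=False, max_iter=10000)
lin_clf.fit(X, y)
print(lin_clf.coef_.shape)   # (3, 4): one weight vector per class (ovr strategy)
print(lin_clf.intercept_)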

2.2 LinearSVC and SVC use cases

2.2.1 Applying a linear SVM to the iris dataset

There are two ways to fit a linear SVM (a brief LinearSVC sketch follows the list; the code below uses the SVC form):

  • LinearSVC(C=1)
  • SVC(C=1, kernel="linear")
import numpy as np
import os
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
import warnings
warnings.filterwarnings('ignore')


from sklearn.svm import SVC
from sklearn.datasets import load_iris

# Select two features (petal length, petal width) of the iris dataset for easy plotting
iris = load_iris()
X = iris['data'][:, (2,3)]
y = iris['target']
# Keep only two iris classes (setosa and versicolor)
setosa_versi = (y == 0) | (y == 1)
X = X[setosa_versi]
y = y[setosa_versi]


# Train. The larger the constant C, the smaller the error-tolerance space and the smaller the distance
# between the upper and lower support planes; the smaller C, the larger the tolerance and the wider the margin
svm_clf = SVC(kernel='linear',C=100000000)
svm_clf.fit(X,y)
# Plot the decision boundary and margins
def plot_svc_decision_boundary(svm_clf,xmin,xmax,sv=True):
    # svm_clf.coef_ holds the feature weights
    w = svm_clf.coef_[0]
    # svm_clf.intercept_ holds the intercept
    b = svm_clf.intercept_[0]
    
    x0 = np.linspace(xmin, xmax, 200)
    # The decision boundary satisfies w[0]*x0 + w[1]*x1 + b = 0,
    # so solve for x1 here
    decision_boundary = - w[0]/w[1] * x0 - b/w[1]
    # The support planes are where the decision function equals +/-1,
    # i.e. x1 is shifted by 1/w[1]
    margin = 1 / w[1]
    # Upper support plane: x1 + margin
    gutter_up = decision_boundary + margin
    # Lower support plane: x1 - margin
    gutter_dw = decision_boundary - margin
    
    # Optionally highlight the support vectors
    if sv:
        svs = svm_clf.support_vectors_  # support vectors
        plt.scatter(svs[:,0],svs[:,1],s=180)
    plt.plot(x0,decision_boundary,'k-',linewidth=2)
    plt.plot(x0,gutter_up,'k--',linewidth=2)
    plt.plot(x0,gutter_dw,'k--',linewidth=2)

plt.figure(figsize=(12,8))
plot_svc_decision_boundary(svm_clf,0,5.5)
plt.plot(X[:,0][y==1],X[:,1][y==1],'bs')
plt.plot(X[:,0][y==0],X[:,1][y==0],'yo')
plt.axis([0,5.5,0,2])
plt.show()

(Figure: iris data with the fitted decision boundary and margins for a very large C; support vectors are highlighted)

# Here we make C smaller; the larger error-tolerance space makes the distance between the upper and lower support planes larger
# The support vectors are now the samples lying on the support planes plus the samples the margins fail to keep out (margin violations)
svm_clf_2 = SVC(kernel='linear',C=0.01)
svm_clf_2.fit(X,y)


plt.figure(figsize=(12,8))
plot_svc_decision_boundary(svm_clf_2,0,5.5)
plt.plot(X[:,0][y==1],X[:,1][y==1],'bs')
plt.plot(X[:,0][y==0],X[:,1][y==0],'yo')
plt.axis([0,5.5,0,2])
plt.show()

(Figure: the same data with C=0.01; the margin is much wider and contains more samples)

2.2.2 Polynomial kernel function

You can either expand the features manually with PolynomialFeatures and then classify with LinearSVC, or specify the polynomial kernel directly with SVC:

1. Using LinearSVC with PolynomialFeatures to expand the features

from sklearn.datasets import make_moons

# Use sklearn's built-in moons dataset
X, y = make_moons(n_samples=100,noise=0.15,random_state=42)

def plot_dataset(X,y,axis):
    plt.plot(X[:,0][y == 0],X[:,1][y == 0],'bs')
    plt.plot(X[:,0][y == 1],X[:,1][y == 1],'go')
    plt.axis(axis)
    plt.grid(True,which='both')


plot_dataset(X,y,[-1.5,2.5,-1,1.5])
plt.show()

(Figure: the moons dataset, which is not linearly separable)

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.svm import LinearSVC

# Classify with LinearSVC, wrapped in a Pipeline:
# polynomial feature expansion -> standardization -> linear SVM
poly_pipeline = Pipeline(steps=[
    ("poly_features",PolynomialFeatures(degree=3)),
    ("scaler",StandardScaler()),
    ("svm_clf",LinearSVC(C=10,loss='hinge'))
])

poly_pipeline.fit(X,y)
# Plot the decision regions predicted by the classifier
def plot_pred(clf,axes):
    x0s = np.linspace(axes[0],axes[1],100)
    x1s = np.linspace(axes[2],axes[3],100)
    x0,x1 = np.meshgrid(x0s,x1s)
    # Flatten x0 and x1 and stack them into a 10000 x 2 matrix of grid points
    X = np.c_[x0.ravel(),x1.ravel()]
    # Predict a class for every grid point
    y_pred = clf.predict(X).reshape(x0.shape)
    # Filled contour plot of the predictions
    plt.contourf(x0,x1,y_pred,alpha=0.2)

plot_pred(poly_pipeline,[-1.5,2.5,-1,1.5])
plot_dataset(X,y,[-1.5,2.5,-1,1.5])
plt.show()

(Figure: decision regions of the PolynomialFeatures + LinearSVC pipeline on the moons data)

2. Using SVC with kernel='poly'

poly_kernel_svm_clf1 = Pipeline(
    steps=[
        ("scaler",StandardScaler()),
        ("svm_clf",SVC(kernel='poly',degree=3,coef0=1,C=5))
    ]
)

poly_kernel_svm_clf1.fit(X,y)


# The larger the degree, the more flexible the decision boundary,
# but also the easier it is to overfit and the more complex the model
poly_kernel_svm_clf2 = Pipeline(
    steps=[
        ("scaler",StandardScaler()),
        ("svm_clf",SVC(kernel='poly',degree=10,coef0=100,C=5))
    ]
)

poly_kernel_svm_clf2.fit(X,y)
plt.figure(figsize=(12,5))
plt.subplot(121)
plot_pred(poly_kernel_svm_clf1,[-1.5,2.5,-1,1.5])
plot_dataset(X,y,[-1.5,2.5,-1,1.5])

plt.subplot(122)
plot_pred(poly_kernel_svm_clf2,[-1.5,2.5,-1,1.5])
plot_dataset(X,y,[-1.5,2.5,-1,1.5])
plt.show()

(Figure: polynomial-kernel decision regions; left degree=3, coef0=1, right degree=10, coef0=100)

2.2.3 Gaussian kernel function

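For reference, the RBF (Gaussian) kernel measures similarity as K(x, x') = exp(-gamma * ||x - x'||^2), so a larger gamma makes the similarity fall off faster with distance. A tiny check against sklearn's own implementation (illustrative points and gamma):

import numpy as np
from sklearn.metrics.pairwise import rbf_kernel

x1 = np.array([[0.0, 0.0]])
x2 = np.array([[1.0, 1.0]])
gamma = 5

# Manual computation of exp(-gamma * ||x1 - x2||^2)
manual = np.exp(-gamma * np.sum((x1 - x2) ** 2))
print(manual)                           # about 4.54e-05
print(rbf_kernel(x1, x2, gamma=gamma))  # same value from sklearn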

gamma1,gamma2 = 0.1,5
C1,C2 = 0.001,1000

params = (gamma1,C1),(gamma1,C2),(gamma2,C1),(gamma2,C2)

svm_clfs = []

for gamma,C in params:
    rbf_pipeline = Pipeline(
        steps=[
             ("scaler",StandardScaler()),
             ("svm_clf",SVC(kernel='rbf',gamma=gamma,C=C))
        ]
    )
    rbf_pipeline.fit(X,y)
    svm_clfs.append(rbf_pipeline)
plt.figure(figsize=(11,7))
for  i,svm_clf in enumerate(svm_clfs):
    plt.subplot(221+i)
    plot_pred(svm_clf,[-1.5,2.5,-1,1.5])
    plot_dataset(X,y,[-1.5,2.5,-1,1.5])
    gamma,C = params[i]
    plt.title("gamma = {},C = {}".format(gamma,C))

plt.show()

  • The larger the gamma parameter, the narrower the Gaussian, so each training point has a more local influence

  • The larger the constant C, the smaller the error-tolerance space (fewer margin violations are allowed)

(Figure: RBF decision regions for the four (gamma, C) combinations)

2.3 Applying SVM to an image recognition case

The task: given an image of a face, predict which of the people in a given list it belongs to.

The training set is a set of labeled face images, and we try to learn a model that can predict the labels of unseen instances.
The first, most intuitive approach is to use the image pixels as features for the learning algorithm: the pixel values are the attributes we learn from, and the identity of each person is the target class. A small sketch of this idea follows below.
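
As a small sketch of that idea, each 64x64 grayscale image is simply flattened into a 4096-dimensional pixel vector; this is exactly the relationship between faces.images and faces.data in the dataset loaded below:

import numpy as np

# A 64x64 grayscale image becomes a 4096-element feature vector
image = np.random.rand(64, 64)       # stand-in for one face image
pixel_features = image.reshape(-1)   # shape (4096,)
print(pixel_features.shape)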

1. Data Exploration

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import fetch_olivetti_faces

%matplotlib inline
plt.rcParams['axes.labelsize'] = 14
plt.rcParams['xtick.labelsize'] = 12
plt.rcParams['ytick.labelsize'] = 12
import warnings
warnings.filterwarnings('ignore')


'''
The dataset contains 400 grayscale 64x64 face images of 40 different people.
The photos were taken under different lighting conditions and with different
facial expressions (open/closed eyes, smiling/not smiling, glasses/no glasses).

A Baidu Netdisk link for manual download:
Link: https://pan.baidu.com/s/1ofeOk6hE4tMA-AYPJzW7cg
Extraction code: 1111

Put the face image dataset olivetti_py3.pkz into the data directory
'''

faces = fetch_olivetti_faces(data_home='./data/')
# Print dataset information
print(faces.DESCR)
.. _olivetti_faces_dataset:

The Olivetti faces dataset
--------------------------

`This dataset contains a set of face images`_ taken between April 1992 and 
April 1994 at AT&T Laboratories Cambridge. The
:func:`sklearn.datasets.fetch_olivetti_faces` function is the data
fetching / caching function that downloads the data
archive from AT&T.

.. _This dataset contains a set of face images: https://cam-orl.co.uk/facedatabase.html

As described on the original website:

    There are ten different images of each of 40 distinct subjects. For some
    subjects, the images were taken at different times, varying the lighting,
    facial expressions (open / closed eyes, smiling / not smiling) and facial
    details (glasses / no glasses). All the images were taken against a dark
    homogeneous background with the subjects in an upright, frontal position 
    (with tolerance for some side movement).

**Data Set Characteristics:**

    =================   =====================
    Classes                                40
    Samples total                         400
    Dimensionality                       4096
    Features            real, between 0 and 1
    =================   =====================

The image is quantized to 256 grey levels and stored as unsigned 8-bit 
integers; the loader will convert these to floating point values on the 
interval [0, 1], which are easier to work with for many algorithms.

The "target" for this database is an integer from 0 to 39 indicating the
identity of the person pictured; however, with only 10 examples per class, this
relatively small dataset is more interesting from an unsupervised or
semi-supervised perspective.

The original dataset consisted of 92 x 112, while the version available here
consists of 64x64 images.

When using these images, please give credit to AT&T Laboratories Cambridge.

'''
Looking at the contents of the faces object, we get the following attributes: images, data and target.
images contains 400 images represented as 64 x 64 pixel matrices.
data contains the same 400 images, but as arrays of 4096 pixels.
target is an array with the target classes, ranging from 0 to 39.
'''

print(faces.keys())
print(faces.images.shape)
print(faces.data.shape)
print(faces.target.shape)
dict_keys(['data', 'images', 'target', 'DESCR'])
(400, 64, 64)
(400, 4096)
(400,)
'''
Normalizing the data is important for getting good results with SVMs.

Here the images are already pixel values in a fairly uniform range between 0 and 1.
'''
print(np.max(faces.data))
print(np.min(faces.data))
print(np.mean(faces.data))
1.0
0.0
0.5470426
'''
Plot the images
'''
def print_faces(images, target, top_n):
    # set up the figure size in inches
    fig = plt.figure(figsize=(12, 12))
    fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
    for i in range(top_n):
        # plot the images in a matrix of 10x10
        p = fig.add_subplot(10, 10, i + 1, xticks=[], yticks=[])
        p.imshow(images[i], cmap=plt.cm.bone)

        # label the image with the target value
        p.text(0, 14, str(target[i]),color='red')
        p.text(0, 60, str(i),color='red')


# Print the first 20 images; they show two different people
print_faces(faces.images, faces.target, 20)

(Figure: the first 20 Olivetti faces with their target labels)

2. Using SVM for image recognition

from sklearn.svm import SVC

'''
The SVC implementation has several important parameters; probably the most relevant is kernel,
which defines the kernel function used in the classifier (think of a kernel as a similarity measure between instances).

By default, SVC uses the rbf kernel, which lets us model non-linear problems. To start, we use the simplest one: linear.
'''
svc_1 = SVC(kernel='linear')


# Split the dataset into training and test sets
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(faces.data, faces.target, test_size=0.25, random_state=0)


# Evaluate with K-fold cross-validation
from sklearn.model_selection import cross_val_score, KFold
from scipy.stats import sem

def evaluate_cross_validation(clf, X, y, K):
    # create a k-fold cross-validation iterator
    cv = KFold(n_splits=K, shuffle=True, random_state=0)
    # by default the score used is the one returned by score method of the estimator (accuracy)
    scores = cross_val_score(clf, X, y, cv=cv)
    print(scores)
    print("Mean score: {0:.3f} (+/-{1:.3f})".format(np.mean(scores), sem(scores)))

# Five-fold cross-validation gives a fairly good result (mean accuracy 0.913)
evaluate_cross_validation(svc_1, X_train, y_train, 5)
[0.93333333 0.86666667 0.91666667 0.93333333 0.91666667]
Mean score: 0.913 (+/-0.012)
# Define a function that trains on the training set and evaluates on the test set
from sklearn import metrics

def train_and_evaluate(clf, X_train, X_test, y_train, y_test):

    clf.fit(X_train, y_train)

    print("Accuracy on training set:")
    print(clf.score(X_train, y_train))
    print("Accuracy on testing set:")
    print(clf.score(X_test, y_test))

    y_pred = clf.predict(X_test)

    print("Classification Report:")
    print(metrics.classification_report(y_test, y_pred))
    print("Confusion Matrix:")
    print(metrics.confusion_matrix(y_test, y_pred))

# The classifier performs almost perfectly
train_and_evaluate(svc_1, X_train, X_test, y_train, y_test)
Accuracy on training set:
1.0
Accuracy on testing set:
0.99
Classification Report:
              precision    recall  f1-score   support

           0       0.86      1.00      0.92         6
           1       1.00      1.00      1.00         4
           2       1.00      1.00      1.00         2
           3       1.00      1.00      1.00         1
           4       1.00      1.00      1.00         1
           5       1.00      1.00      1.00         5
           6       1.00      1.00      1.00         4
           7       1.00      0.67      0.80         3
           9       1.00      1.00      1.00         1
          10       1.00      1.00      1.00         4
          11       1.00      1.00      1.00         1
          12       1.00      1.00      1.00         2
          13       1.00      1.00      1.00         3
          14       1.00      1.00      1.00         5
          15       1.00      1.00      1.00         3
          17       1.00      1.00      1.00         6
          19       1.00      1.00      1.00         4
          20       1.00      1.00      1.00         1
          21       1.00      1.00      1.00         1
          22       1.00      1.00      1.00         2
          23       1.00      1.00      1.00         1
          24       1.00      1.00      1.00         2
          25       1.00      1.00      1.00         2
          26       1.00      1.00      1.00         4
          27       1.00      1.00      1.00         1
          28       1.00      1.00      1.00         2
          29       1.00      1.00      1.00         3
          30       1.00      1.00      1.00         4
          31       1.00      1.00      1.00         3
          32       1.00      1.00      1.00         3
          33       1.00      1.00      1.00         2
          34       1.00      1.00      1.00         3
          35       1.00      1.00      1.00         1
          36       1.00      1.00      1.00         3
          37       1.00      1.00      1.00         3
          38       1.00      1.00      1.00         1
          39       1.00      1.00      1.00         3

    accuracy                           0.99       100
   macro avg       1.00      0.99      0.99       100
weighted avg       0.99      0.99      0.99       100

Confusion Matrix:
[[6 0 0 ... 0 0 0]
 [0 4 0 ... 0 0 0]
 [0 0 2 ... 0 0 0]
 ...
 [0 0 0 ... 3 0 0]
 [0 0 0 ... 0 1 0]
 [0 0 0 ... 0 0 3]]
'''
Now we try to classify faces into people wearing glasses and people not wearing glasses
'''

# Index ranges (inclusive) of the images that show glasses
glasses = [
   (10, 19), (30, 32), (37, 38), (50, 59), (63, 64),
   (69, 69), (120, 121), (124, 129), (130, 139), (160, 161),
   (164, 169), (180, 182), (185, 185), (189, 189), (190, 192),
   (194, 194), (196, 199), (260, 269), (270, 279), (300, 309),
   (330, 339), (358, 359), (360, 369)
]

# Define a function that builds a new target array from these index segments
# Faces with glasses are labeled 1, faces without glasses 0
def create_target(segments):
    # create a new y array of target size initialized with zeros
    y = np.zeros(faces.target.shape[0])
    # put 1 in the specified segments
    for (start, end) in segments:
        y[start:end + 1] = 1
    return y


target_glasses = create_target(glasses)

# Train/test split again
X_train, X_test, y_train, y_test = train_test_split(faces.data, target_glasses, test_size=0.25, random_state=0)
svc_2 = SVC(kernel='linear')
# Cross-validation
evaluate_cross_validation(svc_2, X_train, y_train, 5)
[1.         0.95       0.98333333 0.98333333 0.93333333]
Mean score: 0.970 (+/-0.012)
train_and_evaluate(svc_2, X_train, X_test, y_train, y_test)
Accuracy on training set:
1.0
Accuracy on testing set:
0.99
Classification Report:
              precision    recall  f1-score   support

         0.0       1.00      0.99      0.99        67
         1.0       0.97      1.00      0.99        33

    accuracy                           0.99       100
   macro avg       0.99      0.99      0.99       100
weighted avg       0.99      0.99      0.99       100

Confusion Matrix:
[[66  1]
 [ 0 33]]
'''
Next we separate out all the images of one particular person, who appears sometimes with and sometimes without glasses.
We hold out the images with indices 30 to 39 (all of the same person), train on the remaining instances,
and evaluate on this new set of 10 instances.
'''

X_test = faces.data[30:40]
y_test = target_glasses[30:40]
print(y_test.shape[0])

select = np.ones(target_glasses.shape[0])
select[30:40] = 0
X_train = faces.data[select == 1]
y_train = target_glasses[select == 1]
print(y_train.shape[0])

svc_3 = SVC(kernel='linear')
train_and_evaluate(svc_3, X_train, X_test, y_train, y_test)
10
390
Accuracy on training set:
1.0
Accuracy on testing set:
0.9
Classification Report:
              precision    recall  f1-score   support

         0.0       0.83      1.00      0.91         5
         1.0       1.00      0.80      0.89         5

    accuracy                           0.90        10
   macro avg       0.92      0.90      0.90        10
weighted avg       0.92      0.90      0.90        10

Confusion Matrix:
[[5 0]
 [1 4]]
# Only one error out of 10 images, still a very good result; let's see which one was misclassified
y_pred = svc_3.predict(X_test)

eval_faces = [np.reshape(a, (64, 64)) for a in X_test]

print_faces(eval_faces,y_pred,10)


(Figure: the 10 held-out faces labeled with the predicted glasses (1) / no-glasses (0) class)

'''
Image number 8 above wears glasses but was classified as not wearing glasses.
Looking at that example, it differs from the other glasses images (the frame is hard to see and the eyes are closed),
which is probably why it was misclassified.

We built a face classifier with a linear SVM model.
Usually we do not get such good results on the first attempt. In such cases (besides looking at different features),
we can start tuning the hyperparameters of the algorithm.
In the particular case of SVMs, we can try different kernel functions; if the linear kernel does not give good results,
we can try a polynomial or RBF kernel. The C and gamma parameters can also affect the results.
A small tuning sketch follows below.
'''
[y_pred,y_test]
[array([1., 1., 1., 0., 0., 0., 0., 1., 0., 0.]),
 array([1., 1., 1., 0., 0., 0., 0., 1., 1., 0.])]
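
A minimal sketch of that kind of tuning with GridSearchCV (the parameter grid is illustrative, and X_train/y_train stand for whichever training split is being tuned, e.g. the identity-classification split from earlier):

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = [
    {'kernel': ['linear'], 'C': [0.1, 1, 10, 100]},
    {'kernel': ['rbf'], 'C': [0.1, 1, 10, 100], 'gamma': ['scale', 0.001, 0.01]},
]

grid = GridSearchCV(SVC(), param_grid, cv=5)
grid.fit(X_train, y_train)
print(grid.best_params_)
print(grid.best_score_)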


Source: blog.csdn.net/qq_44665283/article/details/130389921