"Machine Learning" Experiment 2 (Support Vector Machine)

Task:

  1. SVM-based classification of the Iris dataset
    (1) Build an SVM program that classifies the Iris dataset and the pima-indians-diabetes dataset.
    (2) Implement it with both a linear SVM and a kernelized SVM, then compare and analyze the differences between the two methods. Split the data into a training set and a test set (either a 7:3 split or 5-fold cross-validation).
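The task allows either a 7:3 split or 5-fold cross-validation. A minimal sketch of the 5-fold option is shown here. It assumes scikit-learn's bundled copy of Iris (load_iris) instead of the downloaded file, adds feature standardization (StandardScaler), which the original scripts do not use, and picks C=1.0 and gamma="scale" purely for illustration.

```python
# Sketch: 5-fold cross-validation of a linear SVM vs. an RBF-kernel SVM on Iris.
# Assumptions: load_iris stands in for the downloaded Iris.data file; features are
# standardized; C=1.0 and gamma="scale" are illustrative, not the author's settings.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Stratified folds keep the three classes balanced in every fold
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)

models = {
    "linear": make_pipeline(StandardScaler(), SVC(kernel="linear", C=1.0)),
    "rbf": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")),
}

scores = {}
for name, model in models.items():
    # The scaler is refit inside each training fold, so no test-fold statistics leak in
    fold_scores = cross_val_score(model, X, y, cv=cv)
    scores[name] = fold_scores
    print(f"{name}: mean accuracy={fold_scores.mean():.3f} (std {fold_scores.std():.3f})")
```

Putting the scaler inside the pipeline is the key design choice: cross_val_score then refits it on each training fold only.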

Dataset download address: http://archive.ics.uci.edu/ml/datasets/Iris

For the iris dataset:

# Author: Asita
# Created: 2021/11/19 20:45

from sklearn import svm
from sklearn.svm import LinearSVC
import numpy as np
from sklearn.model_selection import train_test_split

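# Converter: map each class name in the label column to an integer label
def Iris_label(s):
    # Depending on the NumPy version, loadtxt passes the field as bytes or as str; normalize to str
    if isinstance(s, bytes):
        s = s.decode()
    it = {'Iris-setosa': 0, 'Iris-versicolor': 1, 'Iris-virginica': 2}
    return it[s.strip()]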

# 1. Load the dataset
path = 'F:/研究生/课程/机器学习/SVM/Iris.data'


data = np.loadtxt(path, dtype=float, delimiter=',', converters={4: Iris_label})
# In converters={4: Iris_label}, 4 is the index of the 5th column: its class-name strings are converted to numeric labels
# print(data)
# print(data.shape)

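# 2. Split features and labels
x, y = np.split(data, indices_or_sections=(4,), axis=1)  # x: features (columns 0-3), y: labels (column 4)
# indices_or_sections: an integer N splits the array into N equal parts; a sequence gives the split
# positions along the axis, so (4,) yields data[:, :4] and data[:, 4:]
# axis=1 splits along columns (the default axis=0 splits along rows)
train_data, test_data, train_label, test_label = train_test_split(x, y, random_state=1, train_size=0.7,
                                                                  test_size=0.3)  # 7:3 train/test split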
# print(train_data.shape)

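# 3. Train the SVM classifiers
# C: penalty coefficient for the error term (default 1.0); kernel='linear' would select a linear kernel

# RBF-kernel SVM; 'ovo' returns a one-vs-one decision function ('ovr' would be one-vs-rest)
classifier1 = svm.SVC(C=2, kernel='rbf', gamma=10, decision_function_shape='ovo')
# Linear SVM (liblinear, one-vs-rest); a very large C approximates a hard margin and
# may trigger a ConvergenceWarning on unscaled features
classifier2 = LinearSVC(C=1e9)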

classifier1.fit(train_data, train_label.ravel())  # ravel() flattens the (n, 1) label column to shape (n,), row-major by default
classifier2.fit(train_data, train_label.ravel())

# 4. Compute the accuracy of each classifier
print("Accuracy:")
print("RBF kernel:")
print("Training set:", classifier1.score(train_data, train_label))
print("Test set:", classifier1.score(test_data, test_label))
print("Linear:")
print("Training set:", classifier2.score(train_data, train_label))
print("Test set:", classifier2.score(test_data, test_label))

Running result: (screenshot of the output not preserved)
This code calls scikit-learn directly to train the SVMs and reports the training and test accuracy of the RBF-kernel SVM (svm.SVC) and the linear SVM (LinearSVC) on the Iris dataset.
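One concrete way to see the difference between the two methods: a linear SVM learns an explicit weight vector w and bias b per class, so its decision scores are simply X @ w.T + b, whereas a kernel SVM has no explicit w in input space and is instead defined by a set of support vectors. The sketch below checks this on Iris. It assumes load_iris in place of Iris.data, standardized features, and illustrative hyperparameters rather than the author's C=2 and gamma=10.

```python
# Sketch: a linear SVM is an explicit hyperplane; a kernel SVM is defined by support vectors.
# Assumptions: load_iris instead of Iris.data, standardized features, illustrative C/gamma.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=1, stratify=y)

# Fit the scaler on the training set only, then apply it to both sets
scaler = StandardScaler().fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

linear = LinearSVC(C=1.0, dual=False).fit(X_train, y_train)
rbf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X_train, y_train)

# LinearSVC (one-vs-rest) stores one weight vector per class in coef_ (shape 3 x 4),
# so its scores are a plain matrix product plus the intercepts
manual_scores = X_test @ linear.coef_.T + linear.intercept_
print("linear scores == X @ w.T + b:", np.allclose(manual_scores, linear.decision_function(X_test)))

# The RBF model exposes no coef_; it is described by its support vectors instead
print("RBF support vectors per class:", rbf.n_support_)
print("linear test accuracy:", linear.score(X_test, y_test))
print("RBF test accuracy:", rbf.score(X_test, y_test))
```

This also explains the cost difference at prediction time: the linear model needs one dot product per class, while the kernel model evaluates the kernel against every support vector.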

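Running on the pima-indians-diabetes dataset works the same way, except that the labels take only two values (0 and 1) and there are 8 feature columns instead of 4, so only minor changes to the code are needed. Note that this script trains only the RBF-kernel SVM.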
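For the binary case, the linear/kernel comparison can be sketched as follows. Because the real pima file is not bundled with scikit-learn, this sketch substitutes a synthetic two-class dataset from make_classification (768 samples and 8 features, chosen only to resemble pima's shape), so its accuracies say nothing about the real data. It also shows that with two classes the decision function is a single signed score per sample, so decision_function_shape has no effect there.

```python
# Sketch: linear vs. RBF SVM on a two-class problem shaped like pima-indians-diabetes.
# Assumption: make_classification generates SYNTHETIC stand-in data (768 samples, 8 features);
# it is not the real pima file, so the accuracies printed here say nothing about that dataset.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC, SVC

X, y = make_classification(n_samples=768, n_features=8, n_informative=5,
                           weights=[0.65], random_state=1)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=1, stratify=y)

linear = make_pipeline(StandardScaler(), LinearSVC(C=1.0, dual=False)).fit(X_train, y_train)
rbf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale")).fit(X_train, y_train)

# With two classes, decision_function returns one signed score per sample
# (shape (n_samples,)): positive means class 1, negative means class 0.
scores = rbf.decision_function(X_test)
print("score shape:", scores.shape)
print("sign matches predict:", np.array_equal((scores > 0).astype(int), rbf.predict(X_test)))

for name, model in [("linear", linear), ("rbf", rbf)]:
    print(f"{name}: train={model.score(X_train, y_train):.3f} test={model.score(X_test, y_test):.3f}")
```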

Full code:

# Author: Asita
# Created: 2021/11/19 20:45

from sklearn import svm
import numpy as np
from sklearn.model_selection import train_test_split

# 1. Load the dataset
path = 'F:/研究生/课程/机器学习/SVM/pima-indians-diabetes.data'

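# The commonly distributed copy of this file is comma-separated; drop delimiter if yours is whitespace-separated
data = np.loadtxt(path, delimiter=',')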
# print(data)
# print(data.shape)

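# 2. Split features and labels
x, y = np.split(data, indices_or_sections=(8,), axis=1)  # x: features (columns 0-7), y: labels (column 8)
# indices_or_sections: an integer N splits the array into N equal parts; a sequence gives the split
# positions along the axis, so (8,) yields data[:, :8] and data[:, 8:]
# axis=1 splits along columns (the default axis=0 splits along rows)
train_data, test_data, train_label, test_label = train_test_split(x, y, random_state=1, train_size=0.7,
                                                                  test_size=0.3)  # 7:3 train/test split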
# print(train_data.shape)

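# 3. Train the SVM classifier
# C: penalty coefficient for the error term (default 1.0); kernel='linear' would select a linear kernel
# RBF-kernel SVM; with only two classes, 'ovo' and 'ovr' yield the same decision function
classifier = svm.SVC(C=2, kernel='rbf', gamma=10, decision_function_shape='ovo')
classifier.fit(train_data, train_label.ravel())  # ravel() flattens the (n, 1) label column to shape (n,)

# 4. Compute the classifier's accuracy
print("Accuracy:")
print("RBF kernel:")
print("Training set:", classifier.score(train_data, train_label))
print("Test set:", classifier.score(test_data, test_label))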

Running result: (screenshot of the output not preserved)
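Both scripts fix gamma=10 on unstandardized features. Since gamma controls how quickly the RBF kernel exp(-gamma * ||x - x'||^2) decays with distance, a large value tends to let the model fit the training set closely at the expense of test accuracy. A quick sweep on Iris makes the trade-off visible. This sketch assumes load_iris (whose values may differ slightly from the downloaded file) and keeps the original C=2, random_state=1 and 7:3 split; the gamma grid is arbitrary.

```python
# Sketch: effect of the RBF gamma parameter on training vs. test accuracy (Iris, 7:3 split).
# Assumptions: load_iris replaces Iris.data; the gamma grid is arbitrary; C=2 and
# random_state=1 mirror the original script, and features are left unscaled as there.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=0.7, test_size=0.3, random_state=1)

results = {}
for gamma in [0.01, 0.1, 1, 10, 100]:
    clf = SVC(C=2, kernel="rbf", gamma=gamma).fit(X_train, y_train)
    train_acc, test_acc = clf.score(X_train, y_train), clf.score(X_test, y_test)
    results[gamma] = (train_acc, test_acc)
    print(f"gamma={gamma:<6} train={train_acc:.3f} test={test_acc:.3f}")
```

A gap between training and test accuracy that widens as gamma grows is the usual sign of overfitting; in practice gamma and C are typically tuned together, for example with cross-validation.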


Origin blog.csdn.net/Naruto_8/article/details/121458888