Machine Learning -------- SVM

# Using SVM

(Explained with concrete code; the code is based on teacher 邹博 (Zou Bo)'s examples.)

1. Read the data file with numpy's loadtxt

data: the Iris flower dataset

5.1,3.5,1.4,0.2,Iris-setosa

4.9,3.0,1.4,0.2,Iris-setosa

4.7,3.2,1.3,0.2,Iris-setosa

4.6,3.1,1.5,0.2,Iris-setosa

5.0,3.6,1.4,0.2,Iris-setosa

Reading the file:

import numpy as np

def iris_type(s):
    # map the class-name bytes to integer labels
    it = {b'Iris-setosa': 0, b'Iris-versicolor': 1, b'Iris-virginica': 2}
    return it[s]

path = 'iris.data'  # path to the data file
data = np.loadtxt(path, dtype=float, delimiter=',', converters={4: iris_type})

path: path to the data file

dtype: data type used to read the values

delimiter: the string used to separate values

converters: a dictionary mapping column number to a function that will parse the column string into the desired value. E.g., if column 0 is a date string: ``converters = {0: datestr2num}``. Converters can also be used to provide a default value for missing data (but see also `genfromtxt`): ``converters = {3: lambda s: float(s.strip() or 0)}``. Default: None.

# data (result of the loadtxt call above)

[[5.1, 3.5, 1.4, 0.2, 0. ],

[4.9, 3. , 1.4, 0.2, 0. ],

[4.7, 3.2, 1.3, 0.2, 0. ],

[4.6, 3.1, 1.5, 0.2, 0. ],

[5. , 3.6, 1.4, 0.2, 0. ]]
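As the converters docstring above notes, a converter can also supply a default value for a missing field. A minimal self-contained sketch of that pattern (the inline CSV string is made up purely for illustration):

import io
import numpy as np

# Column 3 is empty in the second row; fall back to 0.0 instead of failing.
csv = io.StringIO("5.1,3.5,1.4,0.2\n4.9,3.0,1.4,\n")
arr = np.loadtxt(csv, delimiter=',',
                 converters={3: lambda s: float(s.strip() or 0)})
print(arr)  # the missing value in the second row becomes 0.0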

2. Split the data into training and test sets

# Usage of numpy.split

def split(ary, indices_or_sections, axis=0):
    '''
    Split an array into multiple sub-arrays.
    '''

Parameters
----------

ary : ndarray
    Array to be divided into sub-arrays.

indices_or_sections : int or 1-D array
    If `indices_or_sections` is an integer, N, the array will be divided into N equal arrays along `axis`. If such a split is not possible, an error is raised.
    If `indices_or_sections` is a 1-D array of sorted integers, the entries indicate where along `axis` the array is split. For example, ``[2, 3]`` would, for ``axis=0``, result in

    - ary[:2]

    - ary[2:3]

    - ary[3:]

    If an index exceeds the dimension of the array along `axis`, an empty sub-array is returned correspondingly.

axis : int, optional
    The axis along which to split; default is 0. axis=0 splits along rows, axis=1 splits along columns (here axis=1 is used to separate the feature columns from the label column).

Returns
-------

sub-arrays : list of ndarrays
    A list of sub-arrays.

Example:

x, y = np.split(data, (4,), axis=1)

x:[[5.1, 3.5, 1.4, 0.2],

[4.9, 3. , 1.4, 0.2],

[4.7, 3.2, 1.3, 0.2],

[4.6, 3.1, 1.5, 0.2],

[5. , 3.6, 1.4, 0.2]]

y:[[0.],

[0.],

[0.],

[0.],

[0.]]
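Besides the (4,) index split used above, the docstring describes two other calling forms. A tiny sketch of both (the array here is made up purely for illustration):

import numpy as np

a = np.arange(12).reshape(4, 3)   # 4 rows, 3 columns

# Integer N: divide into N equal parts along axis=0 (rows).
halves = np.split(a, 2, axis=0)
print([h.shape for h in halves])  # [(2, 3), (2, 3)]

# Sorted index list [2, 3]: gives a[:2], a[2:3], a[3:] along axis=0.
parts = np.split(a, [2, 3], axis=0)
print([p.shape for p in parts])   # [(2, 3), (1, 3), (1, 3)]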

x = x[:, :2]  # One explanation is that keeping only the first 2 features makes later visualization easier. In my own tests, using 3 features gave 100% training-set accuracy, but the test accuracy was far lower than with 2 features. (I have not figured out the exact reason; readers who know are welcome to leave a comment.)

from sklearn.model_selection import train_test_split

x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=1, train_size=0.6)  # random_state is the random seed; train_size is the fraction of the data used for training.
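A quick sanity check of the resulting shapes, assuming the full iris.data file (150 samples) and the two-feature slice above:

print(x_train.shape, x_test.shape)  # (90, 2) (60, 2) with train_size=0.6
print(y_train.shape, y_test.shape)  # (90, 1) (60, 1)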

3. Train the SVM

from sklearn import svm

# clf = svm.SVC(C=0.1, kernel='linear', decision_function_shape='ovr')

[training-set accuracy: 0.8, test-set accuracy: 0.8]

clf = svm.SVC(C=0.8, kernel='rbf', gamma=20, decision_function_shape='ovr')

[training-set accuracy: 0.86666, test-set accuracy: 0.65]

clf.fit(x_train, y_train.ravel())

With kernel='linear', a linear kernel is used: the larger C is, the more tightly the classifier fits the training set, but overfitting may occur.

With kernel='rbf', a Gaussian (RBF) kernel is used: the smaller gamma is, the smoother the decision boundary; the larger gamma is, the more fragmented the decision boundary and the better the fit on the training set, but overfitting may occur (a quick sketch of this effect follows this list).

With decision_function_shape='ovr' (one vs. rest), each class is separated from all of the other classes.

With decision_function_shape='ovo' (one vs. one), every pair of classes is separated, simulating the multi-class result with binary classifiers.
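To see the gamma effect described above, here is a quick sketch that reuses x_train, x_test, y_train and y_test from step 2 (the particular gamma values are arbitrary, chosen only for illustration):

from sklearn import svm

# Larger gamma fits the training set more and more tightly but tends to
# generalize worse; compare training and test accuracy as gamma grows.
for gamma in (0.1, 1, 20, 100):
    clf = svm.SVC(C=0.8, kernel='rbf', gamma=gamma, decision_function_shape='ovr')
    clf.fit(x_train, y_train.ravel())
    print(gamma, clf.score(x_train, y_train), clf.score(x_test, y_test))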

# How the accuracy is computed

print(clf.score(x_train, y_train))  # mean accuracy on the training set

y_hat = clf.predict(x_train)

show_accuracy(y_hat, y_train, 'training set')

print(clf.score(x_test, y_test))  # mean accuracy on the test set

y_hat = clf.predict(x_test)

show_accuracy(y_hat, y_test, 'test set')
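show_accuracy comes from the course code and is not defined in this excerpt; a minimal stand-in with the same call signature might look like the following (an assumed implementation, not the original helper):

import numpy as np

def show_accuracy(y_hat, y, tip):
    # Assumed stand-in: compare predictions with labels element-wise
    # and print the fraction that match, prefixed with the given tip.
    acc = np.mean(y_hat.ravel() == y.ravel())
    print('%s accuracy: %.4f' % (tip, acc))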


Reposted from www.cnblogs.com/monkeyT/p/10471459.html