sklearn: SVM (Support Vector Machine) Explained and Implemented (Classification)

class sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape=None, random_state=None)

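Parameters:
C: the penalty parameter of C-SVC. Default is 1.0.
A larger C penalizes the slack variables (misclassifications) more heavily and pushes them toward 0. The model then tends to classify every training sample correctly, so accuracy on the training set is high but generalization is weaker.
[Generalization ability is a learning algorithm's ability to handle previously unseen samples. The goal of learning is to capture the regularity underlying the data, so that a trained model also gives appropriate outputs for data outside the training set that follows the same regularity.]
A smaller C reduces the penalty for misclassification and tolerates some errors, treating those points as noise. Generalization is usually better, although a C that is too small can underfit.
kernel: the kernel function. Default is 'rbf'; accepted values are 'linear', 'poly', 'rbf', 'sigmoid', and 'precomputed'.
    0 - linear: u'*v
    1 - polynomial: (gamma*u'*v + coef0)^degree
    2 - RBF: exp(-gamma*|u-v|^2)
    3 - sigmoid: tanh(gamma*u'*v + coef0)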
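
The four kernel formulas above can be checked numerically. The following is a minimal sketch, not part of the original article: it evaluates each formula by hand for two sample vectors and compares the result with the corresponding function in sklearn.metrics.pairwise. The vectors and the values of gamma, coef0, and degree are arbitrary choices made only for illustration.

```python
import numpy as np
from sklearn.metrics.pairwise import (
    linear_kernel,
    polynomial_kernel,
    rbf_kernel,
    sigmoid_kernel,
)

# Two sample vectors and kernel parameters (illustrative assumptions).
u = np.array([1.0, 2.0, 3.0])
v = np.array([2.0, 3.0, 4.0])
gamma, coef0, degree = 0.5, 1.0, 3

dot = float(u @ v)                     # u'*v
sq_dist = float(np.sum((u - v) ** 2))  # |u-v|^2

# Kernel values computed directly from the formulas.
manual = {
    "linear": dot,
    "poly": (gamma * dot + coef0) ** degree,
    "rbf": np.exp(-gamma * sq_dist),
    "sigmoid": np.tanh(gamma * dot + coef0),
}

# The pairwise functions expect 2-D inputs of shape (n_samples, n_features).
U, V = u[None, :], v[None, :]
library = {
    "linear": linear_kernel(U, V)[0, 0],
    "poly": polynomial_kernel(U, V, degree=degree, gamma=gamma, coef0=coef0)[0, 0],
    "rbf": rbf_kernel(U, V, gamma=gamma)[0, 0],
    "sigmoid": sigmoid_kernel(U, V, gamma=gamma, coef0=coef0)[0, 0],
}

for name in manual:
    print("%-8s manual=%.6f sklearn=%.6f" % (name, manual[name], library[name]))
```

If the two columns agree, the formulas match what scikit-learn computes for the same gamma, coef0, and degree.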

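degree: the degree of the polynomial ('poly') kernel. Default is 3; ignored by all other kernels.
gamma: the kernel coefficient for 'rbf', 'poly', and 'sigmoid'. With the default 'auto' shown in the signature above, 1/n_features is used. (scikit-learn 0.22 and later default to 'scale', which uses 1 / (n_features * X.var()).)
coef0: the constant term of the kernel function. Only used by 'poly' and 'sigmoid'.
probability: whether to enable probability estimates. Default is False.
shrinking: whether to use the shrinking heuristic. Default is True.
tol: tolerance of the stopping criterion. Default is 1e-3.
cache_size: size of the kernel cache, in MB. Default is 200.
class_weight: per-class weights, passed as a dict. For class i, the parameter C is set to class_weight[i]*C (the C of C-SVC).
verbose: whether to enable verbose output. Default is False.
max_iter: maximum number of iterations; -1 means no limit.
decision_function_shape: 'ovo', 'ovr' or None; default=None.
random_state: seed (an int) of the random number generator used when shuffling the data.
The parameters tuned most often are C, kernel, degree, gamma, and coef0.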
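
One common way to tune these parameters together is a cross-validated grid search. The sketch below is an illustration rather than the article's own method: it uses GridSearchCV on the iris dataset, and the candidate values in param_grid are arbitrary assumptions, not recommended settings.

```python
from sklearn import datasets, svm
from sklearn.model_selection import GridSearchCV, train_test_split

iris = datasets.load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.25, random_state=0, stratify=iris.target
)

# Candidate values are illustrative assumptions only.
# Each dict lists just the parameters that matter for that kernel.
param_grid = [
    {"kernel": ["linear"], "C": [0.1, 1, 10]},
    {"kernel": ["rbf"], "C": [0.1, 1, 10], "gamma": [0.01, 0.1, 1]},
    {"kernel": ["poly"], "C": [0.1, 1, 10], "degree": [2, 3], "coef0": [0.0, 1.0]},
]

# 5-fold cross-validation on the training set for every combination.
search = GridSearchCV(svm.SVC(), param_grid, cv=5)
search.fit(X_train, y_train)

print("Best parameters:", search.best_params_)
print("Cross-validated accuracy: %.3f" % search.best_score_)
print("Held-out test accuracy: %.3f" % search.score(X_test, y_test))
```

Comparing the cross-validated accuracy with the held-out accuracy gives a rough check on whether the chosen C is overfitting the training data.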

from sklearn import datasets

from sklearn import svm

# train_test_split makes a single hold-out split into a training set and a test set
from sklearn.model_selection import train_test_split

# Load the iris dataset
def load_data():
    """Load iris and return X_train, X_test, y_train, y_test."""
    iris = datasets.load_iris()
    # To inspect the dataset shape: iris.data.shape, iris.target.shape

    # Split the dataset into a training set and a test set
    X_train, X_test, y_train, y_test = train_test_split(
        iris.data, iris.target, test_size=0.10, random_state=0)
    return X_train, X_test, y_train, y_test

# Use LinearSVC to examine the predictive performance of a linear SVM classifier
def test_LinearSVC(X_train, X_test, y_train, y_test):

    # Choose the model
    cls = svm.LinearSVC()

    # Fit the model on the training data
    cls.fit(X_train, y_train)

    print('Coefficients:%s, intercept %s' % (cls.coef_, cls.intercept_))
    print('Score: %.2f' % cls.score(X_test, y_test))

if __name__ == "__main__":
    X_train, X_test, y_train, y_test = load_data()  # Build the dataset used for classification
    test_LinearSVC(X_train, X_test, y_train, y_test)  # Call test_LinearSVC
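
The script above scores the model on a single 10% hold-out split, so the result depends on which samples happen to land in the test set. As a hedged alternative, the sketch below estimates accuracy with 5-fold cross-validation. The StandardScaler step and max_iter=10000 are assumptions added here to help LinearSVC converge; they are not part of the original script.

```python
from sklearn import datasets
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

iris = datasets.load_iris()

# Scaling inside the pipeline is refit on each training fold,
# so no information from the validation fold leaks into it.
model = make_pipeline(StandardScaler(), LinearSVC(max_iter=10000))

# One accuracy value per fold.
scores = cross_val_score(model, iris.data, iris.target, cv=5)

print("Fold accuracies:", scores)
print("Mean accuracy: %.2f (std %.2f)" % (scores.mean(), scores.std()))
```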


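# Import the model persistence utility
# (sklearn.externals.joblib was removed in scikit-learn 0.23; joblib is now imported directly)
import joblib
# Import the svm module from sklearn
from sklearn import svm
# Define the feature matrix
x = [[1,2,3], [2,3,4],[1,2,3], [2,3,4],[1,2,3], [2,3,4],[1,2,3], [2,3,4],[1,2,3], [2,3,4]]
# Define the target values
y = [0, 1,0, 1,0, 1,0, 1,0, 1]
# Create the SVM classifier
clf = svm.SVC()
# Train the model
clf.fit(x, y)
# Define the test matrix
test_x = [[5,2,3], [3,3,4],[1,2,3], [2,3,4],[1,2,3], [2,3,4],[1,2,3], [2,5,4],[1,2,3], [2,3,4]]
# Print the predictions for the test data
print(clf.predict(test_x))
# Save the model to train_model.pkl in the current working directory
joblib.dump(clf, "train_model.pkl")
# Load the model
model = joblib.load("train_model.pkl")
# Use the loaded model
print(model.predict(test_x))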

SVM can also be used for regression, although the results are often not ideal. In sklearn, it is available as:

clf = svm.SVR()
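
To show how svm.SVR is used end to end, here is a minimal sketch on synthetic data. The noisy sine dataset and the values of C, gamma, and epsilon are assumptions chosen only for illustration; they do not come from the original article.

```python
import numpy as np
from sklearn import svm
from sklearn.model_selection import train_test_split

# Synthetic data (illustrative assumption): noisy samples of sin(x) on [0, 5).
rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# epsilon sets the width of the tube within which errors are not penalized.
reg = svm.SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.1)
reg.fit(X_train, y_train)

# For regressors, score() returns the coefficient of determination R^2.
print("R^2 on test data: %.2f" % reg.score(X_test, y_test))
```

As with SVC, the quality of the fit depends strongly on C, gamma, and the scale of the features, so poor regression results are often a tuning problem rather than a limit of the method.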
