Notes on using the scikit-learn SVM library

I used the scikit-learn SVM library in hsi_svm3d.py for a hyperspectral image classification task.

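Parameter summary 1
C: the penalty parameter C of C-SVC. Default is 1.0.
A larger C penalizes the slack variables more heavily and pushes them toward 0. Misclassification costs more, so the model tends to classify every training sample correctly: accuracy on the training set is high, but generalization is weaker. A smaller C reduces the penalty on misclassification and tolerates errors, treating the offending samples as noise, so generalization tends to be better.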
kernel: the kernel function. Default is 'rbf'; may be 'linear', 'poly', 'rbf', 'sigmoid' or 'precomputed'.
    0 - linear: u'*v
    1 - polynomial: (gamma*u'*v + coef0)^degree
    2 - RBF: exp(-gamma*|u-v|^2)
    3 - sigmoid: tanh(gamma*u'*v + coef0)
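degree: degree of the polynomial ('poly') kernel. Default is 3; ignored by all other kernels.
gamma: kernel coefficient for 'rbf', 'poly' and 'sigmoid'. In version 0.19 the default is 'auto', which uses 1/n_features; from version 0.22 on the default is 'scale', which uses 1/(n_features * X.var()).
coef0: constant term in the kernel function. Only used by 'poly' and 'sigmoid'.
probability: whether to enable probability estimates. Default is False.
shrinking: whether to use the shrinking heuristic. Default is True.
tol: tolerance for the stopping criterion. Default is 1e-3.
cache_size: size of the kernel cache in MB. Default is 200.
class_weight: class weights, passed as a dict (or 'balanced'). Sets the parameter C of class i to weight*C (the C of C-SVC).
verbose: whether to enable verbose output.
max_iter: maximum number of iterations. -1 means no limit.
decision_function_shape: 'ovo' or 'ovr'. Older releases also accepted None and used it as the default; since version 0.19 the default is 'ovr'.
random_state: seed used when shuffling the data (relevant for probability estimates), an int.
The parameters that mainly need tuning are C, kernel, degree, gamma and coef0.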
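
The effect of C described above can be checked with a small experiment. The following is only a minimal sketch: the data is synthetic (generated with make_classification, an assumption made for illustration and not the hyperspectral data from hsi_svm3d.py), and the three C values are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic 3-class data standing in for real features (illustrative only).
X, y = make_classification(n_samples=300, n_features=20, n_informative=5,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Setting gamma = 1/n_features by hand reproduces the old 'auto' behaviour
# regardless of the installed scikit-learn version.
gamma_auto = 1.0 / X_train.shape[1]

scores = {}
for C in (0.01, 1.0, 100.0):
    clf = SVC(C=C, kernel="rbf", gamma=gamma_auto,
              decision_function_shape="ovr")
    clf.fit(X_train, y_train)
    # Store (training accuracy, test accuracy) for each C.
    scores[C] = (clf.score(X_train, y_train), clf.score(X_test, y_test))
    print(f"C={C:<6} train acc={scores[C][0]:.3f}  test acc={scores[C][1]:.3f}")
```

Comparing the training and test accuracy per C shows how far a large C fits the training set more tightly; the exact numbers depend on the data.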

Parameter summary 2

[Figure: parameter summary image, not available]

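Parameter selection:
1. Use GridSearchCV from sklearn: list the parameter space and let the program try every hyperparameter combination to find the best one.
I have used it with SVMs and it works well. However, for hyperparameter values that cannot be satisfied (for example an infeasible nu in NuSVC), the fit raises an error, and in versions where error_score defaults to 'raise' the whole search crashes. Passing a numeric error_score (for example np.nan) lets the search record the failure and continue.
See the note on grid search for the best hyperparameter combination: Parameter estimation using grid search with cross-validation.
2. When the classes are imbalanced, remember to use class_weight='balanced'.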
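
The two points above can be combined in one search. This is a minimal sketch, assuming synthetic imbalanced data and an arbitrary parameter grid; the macro-averaged F1 score is my choice for the imbalanced setting, not something prescribed by the original notes.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Imbalanced synthetic data (roughly 90% / 10%), purely illustrative.
X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.9, 0.1], random_state=0)

# Each dict is one sub-grid; gamma is only searched for the RBF kernel.
param_grid = [
    {"kernel": ["rbf"], "C": [1, 10, 100], "gamma": [1e-3, 1e-2, 1e-1]},
    {"kernel": ["linear"], "C": [1, 10, 100]},
]

# class_weight='balanced' reweights C per class inversely to class frequency.
search = GridSearchCV(SVC(class_weight="balanced"), param_grid,
                      cv=5, scoring="f1_macro")
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

After fitting, search.best_estimator_ is the model refit on all the data with the best combination, and search.cv_results_ holds the score for each of the 12 combinations.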

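Usage tips:
1. Data preprocessing and detailed descriptions of individual functions are in the API reference manual: http://sklearn.apachecn.org/cn/0.19.0/modules/classes.html#module-sklearn.preprocessing
2. When the feature dimension m >= 2 * the number of samples n, classification performance tends to degrade.
3. sklearn.preprocessing.LabelBinarizer
Method: fit_transform(y)
Purpose: Fit label binarizer and transform multi-class labels to binary labels.
It has an inverse method: inverse_transform(Y, threshold=None)
Transform binary labels back to multi-class labels.
In fact, many of the transformers have an inverse method.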
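
The round trip through LabelBinarizer can be sketched as follows. The class names are hypothetical, and StandardScaler is added only as a second example of a transformer with an inverse_transform method.

```python
import numpy as np
from sklearn.preprocessing import LabelBinarizer, StandardScaler

y = np.array(["grass", "soil", "water", "soil"])  # hypothetical class labels

lb = LabelBinarizer()
Y = lb.fit_transform(y)            # one column per class, in sorted class order
print(lb.classes_)
print(Y)

y_back = lb.inverse_transform(Y)   # back to the original multi-class labels
print(y_back)

# Another transformer with an inverse: StandardScaler.
X = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_back = scaler.inverse_transform(X_scaled)
```

With three or more classes, Y has one column per class; with exactly two classes LabelBinarizer returns a single column instead.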

References:
1. sklearn.svm.SVC parameter description
2. Notes on using the scikit-learn support vector machine library
Cnblogs (博客园), 刘建平Pinard
3. API reference manual

When reposting, please credit the following:
Zhihu: @Forfreedom
CSDN: Freedom_anytime's blog - CSDN blog
Jianshu: For_freedom - Jianshu
