[Updating] sklearn (14): Support Vector Machines

SVMs can be used for classification, regression, and outlier detection.

Pros and cons of SVMs

Advantages of SVMs:

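SVMs remain effective in high-dimensional spaces.
They are still effective when n_features > n_samples.
The decision function depends only on the support vectors (a subset of the training points), so the fitted model does not need to store the whole training set; in that sense SVMs are memory efficient.
Through the kernel trick, the decision function can go from a linear model to a non-linear one and fit more complex data.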
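The last two points can be checked with a small sketch. The datasets below are synthetic and chosen purely for illustration (they are assumptions, not part of the original notes): the first fit counts how many training points end up as support vectors, the second compares a linear and an RBF kernel on data that no straight line can separate.

```python
from sklearn.datasets import make_blobs, make_circles
from sklearn.svm import SVC

# Two synthetic clusters (illustrative data, not from the original notes).
X_blobs, y_blobs = make_blobs(n_samples=200, centers=2, cluster_std=1.0,
                              random_state=0)
linear_clf = SVC(kernel="linear", C=1.0).fit(X_blobs, y_blobs)

# Only these rows of the training set are needed by the decision function.
n_sv = linear_clf.support_vectors_.shape[0]
print(f"{n_sv} support vectors out of {len(X_blobs)} training samples")

# Concentric circles cannot be separated by a straight line.
X_circ, y_circ = make_circles(n_samples=200, noise=0.05, factor=0.5,
                              random_state=0)
linear_acc = SVC(kernel="linear").fit(X_circ, y_circ).score(X_circ, y_circ)
rbf_acc = SVC(kernel="rbf", gamma="scale").fit(X_circ, y_circ).score(X_circ, y_circ)
print(f"linear kernel: {linear_acc:.2f}, RBF kernel: {rbf_acc:.2f}")
```

The accuracies here are training accuracies, which is enough to show the difference in model capacity but says nothing about generalization.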

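Disadvantages of SVMs:

When n_features > n_samples, the model overfits easily, so the regularization term in the loss function (and the choice of kernel) has to be chosen with care.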
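SVMs predict labels only and do not directly provide probability estimates. When probability=True, the estimates are obtained through an internal cross-validation, with the calibration fitted on the held-out folds. For binary classification this calibration is Platt scaling. Note that on large datasets the cross-validation involved in Platt scaling is expensive, and the labels implied by predict_proba can disagree with those obtained from decision_function (and therefore predict). Platt scaling also has some known theoretical issues. For these reasons it is better not to enable probability estimates with SVMs: leave the probability parameter set to False and use decision_function rather than predict_proba to predict labels.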

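Recap: there are two ways to calibrate a classifier's probabilities: Platt scaling and non-parametric isotonic regression. Platt scaling fits a sigmoid (logistic) function, so it suits cases where the calibration curve is sigmoid-shaped and little calibration data is available. Isotonic regression suits calibration curves that are not sigmoid-shaped, provided there is plenty of calibration data.
Note that the calibration curve of an SVM is typically sigmoid-shaped.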
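As a rough illustration of the advice above, the sketch below keeps probability=False for labelling and, when probabilities are really needed, calibrates explicitly with CalibratedClassifierCV and method="sigmoid" (Platt scaling). The dataset, the split, and the choice of cv=5 are assumptions made only for this example.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic binary problem (illustrative only).
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# probability=False (the default): labels come from the decision function.
svc = SVC(kernel="rbf", gamma="scale").fit(X_train, y_train)
scores = svc.decision_function(X_test)   # signed scores, one per test sample
labels = svc.predict(X_test)             # class 1 where the score is positive

# Explicit Platt scaling; method="isotonic" would select isotonic regression.
platt = CalibratedClassifierCV(SVC(kernel="rbf", gamma="scale"),
                               method="sigmoid", cv=5)
platt.fit(X_train, y_train)
proba = platt.predict_proba(X_test)      # one column per class
print(proba[:3])
```

Calibrating in a separate wrapper keeps the cost visible: the extra cross-validation only runs when the calibrated model is fitted.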

Unbalanced problem

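When the classes in the dataset have unequal sample counts, the class_weight parameter of the SVM classifiers adjusts the weight of each class: the effective penalty for class i becomes C * class_weight[i].
Besides class weights, individual samples can be weighted through the sample_weight argument of the fit method: the effective penalty for sample i becomes C * sample_weight[i].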
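A minimal sketch of both mechanisms, on synthetic data with a 10:1 class ratio. The data, the weight of 10, and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.RandomState(0)
# 200 majority-class points and 20 minority-class points (synthetic).
X = np.r_[rng.randn(200, 2), rng.randn(20, 2) + [2, 2]]
y = np.r_[np.zeros(200, dtype=int), np.ones(20, dtype=int)]

plain = SVC(kernel="linear", C=1.0).fit(X, y)

# Class 1 is penalized with C * 10; class 0 keeps C * 1.
weighted = SVC(kernel="linear", C=1.0, class_weight={1: 10}).fit(X, y)

# Per-sample weights are passed to fit, not to the constructor.
w = np.ones(len(y))
w[y == 1] = 10.0
per_sample = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=w)

n_plain = int((plain.predict(X) == 1).sum())
n_weighted = int((weighted.predict(X) == 1).sum())
print(f"predicted as minority class: plain={n_plain}, weighted={n_weighted}")
```

Giving every minority sample a weight of 10 through sample_weight expresses the same penalty as class_weight={1: 10}; sample_weight is the more general tool because weights can differ within a class.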

Classification

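sklearn.svm.SVC(C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', random_state=None)
# C: penalty coefficient of the error term
# degree: degree of the polynomial kernel ('poly'); ignored by the other kernels
# gamma: kernel coefficient for 'rbf', 'poly', and 'sigmoid' ('auto_deprecated' is the default in the version quoted here; later releases default to 'scale')
# coef0: independent (constant) term in the 'poly' and 'sigmoid' kernels
# shrinking: whether to use the shrinking heuristic
# probability: whether to enable probability estimates (predict_proba); must be set before calling fit
# cache_size: size of the kernel cache, in MB
# decision_function_shape: {'ovr': one-vs-rest, 'ovo': one-vs-one}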
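To make decision_function_shape concrete, the sketch below fits the same SVC twice on a synthetic 4-class problem (an assumption for illustration) and compares the shape of the decision function: one column per class for 'ovr', one column per pair of classes for 'ovo'.

```python
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

# Synthetic 4-class data (illustrative only).
X, y = make_blobs(n_samples=120, centers=4, random_state=0)

ovr = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovr").fit(X, y)
ovo = SVC(kernel="rbf", gamma="scale", decision_function_shape="ovo").fit(X, y)

ovr_shape = ovr.decision_function(X[:3]).shape   # (n_samples, n_classes)
ovo_shape = ovo.decision_function(X[:3]).shape   # (n_samples, n_classes * (n_classes - 1) / 2)
print(ovr_shape, ovo_shape)
```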

sklearn.svm.NuSVC(nu=0.5, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape='ovr', random_state=None)
# nu: an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; must lie in (0, 1]
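The effect of nu on the number of support vectors can be observed directly. The sketch below uses synthetic, roughly balanced data and two arbitrary values of nu; both are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.svm import NuSVC

# Synthetic, roughly balanced binary problem (illustrative only).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

fractions = {}
for nu in (0.1, 0.5):
    clf = NuSVC(nu=nu, kernel="rbf", gamma="scale").fit(X, y)
    # Fraction of training samples kept as support vectors.
    fractions[nu] = clf.support_.size / len(X)
    print(f"nu={nu}: {fractions[nu]:.2f} of the training samples are support vectors")
```

In each run the printed fraction should not fall below nu, since nu acts as a lower bound on it.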

sklearn.svm.LinearSVC(penalty='l2', loss='squared_hinge', dual=True, tol=0.0001, C=1.0, multi_class='ovr', fit_intercept=True, intercept_scaling=1, class_weight=None, verbose=0, random_state=None, max_iter=1000)
# dual: whether to solve the primal or the dual optimization problem. Prefer dual=False when n_samples > n_features.
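A short sketch of the dual=False recommendation, on a synthetic dataset with many more samples than features (the data and sizes are assumptions for illustration):

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# 2000 samples, 20 features: n_samples > n_features, so solve the primal.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)

clf = LinearSVC(penalty="l2", loss="squared_hinge", dual=False, C=1.0)
clf.fit(X, y)

print(clf.coef_.shape)               # one weight per feature
print(f"training accuracy: {clf.score(X, y):.2f}")
```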

Regression

sklearn.svm.SVR(kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, tol=0.001, C=1.0, epsilon=0.1, shrinking=True, cache_size=200, verbose=False, max_iter=-1)
# epsilon: Epsilon in the epsilon-SVR model. It specifies the epsilon-tube within which no penalty is associated in the training loss function with points predicted within a distance epsilon from the actual value.
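The epsilon-tube can be seen through the support vectors: points that fall strictly inside the tube carry no penalty and do not become support vectors. The sketch below fits a noisy sine curve (synthetic data, assumed for illustration) with a narrow and a wide tube.

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(100, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)   # noisy sine, illustrative only

narrow = SVR(kernel="rbf", gamma="scale", C=1.0, epsilon=0.01).fit(X, y)
wide = SVR(kernel="rbf", gamma="scale", C=1.0, epsilon=0.5).fit(X, y)

# A wider tube tolerates larger residuals, so fewer points are support vectors.
print(len(narrow.support_), len(wide.support_))
```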

sklearn.svm.NuSVR(nu=0.5, C=1.0, kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, shrinking=True, tol=0.001, cache_size=200, verbose=False, max_iter=-1)
# nu: an upper bound on the fraction of training errors and a lower bound on the fraction of support vectors; must lie in (0, 1]

sklearn.svm.LinearSVR(epsilon=0.0, tol=0.0001, C=1.0, loss='epsilon_insensitive', fit_intercept=True, intercept_scaling=1.0, dual=True, verbose=0, random_state=None, max_iter=1000)

[To be updated]: shrinking parameter
[To be updated]: nu

Density estimation, novelty detection

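OneClassSVM can be used for outlier detection, but it is sensitive to outliers in the training data and therefore does not perform well at that task. It is better suited to novelty detection, where the training set is not contaminated by outliers.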

sklearn.svm.OneClassSVM(kernel='rbf', degree=3, gamma='auto_deprecated', coef0=0.0, tol=0.001, nu=0.5, shrinking=True, cache_size=200, verbose=False, max_iter=-1, random_state=None)  # unsupervised outlier detection
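A minimal novelty-detection sketch: the model is fitted on clean data only and then asked about new points. The training distribution, the two query points, and nu=0.05 are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.RandomState(42)
X_train = 0.3 * rng.randn(200, 2)              # clean data around the origin
X_new = np.array([[0.0, 0.1], [4.0, 4.0]])     # a typical point and a distant one

detector = OneClassSVM(kernel="rbf", gamma="scale", nu=0.05).fit(X_train)

pred = detector.predict(X_new)                 # +1 = inlier, -1 = outlier
scores = detector.decision_function(X_new)     # negative values lie outside the boundary
print(pred, scores)
```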

See also: sklearn (15): Novelty and Outlier Detection
Official documentation: Novelty and Outlier Detection

Tips on Practical Use

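Avoid data copies: for SVC, SVR, NuSVC, and NuSVR, input that is not C-ordered contiguous and double precision is copied before the underlying C implementation is called.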
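For SVC, NuSVC, SVR, and NuSVR, the size of the kernel cache has a strong effect on run time, so if enough RAM is available, set cache_size above its default of 200 (MB).
The larger C is, the weaker the regularization. If the training data is noisy, decrease C to regularize more strongly.
SVMs are sensitive to the scale of the data, so scale every feature to [0, 1] or [-1, 1] before fitting, and apply the same scaling to the test data.
If the classes in the training data are severely imbalanced, set class_weight='balanced' and try different values of the penalty parameter C.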
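Several of these tips can be combined in one pipeline. The sketch below is one possible arrangement, not a recommended configuration: the dataset (load_breast_cancer), cache_size=500, and C=1.0 are assumptions for illustration. Putting the scaler inside the pipeline ensures that it is fitted on the training folds only and then applied unchanged to the held-out fold.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

model = make_pipeline(
    MinMaxScaler(feature_range=(0, 1)),     # scale every feature to [0, 1]
    SVC(kernel="rbf", gamma="scale", C=1.0,
        cache_size=500,                     # larger kernel cache, in MB
        class_weight="balanced"),           # reweight classes by inverse frequency
)

scores = cross_val_score(model, X, y, cv=5)
print(f"mean CV accuracy: {scores.mean():.3f}")
```

To try different values of C, the same pipeline can be passed to GridSearchCV with a grid over the parameter name svc__C.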

kernel functions

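The kernel parameter of each SVM estimator accepts three forms of kernel:
1. A built-in kernel. There are four: {linear, polynomial, RBF, sigmoid}, selected with the strings 'linear', 'poly', 'rbf', and 'sigmoid'.
2. A user-defined kernel function passed as the kernel parameter. The function takes two arrays X and Y and must return the kernel matrix of shape (n_samples_X, n_samples_Y).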

>>> import numpy as np
>>> from sklearn import svm
>>> def my_kernel(X, Y):
...     return np.dot(X, Y.T)
...
>>> clf = svm.SVC(kernel=my_kernel)

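3. Set kernel='precomputed' and pass the Gram matrix (the matrix of kernel values between every pair of training samples, not a feature matrix) to the fit method. At prediction time, pass the kernel values between the test samples and the training samples.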

>>> import numpy as np
>>> from sklearn import svm
>>> X = np.array([[0, 0], [1, 1]])
>>> y = [0, 1]
>>> clf = svm.SVC(kernel='precomputed')
>>> # linear kernel computation
>>> gram = np.dot(X, X.T)
>>> clf.fit(gram, y) 
SVC(C=1.0, cache_size=200, class_weight=None, coef0=0.0,
    decision_function_shape='ovr', degree=3, gamma='auto_deprecated',
    kernel='precomputed', max_iter=-1, probability=False,
    random_state=None, shrinking=True, tol=0.001, verbose=False)
>>> # predict on training examples
>>> clf.predict(gram)
array([0, 1])
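The example above predicts on the training samples only. For unseen data, the matrix passed to predict has one row per test sample and one column per training sample. A minimal sketch with a linear kernel and made-up points (the data is an assumption for illustration):

```python
import numpy as np
from sklearn.svm import SVC

X_train = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 2.0], [3.0, 3.0]])
y_train = [0, 0, 1, 1]
X_test = np.array([[0.5, 0.5], [2.5, 2.5]])

clf = SVC(kernel="precomputed")

# Training Gram matrix: shape (n_train, n_train).
gram_train = X_train @ X_train.T
clf.fit(gram_train, y_train)

# Test kernel matrix: shape (n_test, n_train), test rows against training columns.
gram_test = X_test @ X_train.T
pred = clf.predict(gram_test)
print(pred)
```

Passing a square (n_test, n_test) matrix here is a common mistake: the columns must always correspond to the training samples used in fit.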