SVC and SVR

We can see that sklearn's SVM module provides two classes, sklearn.svm.SVC and sklearn.svm.SVR; they correspond to SVM applied to the two kinds of problems it handles, classification and regression (a minimal usage sketch follows the list below):

  • SVC (Support Vector Classification) does classification: it finds a separating surface to solve classification problems.
  • SVR (Support Vector Regression) does regression: it fits a regression function to make predictions, e.g. of temperature, weather, or stock prices.
  • Both are used in data mining, text classification, speech recognition, and bioinformatics to analyze concrete problems.
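
As a minimal sketch of how the two classes are used (the toy datasets and parameters below are my own illustration, not from the original post), both share the same fit/predict interface:

```python
from sklearn.datasets import load_diabetes, load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC, SVR

# Classification: SVC finds a separating surface between classes.
X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X_tr, y_tr)
print("SVC accuracy:", clf.score(X_te, y_te))

# Regression: SVR fits a function f(x) for numeric prediction.
X, y = load_diabetes(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
reg = SVR(kernel="rbf", C=10.0, epsilon=0.1)
reg.fit(X_tr, y_tr)
print("SVR R^2:", reg.score(X_te, y_te))
```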

We already covered SVC when we studied SVM, so here we focus on SVR.

There is a very good answer on Zhihu (reference 2), which I summarize as follows:

Brief introduction

Intuitively, the SVM classifier (SVC, Support Vector Classification) differs from SVR (Support Vector Regression) as follows:

(figure: intuitive comparison of SVC and SVR)

For a sample $(x_i, y_i)$, a traditional regression model usually computes the loss directly from the difference between the model output $f(x_i)$ and the true output $y_i$: the loss is zero if and only if $f(x_i)$ and $y_i$ are exactly equal. SVR differs in assuming that we can tolerate a deviation of up to $\epsilon$ between $f(x_i)$ and $y_i$, i.e. a loss is incurred only when the absolute difference between $f(x_i)$ and $y_i$ exceeds $\epsilon$. This amounts to constructing an interval band of width $2\epsilon$ centered on $f(x)$: if a sample falls within this band, it is considered correctly predicted, as shown below:

(figure: the ε-tube around $f(x)$; samples inside the tube incur no loss)

Mathematical form

Given a training set $D = \{(x_1, y_1), (x_2, y_2), \dots, (x_m, y_m)\}$, $y_i \in \mathbb{R}$, SVR aims to learn a regression function of the form $f(x) = w^T x + b$ that is as close to $y$ as possible.

The SVR problem can thus be formalized as:

$$\min_{w,b}\ \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}\ell_\epsilon\big(f(x_i) - y_i\big) \tag{C2}$$

where $C$ is the regularization constant and $\ell_\epsilon$ is the ε-insensitive loss function shown below:

$$\ell_\epsilon(z) = \begin{cases} 0, & \text{if } |z| \le \epsilon \\ |z| - \epsilon, & \text{otherwise} \end{cases}$$

(figure: the ε-insensitive loss function)
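
A small sketch of this loss in NumPy (the function name epsilon_insensitive is my own, for illustration):

```python
import numpy as np

def epsilon_insensitive(z, eps=0.1):
    """ε-insensitive loss: zero inside the ±eps tube, linear outside it."""
    return np.maximum(np.abs(z) - eps, 0.0)

# Residuals within ±0.1 incur no loss; the rest are penalized linearly.
residuals = np.array([-0.3, -0.05, 0.0, 0.08, 0.25])
print(epsilon_insensitive(residuals, eps=0.1))  # [0.2  0.   0.   0.   0.15]
```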

Introducing slack variables $\xi_i$ and $\hat\xi_i$ (the amount of slack may differ on the two sides of the band), equation (C2) can be rewritten as:

$$\begin{aligned} \min_{w,b,\xi_i,\hat\xi_i}\ & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}(\xi_i + \hat\xi_i) \\ \text{s.t. }\ & f(x_i) - y_i \le \epsilon + \xi_i, \\ & y_i - f(x_i) \le \epsilon + \hat\xi_i, \\ & \xi_i \ge 0,\ \hat\xi_i \ge 0,\quad i = 1, 2, \dots, m \end{aligned} \tag{C3}$$

(figure: samples outside the ε-tube, with slacks $\xi_i$ and $\hat\xi_i$)

Lagrangian dual form

Introducing the Lagrange multipliers $\mu_i \ge 0$, $\hat\mu_i \ge 0$, $\alpha_i \ge 0$, $\hat\alpha_i \ge 0$, the Lagrangian of (C3) is:

$$\begin{aligned} L(w, b, \alpha, \hat\alpha, \xi, \hat\xi, \mu, \hat\mu) = {} & \frac{1}{2}\|w\|^2 + C\sum_{i=1}^{m}(\xi_i + \hat\xi_i) - \sum_{i=1}^{m}\mu_i\xi_i - \sum_{i=1}^{m}\hat\mu_i\hat\xi_i \\ & + \sum_{i=1}^{m}\alpha_i\big(f(x_i) - y_i - \epsilon - \xi_i\big) + \sum_{i=1}^{m}\hat\alpha_i\big(y_i - f(x_i) - \epsilon - \hat\xi_i\big) \end{aligned} \tag{C4}$$

Substituting $f(x_i) = w^T x_i + b$ into the expression above and setting the partial derivatives of $L$ with respect to $w$, $b$, $\xi_i$, and $\hat\xi_i$ to zero gives:

$$\begin{aligned} w &= \sum_{i=1}^{m}(\hat\alpha_i - \alpha_i)x_i, \\ 0 &= \sum_{i=1}^{m}(\hat\alpha_i - \alpha_i), \\ C &= \alpha_i + \mu_i, \\ C &= \hat\alpha_i + \hat\mu_i \end{aligned} \tag{C5}$$

Substituting (C5) into (C4) yields the dual problem of SVR:

$$\begin{aligned} \max_{\alpha,\hat\alpha}\ & \sum_{i=1}^{m}\Big(y_i(\hat\alpha_i - \alpha_i) - \epsilon(\hat\alpha_i + \alpha_i)\Big) - \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}(\hat\alpha_i - \alpha_i)(\hat\alpha_j - \alpha_j)x_i^T x_j \\ \text{s.t. }\ & \sum_{i=1}^{m}(\hat\alpha_i - \alpha_i) = 0, \\ & 0 \le \alpha_i, \hat\alpha_i \le C \end{aligned} \tag{C6}$$

KKT conditions and the final decision function

The KKT conditions satisfied by the above process are:

$$\begin{cases} \alpha_i\big(f(x_i) - y_i - \epsilon - \xi_i\big) = 0, \\ \hat\alpha_i\big(y_i - f(x_i) - \epsilon - \hat\xi_i\big) = 0, \\ \alpha_i\hat\alpha_i = 0,\quad \xi_i\hat\xi_i = 0, \\ (C - \alpha_i)\xi_i = 0,\quad (C - \hat\alpha_i)\hat\xi_i = 0 \end{cases} \tag{C7}$$

It can be seen that $\alpha_i$ can be nonzero if and only if $f(x_i) - y_i - \epsilon - \xi_i = 0$, and $\hat\alpha_i$ can be nonzero if and only if $y_i - f(x_i) - \epsilon - \hat\xi_i = 0$. In other words, only when a sample $(x_i, y_i)$ does not fall inside the ε-band can the corresponding $\alpha_i$ and $\hat\alpha_i$ be nonzero. Furthermore, the constraints $f(x_i) - y_i - \epsilon - \xi_i = 0$ and $y_i - f(x_i) - \epsilon - \hat\xi_i = 0$ cannot hold simultaneously, so at least one of $\alpha_i$ and $\hat\alpha_i$ must be zero.

Substituting the first equation of (C5) into $f(x) = w^T x + b$, the final decision function is:

$$f(x) = \sum_{i=1}^{m}(\hat\alpha_i - \alpha_i)x_i^T x + b \tag{C8}$$

The samples for which $\hat\alpha_i - \alpha_i \ne 0$ in the above equation are the support vectors of SVR; they necessarily fall outside the ε-band. Clearly, the support vectors of SVR are only a fraction of the training samples, i.e. the solution is still sparse.
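
This sparsity is easy to observe with sklearn's SVR: widening the tube (a larger epsilon) lets more samples fall inside it, leaving fewer support vectors. A quick sketch on toy sine data (my own construction, not from the post):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(200, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(200)

# A wider ε-tube swallows more samples, so fewer support vectors remain.
for eps in (0.01, 0.1, 0.5):
    svr = SVR(kernel="rbf", C=10.0, epsilon=eps).fit(X, y)
    print(f"epsilon={eps}: {len(svr.support_)} support vectors out of {len(X)}")
```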

From the KKT conditions, every sample $(x_i, y_i)$ satisfies $(C - \alpha_i)\xi_i = 0$ and $\alpha_i\big(f(x_i) - y_i - \epsilon - \xi_i\big) = 0$. Hence, once $\alpha_i$ has been obtained, if $0 < \alpha_i < C$ then necessarily $\xi_i = 0$, and consequently:

$$b = y_i + \epsilon - \sum_{j=1}^{m}(\hat\alpha_j - \alpha_j)x_j^T x_i \tag{C9}$$

Therefore, after solving (C6) for $\alpha_i$, in theory we may pick any sample satisfying $0 < \alpha_i < C$ and obtain $b$ from (C9). In practice, a more robust approach is used: select several (or all) samples satisfying $0 < \alpha_i < C$, solve for $b$ from each, and take the average.

In kernel form, the final decision function is:

$$f(x) = \sum_{i=1}^{m}(\hat\alpha_i - \alpha_i)\kappa(x, x_i) + b$$

where $\kappa(x_i, x_j)$ is the kernel function.
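
In sklearn, a fitted SVR exposes exactly these quantities: dual_coef_ holds the multiplier differences for each support vector and intercept_ holds $b$, so the kernel decision function can be reproduced by hand. A sketch with the RBF kernel (toy data of my own):

```python
import numpy as np
from sklearn.metrics.pairwise import rbf_kernel
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = X[:, 0] ** 2 + X[:, 1] + 0.05 * rng.randn(100)

svr = SVR(kernel="rbf", gamma=0.5, C=1.0).fit(X, y)

# f(x) = Σ_i coef_i · κ(x, x_i) + b, summing only over support vectors
# (the coefficients of all other samples are zero).
K = rbf_kernel(X, svr.support_vectors_, gamma=0.5)
f_manual = K @ svr.dual_coef_.ravel() + svr.intercept_
print(np.allclose(f_manual, svr.predict(X)))  # True
```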

Regression with different kernels

(figure: SVR regression curves obtained with different kernels)
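
In the spirit of that comparison, here is a condensed sketch fitting the three common kernels on the same toy data (the parameters are illustrative, loosely following sklearn's SVR example):

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.RandomState(0)
X = np.sort(5 * rng.rand(100, 1), axis=0)
y = np.sin(X).ravel() + 0.1 * rng.randn(100)

models = {
    "rbf": SVR(kernel="rbf", C=100, gamma=0.1, epsilon=0.1),
    "linear": SVR(kernel="linear", C=100, epsilon=0.1),
    "poly": SVR(kernel="poly", C=100, degree=3, coef0=1, epsilon=0.1),
}
for name, model in models.items():
    model.fit(X, y)
    print(f"{name:7s} kernel: training R^2 = {model.score(X, y):.3f}")
```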

I personally find the following practical advice quite useful as well:

Practical advice based on sklearn

【Reference】

  • sklearn - Tips on Practical Use

  • Avoid copying data.
  • Kernel cache size: for SVC, SVR, NuSVC, and NuSVR, the size of the kernel cache has a strong impact on run time for larger problems. If you have enough RAM available, it is recommended to set cache_size as large as possible.
  • Setting C: 1 is a reasonable default choice. If you have a lot of noisy observations, you should decrease C.
  • SVM algorithms are not scale invariant, so it is highly recommended to scale your data. For example, scale each attribute of the input vector X to [0,1] or [-1,1], or standardize it to have mean 0 and variance 1. Note that the same scaling must be applied to the test vectors in order to obtain meaningful results; a pipeline handles this, as in the sketch after this list.
  • For SVC, if the data is unbalanced (e.g. many positive examples and few negative ones), set class_weight='balanced' and/or try different penalty parameters C.
  • Randomness of the underlying implementations: the underlying implementations of SVC and NuSVC use a random number generator to shuffle the data when estimating probabilities (i.e. when probability is set to True); this randomness can be controlled with the random_state parameter. If probability is set to False, these estimators are not random and random_state has no effect on the results.
  • Use an L1 penalty (e.g. LinearSVC with penalty='l1') to obtain a sparse solution.
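
A sketch combining several of these tips (the same scaling for train and test via a pipeline, balanced class weights for imbalanced data, a larger kernel cache); the data and parameters are illustrative, not from the post:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Imbalanced toy problem: roughly 90% negatives, 10% positives.
X, y = make_classification(n_samples=2000, weights=[0.9, 0.1], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

# The pipeline applies the same scaling to training and test data;
# class_weight='balanced' reweights C per class; cache_size is in MB.
model = make_pipeline(
    StandardScaler(),
    SVC(C=1.0, class_weight="balanced", cache_size=500),
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```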


Reference

  1. https://www.cnblogs.com/ylHe/p/7676173.html
  2. https://zhuanlan.zhihu.com/p/50166358
