9 Support Vector Machines (SVM)
Compared with a neural network, there is no need to worry about getting stuck in a local optimum, because the SVM objective is a convex optimization problem.
9.1 The SVM hypothesis function
\[h_{\theta}(x)=\left\{\begin{array}{ll}{1,} & {\text{if } \theta^{T} x \geq 0} \\ {0,} & {\text{otherwise}}\end{array}\right. \tag{9.1}\]

Minimizing the SVM cost function yields the parameters \(\theta\), which are then substituted into the hypothesis function above.
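The hypothesis in Eq. (9.1) can be sketched directly in NumPy; the parameter values below are hypothetical, with the bias term \(\theta_0\) folded in via a leading 1 in \(x\):

```python
import numpy as np

def svm_predict(theta, x):
    """SVM hypothesis (Eq. 9.1): predict 1 when theta^T x >= 0, else 0."""
    return 1 if theta @ x >= 0 else 0

# Hypothetical trained parameters; x[0] = 1 carries the bias term.
theta = np.array([-1.0, 2.0])
print(svm_predict(theta, np.array([1.0, 1.0])))  # theta^T x = 1.0 >= 0 -> 1
print(svm_predict(theta, np.array([1.0, 0.2])))  # theta^T x = -0.6 < 0 -> 0
```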
9.2 The SVM decision boundary
- When C is very large, the SVM tries to classify every training sample correctly, so the decision boundary can change significantly because of outlier points. The SVM separates the classes with the maximum margin.
- The decision boundary is perpendicular to the parameter vector \(\theta\).
- Even if \(\theta_0\) is not 0, i.e. the decision boundary does not pass through the origin, the optimization objective behaves in almost the same way.
- The SVM effectively chooses the parameters so that the projection of each sample vector onto the parameter vector, \(p^{(i)}\), is as large as possible, because the constraints involve \(p^{(i)} \, \|\theta\|\); at the same time it wants \(\|\theta\|\) to be as small as possible. This is achieved by \(\text{min} \, \frac{1}{2} \sum_{j=1}^{n} \theta_j^2\).
- Hence the SVM is also known as a large-margin classifier.
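The large-margin picture above can be written as an optimization problem: when C is very large, minimizing the cost reduces to

\[\min_{\theta} \ \frac{1}{2} \sum_{j=1}^{n} \theta_{j}^{2} \quad \text{s.t.} \quad \theta^{T} x^{(i)} \geq 1 \ \text{if } y^{(i)}=1, \qquad \theta^{T} x^{(i)} \leq -1 \ \text{if } y^{(i)}=0\]

Since \(\theta^{T} x^{(i)} = p^{(i)} \, \|\theta\|\), satisfying the constraints while keeping \(\|\theta\|\) small forces the projections \(p^{(i)}\) to be large, which is exactly the large margin.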
9.3 Kernels
Kernels allow the SVM to solve nonlinear classification problems.
9.3.1 The Gaussian kernel, a similarity function
\(f\) is the feature variable computed with the Gaussian kernel from the sample \(x\) and the landmarks \(l\).
\(f\in[0,1]\). After the parameters \(\theta\) are trained, a sample \(x\) at different distances from the different landmarks produces different feature values, i.e. each feature receives a different weight; computing \(\theta^Tf\) then decides whether \(y=0\) or \(y=1\), which yields a nonlinear decision boundary.
9.3.2 Choosing the landmarks \(l\)
Use all training samples as landmark points.
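A minimal sketch of this feature construction, using \(f_i = \exp(-\|x - l^{(i)}\|^2 / 2\sigma^2)\) with the training samples themselves as landmarks (the helper name and toy data are illustrative):

```python
import numpy as np

def gaussian_features(X, landmarks, sigma=1.0):
    """Map each sample to similarity features f_i = exp(-||x - l_i||^2 / (2 sigma^2))."""
    # Squared distances between every sample and every landmark, via broadcasting.
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma ** 2))

X = np.array([[0.0, 0.0], [1.0, 1.0], [3.0, 3.0]])
# As in the notes: all training samples serve as the landmarks.
F = gaussian_features(X, X, sigma=1.0)
print(F.shape)  # (3, 3): one feature per landmark
print(F[0, 0])  # a sample is maximally similar (1.0) to itself
```

Note that with m training samples this produces m features per sample, which is why the Gaussian kernel becomes expensive for very large m (see the guidelines in 9.5).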
9.3.3 Linear kernel
That is, an SVM without a kernel: the raw features are used directly.
9.3.4 Polynomial kernel
Intended for nonlinear problems, but it usually performs worse than the Gaussian kernel.
9.4 SVM parameters and the bias/variance trade-off
9.4.1 Choosing C
Large C (\(=\frac{1}{\lambda}\)): \(\lambda\) is small and \(\theta\) is large, meaning low bias and high variance; tends to overfit.
Small C (\(=\frac{1}{\lambda}\)): \(\lambda\) is large and \(\theta\) is small, meaning high bias and low variance; tends to underfit.
9.4.2 Choosing \(\sigma^2\)
Large \(\sigma^2\): the features \(f\) vary slowly with \(x\), so even a large change in \(x\) changes \(\theta^Tf\) only slightly; the boundary varies smoothly and the bias is higher.
Small \(\sigma^2\): the features \(f\) vary sharply with \(x\), so a small change in \(x\) changes \(\theta^Tf\) a lot; the boundary varies rapidly and the variance is higher.
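A sketch of how C and \(\sigma^2\) map onto a common SVM implementation, assuming scikit-learn is available; note that sklearn's `gamma` parameter corresponds to \(1/(2\sigma^2)\), and the synthetic data here is purely illustrative:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)  # nonlinear (circular) boundary

sigma2 = 0.5
clf = SVC(C=1.0, kernel="rbf", gamma=1.0 / (2 * sigma2))
clf.fit(X, y)
print(clf.score(X, y))  # training accuracy
# Large C / small sigma^2 -> low bias, high variance (risk of overfitting);
# small C / large sigma^2 -> high bias, low variance (risk of underfitting).
```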
9.5 Steps for using an SVM
- Choose the parameter C
- Choose an appropriate kernel function based on the data
- Implement the chosen kernel to generate the feature variables from all input samples
- Scale the features appropriately for the data
- If the number of features n is large (around 10,000) and the number of samples m is small (10 to 1000), use logistic regression or an SVM with a linear kernel, since there are too few samples to fit a complex nonlinear function.
- If n is small (1 to 1000) and m is moderate (10 to 50,000), use an SVM with a Gaussian kernel.
- If n is small (1 to 1000) and m is very large (50,000+), an SVM with a Gaussian kernel will run very slowly; manually create more feature variables and use logistic regression or an SVM with a linear kernel.
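The steps above can be put together in one short sketch, again assuming scikit-learn; the badly scaled toy data illustrates why the feature-scaling step matters before an RBF kernel:

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 2)) * np.array([1.0, 100.0])  # badly scaled features
y = (X[:, 0] + X[:, 1] / 100.0 > 0).astype(int)

# n small, m moderate -> Gaussian (RBF) kernel per the guidelines above;
# StandardScaler performs the feature-scaling step.
model = make_pipeline(StandardScaler(), SVC(C=1.0, kernel="rbf"))
model.fit(X, y)
print(model.score(X, y))  # training accuracy
```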