Andrew Ng Machine Learning Introductory Notes 9: SVM

9 Support Vector Machines (SVM)

Compared with neural networks, there is no need to worry about getting stuck in a local optimum, because training an SVM is a convex optimization problem.

9.1 The SVM hypothesis function

[Figure: 9.1 SVM cost function]
\[
h_{\theta}(x)=\left\{\begin{array}{ll}{1,} & {\text{if } \theta^{T} x \geq 0} \\ {0,} & {\text{otherwise}}\end{array}\right. \tag{9.1}
\]

Minimizing the cost function above yields the parameters \(\theta\), which are then substituted into (9.1) as the SVM's hypothesis function.
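As a minimal sketch of (9.1) in NumPy (the course itself uses Octave; the parameter and sample values below are purely illustrative):

```python
import numpy as np

def svm_hypothesis(theta, x):
    """SVM hypothesis (9.1): predict 1 if theta^T x >= 0, otherwise 0."""
    return 1 if theta @ x >= 0 else 0

# Hypothetical trained parameters and a sample (illustrative values only)
theta = np.array([-1.0, 2.0, 0.5])
x = np.array([1.0, 0.8, 0.3])       # first component is the intercept term x_0 = 1
print(svm_hypothesis(theta, x))     # 1, since theta @ x = 0.75 >= 0
```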

9.2 The SVM decision boundary

  • When C is very large, the SVM tries to classify every training sample correctly, so the decision boundary can change significantly because of unusual points (outliers). The SVM separates the samples with the maximum possible margin

[Figure: 9.2 SVM decision boundary when C is large]

[Figure: 9.2 Large-margin classifier explanation]

  • The decision boundary is perpendicular to the parameter vector \(\theta\)
  • Even if \(\theta_0 \neq 0\), i.e. the decision boundary does not pass through the origin, the optimization objective behaves in essentially the same way
  • What the SVM actually does is adjust the parameters so that the projection \(p^{(i)}\) of each sample vector onto the parameter vector is as large as possible; since the constraints involve \(p^{(i)} \cdot ||\theta||\), large projections \(p^{(i)}\) allow \(||\theta||\) to be small, achieving \(\text{min}\ \frac{1}{2}\sum_{j=1}^{n}\theta_j^2\) (see the sketch after this list)
  • This is why the SVM is also called a large-margin classifier
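A small NumPy sketch of the projection argument above, with made-up values: \(\theta^T x^{(i)}\) and \(p^{(i)} \cdot ||\theta||\) are the same number, so large projections let \(||\theta||\) shrink.

```python
import numpy as np

theta = np.array([2.0, 1.0])   # hypothetical parameter vector (theta_0 omitted)
x_i = np.array([1.0, 3.0])     # hypothetical sample

# p_i is the (signed) projection length of x_i onto theta
p_i = (x_i @ theta) / np.linalg.norm(theta)

# theta^T x_i and p_i * ||theta|| are the same quantity
print(x_i @ theta)                    # 5.0
print(p_i * np.linalg.norm(theta))    # 5.0

# With constraints like theta^T x_i >= 1, a larger projection p_i
# lets ||theta|| shrink, which is what min (1/2) sum(theta_j^2) rewards.
```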

9.3 Kernel functions

Kernel functions let the SVM solve nonlinear problems.

9.3.1 The Gaussian kernel, a similarity function

[Figure: 9.3.1 Landmarks]

[Figure: 9.3.1 Gaussian kernel function]

[Figure: 9.3.1 Changing the kernel parameter]

[Figure: 9.3.1 Obtaining a nonlinear boundary]

\(f\) is the feature variable computed from a sample \(x\) and the landmarks \(l\) via the Gaussian kernel, \(f_i = \exp\left(-\frac{||x - l^{(i)}||^2}{2\sigma^2}\right)\).

\(f \in [0, 1]\). Once the parameters \(\theta\) are trained, a sample \(x\) lies at different distances from the different landmarks, so its feature variables differ, i.e. they are weighted differently by the parameters; \(\theta^T f\) is then computed to decide whether y = 0 or y = 1, which produces a nonlinear boundary.
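A minimal sketch of this prediction, with hypothetical landmarks and parameters (not values from the course):

```python
import numpy as np

def gaussian_kernel(x, l, sigma2):
    """Similarity between sample x and landmark l; close to 1 near l, near 0 far away."""
    return np.exp(-np.sum((x - l) ** 2) / (2 * sigma2))

# Hypothetical landmarks and trained parameters, for illustration only
landmarks = [np.array([1.0, 1.0]), np.array([3.0, 3.0])]
theta = np.array([-0.5, 1.0, 1.0])   # theta_0, theta_1, theta_2
sigma2 = 1.0

x = np.array([1.2, 0.9])
f = np.array([1.0] + [gaussian_kernel(x, l, sigma2) for l in landmarks])  # f_0 = 1
y_pred = 1 if theta @ f >= 0 else 0   # the nonlinear boundary comes from theta^T f
print(f, y_pred)
```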

9.3.2 How to choose the landmarks l

Use all the training samples as landmark points.

[Figure: 9.3.2 Choosing landmarks]
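One way to sketch this landmark choice in NumPy (`X_train` is a hypothetical training matrix): with every training sample used as a landmark, each sample maps to an m-dimensional feature vector, one entry per training sample.

```python
import numpy as np

def gaussian_features(X, landmarks, sigma2):
    """Map each row of X to its Gaussian-kernel similarity with every landmark."""
    # Squared distances between all samples and all landmarks, shape (m, L)
    d2 = ((X[:, None, :] - landmarks[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2 * sigma2))

X_train = np.random.rand(5, 2)   # hypothetical training set: m=5 samples, n=2 features
F = gaussian_features(X_train, X_train, sigma2=1.0)  # landmarks = all training samples
print(F.shape)   # (5, 5): one feature per landmark; F[i, i] == 1.0
```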

9.3.3 The linear kernel

That is, a kernel with no kernel parameters (equivalently, using no kernel at all).

[Figure: 9.3.3 Linear kernel]

9.3.4 The polynomial kernel

Usually performs worse; aimed at nonlinear problems.

[Figure: 9.3.4 Polynomial kernel]
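As one possible illustration (scikit-learn rather than the course's Octave code), these kernel choices correspond to the `kernel` argument of `SVC`; dataset and scores are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.svm import SVC

X, y = make_moons(n_samples=200, noise=0.2, random_state=0)  # a nonlinear toy problem

for kernel in ["linear", "poly", "rbf"]:   # "rbf" is the Gaussian kernel
    clf = SVC(kernel=kernel, C=1.0).fit(X, y)
    print(kernel, clf.score(X, y))
```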

9.4 SVM parameters and the bias-variance trade-off

9.4.1 Choosing C

Large C (\(=\frac{1}{\lambda}\)): \(\lambda\) is small and \(\theta\) is large, meaning low bias and high variance, i.e. a tendency to overfit.

Small C (\(=\frac{1}{\lambda}\)): \(\lambda\) is large and \(\theta\) is small, meaning high bias and low variance, i.e. a tendency to underfit.

9.4.2 Choosing \(\sigma^2\)

Large \(\sigma^2\): the features \(f\) change slowly with \(x\), i.e. even when \(x\) changes a lot, \(\theta^T f\) changes little; the boundary varies slowly and the bias is higher.

Small \(\sigma^2\): the features \(f\) change sharply with \(x\), i.e. even a small change in \(x\) changes \(\theta^T f\) a lot; the boundary varies rapidly and the variance is higher.
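In scikit-learn's `SVC` the Gaussian kernel is parameterized by `gamma` rather than \(\sigma^2\); under the convention in these notes, `gamma` \(= \frac{1}{2\sigma^2}\), so a large \(\sigma^2\) means a small `gamma`. A sketch of both trade-offs on a toy problem (values are illustrative):

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_moons(n_samples=300, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for C, sigma2 in [(100.0, 0.01), (1.0, 0.5), (0.01, 10.0)]:
    gamma = 1.0 / (2.0 * sigma2)   # large sigma^2 -> small gamma -> smoother boundary
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
    # Large C / small sigma^2 leans toward overfitting (high variance);
    # small C / large sigma^2 leans toward underfitting (high bias).
    print(C, sigma2, clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```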

9.5 Steps for using an SVM

  1. Choose the parameter C

  2. Choose an appropriate kernel function based on the samples at hand

  3. Compute the chosen kernel function over all input samples to generate the feature variables

  4. Scale the features depending on the samples (see the sketch after this list)

  5. If the number of features n is large (~10,000) compared with the number of samples m (10-1000), use logistic regression or a linear kernel, since there is too little data to fit a complex nonlinear function

     If the number of features n is small (1-1000) and the number of samples m is moderate (10-50,000), use the Gaussian kernel

     If the number of features n is small (1-1000) and the number of samples m is large (50,000+), an SVM with a Gaussian kernel runs very slowly; manually create more feature variables and use logistic regression or a linear kernel
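A sketch of these steps with scikit-learn (one possible toolchain; the dataset and parameter values are illustrative, and the n-versus-m rules above drive the kernel choice):

```python
from sklearn.datasets import make_classification
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical data: small n, moderate m -> Gaussian kernel per the rules above
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

model = make_pipeline(
    StandardScaler(),           # step 4: feature scaling
    SVC(kernel="rbf", C=1.0),   # steps 1-2: choice of C and kernel
)
model.fit(X, y)                 # step 3 (feature generation) happens inside the kernel
print(model.score(X, y))
```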
