Andrew Ng "Machine Learning" Course Summary (10): Support Vector Machines

12.1 Optimization Objective

(1) The starting point is the single-sample cost of logistic regression, −y·log(hθ(x)) − (1−y)·log(1−hθ(x)), where hθ(x) = 1/(1+e^(−θᵀx)).

(2) Replace the smooth curve with the piecewise-linear purple line shown in the lecture (called cost1 for y = 1 and cost0 for y = 0), drop the 1/m factor over the samples, and replace 1/λ with C (one way to understand it, not the only one). This converts the logistic regression cost function into the SVM cost function.

(3) The SVM output is no longer a probability as in logistic regression; it is directly 0 or 1:
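A minimal sketch of this conversion (NumPy-based, not code from the course; the names cost1/cost0 follow the lecture notation, and the piecewise-linear costs are written up to a constant scale):

```python
import numpy as np

def cost1(z):
    # Piecewise-linear replacement for -log(sigmoid(z)); zero once z >= 1.
    return np.maximum(0.0, 1.0 - z)

def cost0(z):
    # Piecewise-linear replacement for -log(1 - sigmoid(z)); zero once z <= -1.
    return np.maximum(0.0, 1.0 + z)

def svm_objective(theta, X, y, C):
    """SVM cost: C * sum of per-sample costs + (1/2) * sum(theta_j^2).
    The 1/m factor is dropped and C plays the role of 1/lambda.
    X is assumed to include a leading column of ones; theta[0] is not regularized."""
    z = X @ theta
    data_term = C * np.sum(y * cost1(z) + (1 - y) * cost0(z))
    reg_term = 0.5 * np.sum(theta[1:] ** 2)
    return data_term + reg_term

def svm_predict(theta, X):
    # The SVM hypothesis outputs 0 or 1 directly, not a probability.
    return (X @ theta >= 0).astype(int)
```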

12.2 Large Margin Intuition

(1) First, the requirement on z is more stringent: logistic regression only needs z to be greater than or less than 0, whereas here z must be greater than or equal to 1 (for y = 1) or less than or equal to −1 (for y = 0).

(2) Suppose C is very large. The optimization will then try to make the first term exactly zero; assuming parameters achieving this exist, the cost function can be rewritten as:
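The formula appeared as an image in the original post; reconstructed from the lecture, the resulting constrained problem is roughly:

```latex
\min_{\theta}\ \frac{1}{2}\sum_{j=1}^{n}\theta_j^2
\quad \text{subject to} \quad
\begin{cases}
\theta^{T}x^{(i)} \ge 1 & \text{if } y^{(i)} = 1\\[4pt]
\theta^{T}x^{(i)} \le -1 & \text{if } y^{(i)} = 0
\end{cases}
```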

That is, minimize the former (the regularization term) subject to the latter (the constraints).

(3) When C is very large (i.e. λ is very small), the optimizer tries hard to satisfy the constraints above, which makes it very sensitive to outliers (overfitting), as follows:

In that case the decision boundary becomes the purple line; if C is reduced appropriately, the black line is obtained instead. In other words, when C is not too large, some outliers can be ignored.

C is a penalty coefficient. It can be understood as a weight that sets the preference between two objectives during optimization (margin size versus classification accuracy), i.e. the tolerance for misclassification. The larger C is, the less error is tolerated and the more easily overfitting occurs; the smaller C is, the more easily underfitting occurs. If C is either too large or too small, generalization is poor.

(4) Support vector machines are often called large-margin classifiers. This is accurate when C is large; when C is not so large it does not hold exactly, as the example shows. Still, this view is helpful for understanding the SVM.

(5) A larger C corresponds to a smaller λ, which leads to overfitting; conversely, a smaller C leads to underfitting.
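A small sketch of this effect (using scikit-learn, which the post does not mention; the dataset is synthetic) showing how the choice of C trades margin size against sensitivity to an outlier:

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two separable clusters plus one outlier lying near the opposite class.
X = np.vstack([rng.normal([-2, -2], 0.5, size=(20, 2)),
               rng.normal([2, 2], 0.5, size=(20, 2)),
               [[1.5, 1.5]]])
y = np.array([0] * 20 + [1] * 20 + [0])

for C in (100.0, 0.1):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    # Large C tries hard to classify the outlier correctly (purple-line behavior);
    # smaller C tolerates it and keeps a wider margin (black-line behavior).
    print(C, clf.coef_, clf.intercept_, clf.score(X, y))
```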

 

12.3 The Mathematics Behind Large Margin Classification (Optional)

(1) Vector inner product: the inner product of two vectors equals the length of the projection of one vector onto the other, multiplied by the norm of the vector projected onto; equivalently, it is the sum of the products of the corresponding coordinates.

(2) The objective function makes ‖θ‖ as small as possible, so as long as the projection of x onto θ is as large as possible, a smaller ‖θ‖ can still satisfy the constraints. This is the mathematical principle behind the SVM's large margin.

(3) θ is perpendicular (at 90°) to the decision boundary shown; in addition, when θ0 = 0 the boundary passes through the origin, and otherwise it does not.
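A quick NumPy check (illustrative only, not part of the original notes) of the two equivalent definitions of the inner product and of the projection view used above:

```python
import numpy as np

theta = np.array([2.0, 1.0])
x = np.array([1.0, 3.0])

# Coordinate-wise definition of the inner product.
dot_coords = np.sum(theta * x)

# Projection definition: signed length of x projected onto theta, times ||theta||.
p = (theta @ x) / np.linalg.norm(theta)
dot_proj = p * np.linalg.norm(theta)

print(dot_coords, dot_proj)  # both equal theta^T x = 5.0
# So the constraint theta^T x >= 1 can be read as p * ||theta|| >= 1:
# a larger projection p allows a smaller ||theta||, which is what the objective minimizes.
```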

12.4 Kernels I

(1) If the decision boundary below is fitted directly with a polynomial, a high-order polynomial is needed, i.e. a very large number of features.

(2) New features f1, f2, f3 are chosen according to how similar the features of x are to pre-selected landmarks l(1), l(2), l(3).

The similarity function above is a Gaussian kernel, fi = exp(−‖x − l(i)‖² / (2σ²)). Note: this function has no real connection to the normal distribution; it only looks similar.

(3) The closer x is to a landmark, the closer the resulting f is to 1; the farther away, the closer f is to 0.
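A minimal sketch of the Gaussian kernel as a similarity function (NumPy-based; the function name is illustrative, not from the course code):

```python
import numpy as np

def gaussian_kernel(x, landmark, sigma):
    """Similarity between x and a landmark: exp(-||x - l||^2 / (2 * sigma^2))."""
    return np.exp(-np.sum((x - landmark) ** 2) / (2.0 * sigma ** 2))

l1 = np.array([3.0, 5.0])
print(gaussian_kernel(np.array([3.0, 5.0]), l1, sigma=1.0))  # ~1.0: x sits at the landmark
print(gaussian_kernel(np.array([9.0, 0.0]), l1, sigma=1.0))  # ~0.0: x is far from the landmark
```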

(4) Classification then becomes straightforward with the following rule: predict y = 1 when θ0 + θ1·f1 + θ2·f2 + θ3·f3 ≥ 0, and y = 0 otherwise.

(5) The values computed by the kernel function are the new features.

12.5 Kernels II

(1) The number of landmarks is set to the number of training samples m, i.e. each training sample's position is the position of a landmark:

(2) Applying the kernel function to the support vector machine:

Given x, compute the new feature vector f; when θᵀf ≥ 0, predict y = 1, otherwise predict y = 0.
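A sketch of this prediction step under the m-landmark setup (illustrative NumPy code, not the course's Octave implementation; `theta` is assumed to have already been trained):

```python
import numpy as np

def kernel_features(x, landmarks, sigma):
    """f_i = exp(-||x - l^(i)||^2 / (2 sigma^2)) for every landmark (training sample),
    with f_0 = 1 prepended for the intercept term."""
    sq_dists = np.sum((landmarks - x) ** 2, axis=1)
    f = np.exp(-sq_dists / (2.0 * sigma ** 2))
    return np.concatenate(([1.0], f))

def predict(x, X_train, theta, sigma):
    # Every training sample acts as a landmark, so f has m + 1 entries.
    f = kernel_features(x, X_train, sigma)
    return 1 if theta @ f >= 0 else 0
```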

The correspondingly modified cost function replaces θᵀx(i) with θᵀf(i): C·Σi [y(i)·cost1(θᵀf(i)) + (1−y(i))·cost0(θᵀf(i))] + ½·Σj θj².

In a concrete implementation, the final regularization term also needs a small adjustment: θᵀMθ is used in place of θᵀθ in the computation. The matrix M depends on the chosen kernel, and in practice an existing library should be used to train an SVM with a kernel.

An SVM without a kernel function is said to use a linear kernel.

(3) The effects of the two SVM parameters C and σ are as follows:

C=1/λ;

When C is large (equivalent to a small λ), overfitting and high variance may result;

When C is small (equivalent to a large λ), underfitting and high bias may result;

When σ is large, low variance and high bias may result;

When σ is small, low bias and high variance may result.
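A hedged sketch of how these two knobs map onto a common library (scikit-learn, which the post does not name): its RBF kernel is exp(−gamma·‖x − l‖²), so gamma = 1/(2σ²) and a large σ corresponds to a small gamma.

```python
from sklearn.svm import SVC
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split

X, y = make_moons(n_samples=400, noise=0.25, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

sigma = 0.5
gamma = 1.0 / (2.0 * sigma ** 2)   # scikit-learn parameterization of the Gaussian kernel

for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="rbf", C=C, gamma=gamma).fit(X_tr, y_tr)
    # Small C (large lambda): high bias; large C (small lambda): higher variance.
    print(C, clf.score(X_tr, y_tr), clf.score(X_te, y_te))
```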

12.6 Using an SVM

(1) Although you do not need to implement the SVM solver yourself and can use an existing library directly, you still need to do the following:

1. Choose the parameter C. The effect of C on bias and variance was discussed in the earlier videos.

2. Choose the kernel (similarity function) you want to use and its parameters (a selection sketch follows below).
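One way to carry out both choices (a sketch with scikit-learn, not a procedure from the course; the parameter grid values are arbitrary) is a cross-validated search over C and the kernel parameter:

```python
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC
from sklearn.datasets import make_moons

X, y = make_moons(n_samples=300, noise=0.2, random_state=1)

# Search over C and gamma (gamma = 1 / (2 sigma^2) for the Gaussian/RBF kernel).
param_grid = {"C": [0.1, 1, 10, 100], "gamma": [0.1, 0.5, 2.0, 10.0]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5).fit(X, y)
print(search.best_params_, search.best_score_)
```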

(2) Guidelines for choosing between logistic regression and an SVM:

1. When the number of features n is much larger than the number of samples m, there is not enough data to train a very complex model; in this case consider logistic regression or an SVM without a kernel.

2. If n is small and m is intermediate, e.g. n between 1 and 1,000 and m between 10 and 10,000, use an SVM with a Gaussian kernel.

3. If n is small and m is large, e.g. n between 1 and 1,000 and m greater than 50,000, an SVM with a Gaussian kernel will be very slow. The solution is to create or add more features and then use logistic regression or an SVM without a kernel.

A neural network can perform well in all three cases above, but training may be very slow. A key reason for choosing an SVM instead is that its cost function is convex, so there are no local minima.
