Machine Learning Notes (6): Selection and Optimization of Machine Learning Algorithms

1. What to do when predictions deviate greatly from the actual results:

(1) collect more training samples;

(2) reduce the feature set to prevent overfitting;

(3) add features or polynomial features (e.g. x₁², x₂³, etc.);

(4) decrease / increase λ.

 

2. Evaluating a hypothesis:

Divide the data set into two parts: a training set (70%) and a test set (30%).

Procedure:

(1) Learn θ from the training set and compute the training error J_train(θ);

(2) Compute the test error J_test(θ):

① For linear regression: J_test(θ) = (1 / (2·m_test)) · Σᵢ (h_θ(x_test^(i)) − y_test^(i))²

② For logistic regression: J_test(θ) = −(1 / m_test) · Σᵢ [ y_test^(i) · log h_θ(x_test^(i)) + (1 − y_test^(i)) · log(1 − h_θ(x_test^(i))) ]

Misclassification error (0/1 misclassification rate): err(h_θ(x), y) = 1 if (h_θ(x) ≥ 0.5 and y = 0) or (h_θ(x) < 0.5 and y = 1), and 0 otherwise.

Test error: (1 / m_test) · Σᵢ err(h_θ(x_test^(i)), y_test^(i)).
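A minimal sketch of these two error measures in Python (NumPy assumed; `h` is a placeholder for an already-trained hypothesis function):

```python
import numpy as np

def j_test_linear(h, X_test, y_test):
    """Squared test error for linear regression: (1/2m) Σ (h(x) − y)²."""
    m = len(y_test)
    return np.sum((h(X_test) - y_test) ** 2) / (2 * m)

def j_test_01(h, X_test, y_test):
    """0/1 misclassification rate for logistic regression."""
    preds = (h(X_test) >= 0.5).astype(int)   # threshold h_θ(x) at 0.5
    return np.mean(preds != y_test)          # fraction of wrong predictions
```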

 

3. Model selection:

Consider the following 10 candidate polynomial models:

(1) Divide the data set into three parts: a training set (60%), a cross-validation set (CV for short, 20%), and a test set (20%).

(2) Introduce a parameter d for the polynomial degree, taking the values d = 1, 2, ..., 10 in turn.

(3) Train each model in sequence to obtain its parameters θ, denoted θ^(1), θ^(2), ..., θ^(10).

(4) Compute the cross-validation errors J_cv(θ^(1)), J_cv(θ^(2)), ..., J_cv(θ^(10)), and select the model with the smallest error.

(5) Fix d = k (where the k-th model is the one selected) and compute the error on the test set to evaluate the model; a sketch of the whole procedure follows.
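Here is one way this procedure might look on synthetic 1-D data, using np.polyfit as a stand-in learner (the data and split are illustrative, not from the notes):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, 100)                  # synthetic 1-D inputs
y = X ** 2 + rng.normal(0, 1, 100)           # noisy quadratic targets

# 60/20/20 split into training / cross-validation / test sets
idx = rng.permutation(100)
tr, cv, te = idx[:60], idx[60:80], idx[80:]

def half_mse(coeffs, X, y):
    """J(θ) = (1/2m) Σ (h_θ(x) − y)² for a fitted polynomial."""
    return np.mean((np.polyval(coeffs, X) - y) ** 2) / 2

models, cv_errors = [], []
for d in range(1, 11):                       # candidate degrees d = 1..10
    coeffs = np.polyfit(X[tr], y[tr], d)     # θ^(d), fit on the training set only
    models.append(coeffs)
    cv_errors.append(half_mse(coeffs, X[cv], y[cv]))

k = int(np.argmin(cv_errors))                # index of the smallest J_cv
print("chosen degree:", k + 1)
print("test error:", half_mse(models[k], X[te], y[te]))  # final evaluation
```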

 

4. Bias and variance:

(1) Example:

(Figures: Pic.1 underfitting (high bias), d = 1; Pic.2 good fit, d = 2; Pic.3 overfitting (high variance), d = 4)

When d is small, there is high bias: J_train(θ) is high, and J_cv(θ) ≈ J_test(θ) ≈ J_train(θ).

When d is large, there is high variance: J_train(θ) is low, and J_cv(θ) ≈ J_test(θ) >> J_train(θ).
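These two patterns can be checked programmatically; below is an illustrative heuristic (the `baseline` and `gap_ratio` thresholds are hypothetical, not from the notes):

```python
def diagnose(j_train, j_cv, baseline, gap_ratio=2.0):
    """Rough diagnostic from the two errors. `baseline` (what counts as a
    'high' training error) and `gap_ratio` are illustrative thresholds."""
    if j_train > baseline:
        return "high bias: J_train high, J_cv ≈ J_train (underfitting)"
    if j_cv > gap_ratio * j_train:
        return "high variance: J_cv >> J_train (overfitting)"
    return "reasonable fit"
```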

 

(2) Suppose d = 4 has been selected; regularization is introduced to solve the overfitting problem:

If λ is too small, overfitting remains; if λ is too large, the hypothesis tends toward a straight-line function, i.e. underfitting.

① Set λ = 0, 0.01, 0.02, 0.04, ..., 10.24 (the last value can be set to 10; 12 values in total), fit each to obtain a different θ, corresponding to θ^(1), θ^(2), ..., θ^(12).

② Evaluate each result on the cross-validation set and select the λ with the best (lowest) cross-validation error.
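A sketch of this λ sweep using scikit-learn's Ridge, whose alpha parameter plays the role of λ (up to the library's scaling convention); the data and split here are illustrative stand-ins:

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (100, 1))             # synthetic stand-in data
y = X[:, 0] ** 2 + rng.normal(0, 1, 100)
X_tr, y_tr, X_cv, y_cv = X[:60], y[:60], X[60:80], y[60:80]

# λ = 0, then 0.01 doubled repeatedly up to 10.24 — 12 values in total
lambdas = [0.0] + [0.01 * 2 ** i for i in range(11)]

poly = PolynomialFeatures(degree=4, include_bias=False)  # d = 4 as chosen above
Phi_tr = poly.fit_transform(X_tr)
Phi_cv = poly.transform(X_cv)

cv_errors = []
for lam in lambdas:
    model = Ridge(alpha=lam).fit(Phi_tr, y_tr)           # θ^(j) for this λ
    cv_errors.append(np.mean((model.predict(Phi_cv) - y_cv) ** 2) / 2)

print("best λ:", lambdas[int(np.argmin(cv_errors))])     # smallest J_cv wins
```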

 

5. Learning curves:

(1) High bias (underfitting):

As the training set grows, the fitted model's error remains large; the gap between the training-set and validation-set errors is small, and both curves level off. This means that collecting more data, however much, will be of little help.

Figure below: as the training set grows, the fitted result is still a straight line, and the error is still large.

Learning curve: (figure)

 

(2) High variance (overfitting):

As the training set grows, the training error stays low, but the validation error remains high, leaving a large gap between the two. If the gap narrows as the training set grows, collecting more training data may help the result.

Figure below: as the training set grows, the curve fits the data ever more closely.

Learning curve: (figure)
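One way to produce such a learning curve is with scikit-learn's learning_curve; this sketch uses synthetic quadratic data fitted by a straight line so that the high-bias pattern appears:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, (200, 1))
y = X[:, 0] ** 2 + rng.normal(0, 1, 200)     # data a straight line cannot fit

# Training and validation error at increasing training-set sizes
sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y,
    train_sizes=np.linspace(0.1, 1.0, 8),
    cv=5, scoring="neg_mean_squared_error")

for m, j_tr, j_cv in zip(sizes, -train_scores.mean(axis=1), -val_scores.mean(axis=1)):
    # Both errors stay high and close together → the high-bias pattern
    print(f"m = {m:3d}   J_train = {j_tr:5.2f}   J_cv = {j_cv:5.2f}")
```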

 

6. Ways to improve the algorithm:

(1) Collect more data: for the high-variance case;

(2) Reduce the number of features: for the high-variance case;

(3) Add features / polynomial features: for the high-bias case;

(4) Increase λ: for the high-variance case;

(5) Decrease λ: for the high-bias case.

 

7. Choosing a neural network:

(1) A small neural network: low computational cost; few hidden layers / units; few parameters; prone to underfitting;

(2) A complex neural network: high computational cost; many hidden layers / units; many parameters; prone to overfitting.

Regularization can be used to correct the overfitting.
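A sketch of that trade-off using scikit-learn's MLPClassifier, whose alpha parameter is an L2 regularization strength; the data set and network sizes here are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_moons(n_samples=400, noise=0.3, random_state=0)  # toy data set
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A fairly large network; weak regularization tends to overfit,
# stronger alpha pulls training and test accuracy closer together
for alpha in (1e-5, 1e-2, 1.0):
    net = MLPClassifier(hidden_layer_sizes=(100, 100), alpha=alpha,
                        max_iter=2000, random_state=0).fit(X_tr, y_tr)
    print(f"alpha = {alpha:g}   train acc = {net.score(X_tr, y_tr):.2f}"
          f"   test acc = {net.score(X_te, y_te):.2f}")
```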

 


Origin www.cnblogs.com/orangecyh/p/11722889.html