2. Model evaluation

1. Overfitting and underfitting

2. Evaluation methods: performance evaluation

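The test error is the error between the predicted results and the actual results on the test set. (These notes also call it the empirical error, although that term usually refers to the error on the training set.)

The closer the test error is to the generalization error, the better.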

3. Methods for generating the training and test sets

3.1 Hold-out method

Error rate: the number of misclassified samples in the test set / the total number of samples in the test set * 100%

Disadvantages: if the split gives the training and test sets different sample distributions, the estimated error can be far off
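
The split-then-measure procedure above (hold-out evaluation) can be sketched roughly as follows. This is only an illustration: the toy data, the 30% test ratio, and the `predict` function are assumptions made up for the example, not part of the original notes.

```python
import random

def holdout_split(samples, test_ratio=0.3, seed=0):
    """Shuffle the samples and split them into disjoint train and test lists."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]

def error_rate(y_true, y_pred):
    """Number of wrong predictions / number of test samples * 100%."""
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return wrong / len(y_true) * 100.0

# Toy data: (feature, label) pairs; the label is 1 when the feature is >= 5.
data = [(x, int(x >= 5)) for x in range(10)]
train, test = holdout_split(data, test_ratio=0.3)

def predict(x):
    """Hypothetical stand-in for a trained model (deliberately wrong for x == 4)."""
    return int(x >= 4)

y_true = [label for _, label in test]
y_pred = [predict(x) for x, _ in test]
print(error_rate(y_true, y_pred))
```

Because only one random split is used, the printed value depends heavily on which samples land in the test set, which is the disadvantage noted above.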

3.2 Cross-validation method (commonly used)

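Typically the data set is divided into k subsets (k is commonly 10 or 20). Each subset serves as the test set once while the others form the training set, and the whole procedure is repeated p times with different splits.

Advantages: reduces the error caused by an unlucky split; averaging over multiple tests gives a more accurate estimate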
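
A minimal sketch of how the k subsets and the p repetitions could be generated, working only with sample indices. The function names and the choice m=20, k=10, p=2 are assumptions for illustration.

```python
import random

def k_fold_indices(m, k, seed=0):
    """Yield (train_idx, test_idx) pairs; each fold is the test set exactly once."""
    rng = random.Random(seed)
    indices = list(range(m))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # k roughly equal subsets
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train_idx, test_idx

def repeated_k_fold(m, k, p):
    """Repeat k-fold cross-validation p times, reshuffling for each trial."""
    for trial in range(p):
        yield from k_fold_indices(m, k, seed=trial)

# 2 trials of 10-fold cross-validation on 20 samples: 20 train/test rounds in total.
splits = list(repeated_k_fold(m=20, k=10, p=2))
print(len(splits))
```

The final estimate would be the average of the error rates measured on all k * p test folds.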

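3.2.1 Special case: leave-one-out method

Leave a single sample out as the test set and use all the remaining samples as the training set; repeat this for every sample (cross-validation with k equal to the number of samples)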
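
As a rough sketch, leaving one sample out at a time looks like this. The toy points and the 1-nearest-neighbour predictor are assumptions chosen only to keep the example self-contained.

```python
def leave_one_out(m):
    """Yield (train_idx, test_idx) where each sample is the test set exactly once."""
    for i in range(m):
        train_idx = [j for j in range(m) if j != i]
        yield train_idx, [i]

# Toy 1-D data: (feature, label) pairs forming two well separated groups.
points = [(1.0, 0), (1.5, 0), (2.0, 0), (6.0, 1), (6.5, 1), (7.0, 1)]

def nearest_neighbor_label(x, train):
    """Hypothetical model: copy the label of the closest training point."""
    return min(train, key=lambda item: abs(item[0] - x))[1]

errors = 0
for train_idx, test_idx in leave_one_out(len(points)):
    train = [points[j] for j in train_idx]
    x, y = points[test_idx[0]]
    errors += int(nearest_neighbor_label(x, train) != y)

loo_error_rate = errors / len(points)
print(loo_error_rate)
```

With m samples the model has to be built m times, so this is usually practical only for small data sets.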

3.3 Bootstrap method
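
The method in this subsection is bootstrap sampling: draw m samples with replacement from a data set of size m to form the training set, and use the samples that were never drawn as the test set. The sketch below is an illustration under that assumption (the original figure is not available); the function name and m=10000 are made up for the example.

```python
import random

def bootstrap_split(m, seed=0):
    """Sample m indices with replacement; indices never drawn become the test set."""
    rng = random.Random(seed)
    train_idx = [rng.randrange(m) for _ in range(m)]
    chosen = set(train_idx)
    test_idx = [i for i in range(m) if i not in chosen]
    return train_idx, test_idx

m = 10000
train_idx, test_idx = bootstrap_split(m)

# For large m the fraction of never-drawn samples approaches 1/e, about 0.368.
oob_fraction = len(test_idx) / m
print(oob_fraction)
```

The training set keeps the original size m, which helps on small data sets, but its distribution differs from the original because some samples are repeated.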

4. Performance metrics (evaluation metrics)

4.1 Take the classification task as an example:

Accuracy and error rate are generally used for evaluation

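Each correct prediction counts as 1. With m test samples, the ideal case gives m ones. Accuracy is the sum of these indicators divided by m, that is, accuracy = (number of correct predictions) / m.

Error rate = 1 - accuracy.

In different situations other factors usually have to be taken into account, so the evaluation metric is not unique.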
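
The counting described above can be written out directly. A small sketch, with made-up labels for illustration:

```python
def accuracy(y_true, y_pred):
    """Sum an indicator (1 if correct, 0 otherwise) over m samples and divide by m."""
    m = len(y_true)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / m

def error_rate(y_true, y_pred):
    """Error rate = 1 - accuracy."""
    return 1.0 - accuracy(y_true, y_pred)

# Made-up example: 3 of the 5 predictions match the true labels.
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
print(accuracy(y_true, y_pred))
print(error_rate(y_true, y_pred))
```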

4.2 Take the regression task as an example:

Mean squared error (MSE) is generally used for evaluation
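
Mean squared error averages the squared difference between each prediction and its true value over the m test samples. A minimal sketch with made-up numbers:

```python
def mean_squared_error(y_true, y_pred):
    """Average of (prediction - truth)^2 over all m samples."""
    m = len(y_true)
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m

# Made-up example: squared errors are 1, 0 and 4, so the mean is 5 / 3.
y_true = [3.0, 5.0, 2.0]
y_pred = [2.0, 5.0, 4.0]
mse = mean_squared_error(y_true, y_pred)
print(mse)
```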

5. Imbalanced data sets and special requirements

Under special requirements, accuracy and error rate cannot fully describe the quality of a model. Precision and recall are introduced for these cases.

5.1 Precision and recall

Precision and recall generally trade off against each other: when one rises, the other tends to fall

Example: when picking out good melons, use precision, because you are measuring how many of the melons predicted to be good really are good
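
To make the two quantities concrete, here is a small sketch that counts the four outcomes of a binary prediction and derives precision and recall from them. The labels are made up, with 1 standing for "good melon".

```python
def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

def precision(tp, fp):
    """Of the samples predicted positive, the fraction that really are positive."""
    return tp / (tp + fp) if tp + fp else 0.0

def recall(tp, fn):
    """Of the samples that really are positive, the fraction that were found."""
    return tp / (tp + fn) if tp + fn else 0.0

# Made-up example: 4 good melons, 3 predicted good, 2 of those truly good.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print(precision(tp, fp), recall(tp, fn))
```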

Which should be the criterion, recall or precision?

5.1.1 Method 1: Use the break-even point (BEP)
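
The balance point is the value at which precision equals recall, often called the break-even point. One way to find it, sketched below under the assumption of 0/1 labels, at least one positive sample, and no tied scores: if exactly as many samples are predicted positive as there are real positives, then FP equals FN and precision equals recall. The function name and the scores are made up for illustration.

```python
def break_even_point(y_true, scores):
    """Precision (= recall) when the top-n_pos scored samples are predicted positive."""
    n_pos = sum(y_true)  # number of real positives
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = sum(label for _, label in ranked[:n_pos])
    return tp / n_pos

# Made-up example: 3 real positives; the top 3 scores contain 2 of them.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
bep = break_even_point(y_true, scores)
print(bep)
```

A model with a higher break-even point is considered better under this criterion.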

5.1.2 Method 2: F1 measure
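
F1 is the harmonic mean of precision P and recall R, F1 = 2 * P * R / (P + R). A minimal sketch, with made-up input values:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Made-up example: P = 2/3 and R = 1/2 give F1 = 4/7.
f1 = f1_score(2.0 / 3.0, 0.5)
print(f1)
```

Because it is a harmonic mean, F1 stays low unless both precision and recall are reasonably high.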

5.1.3 Method 3: ROC curve

Used for: comparing multiple methods

The true positive rate (TPR) is the same as recall
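
A rough sketch of how ROC points and the area under the curve (AUC) can be computed: sort the samples by score, lower the threshold one sample at a time, and record the false positive rate FP / (FP + TN) against the true positive rate TP / (TP + FN). It assumes 0/1 labels, both classes present, and no tied scores; the data is made up.

```python
def roc_points(y_true, scores):
    """Return the list of (FPR, TPR) points, starting at (0, 0)."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    tp = fp = 0
    points = [(0.0, 0.0)]
    for _, label in ranked:
        if label == 1:
            tp += 1  # threshold passes a real positive: TPR goes up
        else:
            fp += 1  # threshold passes a real negative: FPR goes up
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the ROC curve using the trapezoid rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

# Made-up example: 8 of the 9 positive/negative pairs are ranked correctly.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1]
points = roc_points(y_true, scores)
print(auc(points))
```

When comparing methods, a curve that encloses another, or a larger AUC, indicates the better ranking quality.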

6. Comparison test

7. Hypothesis testing

On the test set, A performs better than B. We want to determine whether A is better than B in a statistical sense.

(That is, checking whether the difference in test error reflects a difference in generalization error)

7.1 Paired two-sided t-test

μ is the mean and σ² is the variance (the mean and variance of the differences between the error rates of the two algorithms)

The table on the right of the figure is the lookup table of critical values
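
A sketch of the paired test under the usual assumption that both algorithms are evaluated on the same k folds: compute the per-fold differences, then the statistic |sqrt(k) * μ / σ|, and compare it with the critical value for k - 1 degrees of freedom. The error rates below are made up, and 2.262 is the standard two-sided critical value for α = 0.05 with 9 degrees of freedom.

```python
import math
import statistics

def paired_t_statistic(errors_a, errors_b):
    """|sqrt(k) * mean / std| of the per-fold differences between two algorithms."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    k = len(diffs)
    mu = statistics.mean(diffs)
    sigma = statistics.stdev(diffs)  # sample standard deviation (square root of the variance)
    return abs(math.sqrt(k) * mu / sigma)

# Made-up error rates of algorithms A and B on the same 10 folds.
errors_a = [0.10, 0.12, 0.11, 0.09, 0.10, 0.13, 0.11, 0.10, 0.12, 0.11]
errors_b = [0.12, 0.13, 0.13, 0.10, 0.12, 0.13, 0.12, 0.12, 0.13, 0.13]

t_stat = paired_t_statistic(errors_a, errors_b)
critical_value = 2.262  # two-sided, alpha = 0.05, k - 1 = 9 degrees of freedom
significantly_different = t_stat > critical_value
print(t_stat, significantly_different)
```

If the statistic does not exceed the critical value, the hypothesis that the two algorithms perform the same cannot be rejected.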

7.2 Friedman test and Nemenyi post-hoc test
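
These two tests compare k algorithms over N data sets using their average ranks. The sketch below uses the commonly cited forms of the Friedman statistic, its F-distributed variant, and the Nemenyi critical difference CD = q * sqrt(k(k+1) / (6N)). The original figures are not available, so the ranks are made up, and q = 2.344 (k = 3, α = 0.05) is a table value taken as an assumption.

```python
import math

def friedman_statistics(avg_ranks, n_datasets):
    """Return (chi-square statistic, F statistic) from the average ranks."""
    k = len(avg_ranks)
    n = n_datasets
    chi2 = 12 * n / (k * (k + 1)) * (
        sum(r * r for r in avg_ranks) - k * (k + 1) ** 2 / 4
    )
    f_stat = (n - 1) * chi2 / (n * (k - 1) - chi2)
    return chi2, f_stat

def nemenyi_critical_difference(q_alpha, k, n_datasets):
    """Two algorithms differ significantly if their average ranks differ by more than CD."""
    return q_alpha * math.sqrt(k * (k + 1) / (6 * n_datasets))

# Made-up average ranks of 3 algorithms over 4 data sets (lower rank is better).
avg_ranks = [1.0, 2.125, 2.875]
chi2, f_stat = friedman_statistics(avg_ranks, n_datasets=4)
cd = nemenyi_critical_difference(q_alpha=2.344, k=3, n_datasets=4)

# Compare the first and the last algorithm by their rank gap.
first_vs_last_differ = abs(avg_ranks[0] - avg_ranks[2]) > cd
print(chi2, f_stat, cd, first_vs_last_differ)
```

The Friedman test is run first; the Nemenyi test is applied only if it rejects the hypothesis that all algorithms perform the same.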

Origin blog.csdn.net/qq_49821869/article/details/114632014