Chapter 2 Model Evaluation and Selection
2.3 Performance measures (0207: mean squared error)
Performance measure: to evaluate the generalization performance of a model, we need not only an effective and feasible experimental estimation method, but also an evaluation criterion that measures the model's generalization ability — a performance measure.
Performance measures reflect task requirements. When comparing the capabilities of different models, different performance measures often lead to different evaluation results.
This means that model quality is relative: which model counts as "good" depends not only on the algorithm and the data, but also on the task requirements.
For example, when m = 3:
D | (x1, y1) | (x2, y2) | (x3, y3) |
---|---|---|---|
sample | (2, 3) | (4, 5) | (7, 1) |
f(x) | 4 | 5 | 1 |
then
- Mean squared error = ((4 − 3)² + (5 − 5)² + (1 − 1)²) / 3 = 1/3
- Equation (2.3) is the continuous version: instead of averaging over m samples, it weights the squared error by the probability density p(x)
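The discrete mean squared error above can be checked with a short sketch (the three samples are taken from the table; the helper name is mine):

```python
def mean_squared_error(y_true, y_pred):
    """Discrete MSE, Eq. (2.2): E(f; D) = (1/m) * sum((f(x_i) - y_i)^2)."""
    m = len(y_true)
    return sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / m

# Labels y and predictions f(x) for the m = 3 example above
y_true = [3, 5, 1]
y_pred = [4, 5, 1]
print(mean_squared_error(y_true, y_pred))  # ((4-3)^2 + 0 + 0) / 3 = 1/3
```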
2.3.1 Error rate and accuracy (0208)
Error rate: the proportion of misclassified samples among the total number of samples
Accuracy: the proportion of correctly classified samples among the total number of samples
For example, when m=3
D | (x1, y1) | (x2, y2) | (x3, y3) |
---|---|---|---|
sample | (2, 3) | (4, 5) | (7, 1) |
f(x) | 4 | 5 | 1 |
match? | y1 ≠ f(x1) | y2 = f(x2) | y3 = f(x3) |
then
- error rate = 1/3
- accuracy = 2/3 = 1 − error rate
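The same example in code, a minimal sketch (the sample values come from the table, the helper name is mine):

```python
def error_rate(y_true, y_pred):
    """Fraction of samples where the prediction disagrees with the label."""
    return sum(t != p for t, p in zip(y_true, y_pred)) / len(y_true)

y_true = [3, 5, 1]
y_pred = [4, 5, 1]   # only the first sample is misclassified
err = error_rate(y_true, y_pred)
acc = 1 - err        # accuracy is the complement of the error rate
print(err, acc)      # 1/3 and 2/3
```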
2.3.2 Precision, recall and F1
Precision and recall (0209)
Precision (P):
- Of the watermelons picked out, how many are good melons
- Of the information retrieved, what fraction is of interest to the user
Recall (R):
- Of all the good melons, what fraction was picked out
- Of all the information the user is interested in, how much was retrieved
Suppose m=100, y=1 is a positive example, and y=0 is a negative example
Actual value y | 1 | 0 | 1 | 0 | … | 1 |
---|---|---|---|---|---|---|
Predicted value y' | 1 | 1 | 1 | 1 | … | 0 |
If the actual data contain 60 positives and 40 negatives, while the predictions contain 70 positives and 30 negatives:

Actual \ Predicted | Positive (70) | Negative (30) |
---|---|---|
Positive (60) | TP = 50 (true positives) | FN = 10 (false negatives) |
Negative (40) | FP = 20 (false positives) | TN = 20 (true negatives) |

Precision P = TP / (TP + FP) = 50/70
Recall R = TP / (TP + FN) = 50/60
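Computing P and R from the confusion-matrix counts above (a sketch; the numbers are the ones from the table):

```python
# Counts from the example: 60 actual positives, 40 actual negatives;
# the model predicts 70 positives, of which 50 are correct.
TP, FN = 50, 10
FP, TN = 20, 20

precision = TP / (TP + FP)   # 50/70: of the predicted positives, how many are real
recall = TP / (TP + FN)      # 50/60: of the real positives, how many were found
print(precision, recall)
```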
The inverse P-R relationship (0210)
Hands-On Machine Learning, Chapter 3, explains what goes wrong when accuracy is the only measure.
The example is handwritten digit recognition reduced to a binary classification: is a digit a 5 or not?
- A first trained model reaches a prediction accuracy (= correct predictions / all predictions) of 96.615%
- A second "model" that, whatever digit it sees, always answers "not 5" still reaches an accuracy above 90%, because most digits are indeed not 5
This explains why accuracy is usually not the preferred performance measure for classifiers, especially on skewed datasets, i.e. when some classes are far more frequent than others.
Why do P and R move in opposite directions? (the inverse P-R relationship)
Moving the threshold from 2 to 3 makes P larger and R smaller; moving it from 2 to 1 makes P smaller and R larger.
Generally speaking, when precision is high, recall tends to be low; and when recall is high, precision tends to be low. For example:
- If you want to pick out as many good melons as possible, you can simply select more melons; if you select every watermelon, all the good melons are certainly included, but the precision drops
- If you want as high a fraction as possible of the selected melons to be good, you can pick only the melons you are most sure about, but then many good melons are inevitably missed, so the recall is low
Usually only in some simple tasks can both precision and recall be high.
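The trade-off can be seen by sweeping a decision threshold over scored samples. A toy sketch (the scores and labels below are made up for illustration):

```python
# Toy data: 1 = good melon. Scores are hypothetical model outputs.
labels = [1, 1, 1, 0, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2]

def pr_at(threshold):
    """Precision and recall when predicting positive for score >= threshold."""
    pred = [s >= threshold for s in scores]
    tp = sum(p and t for p, t in zip(pred, labels))
    fp = sum(p and not t for p, t in zip(pred, labels))
    fn = sum(not p and t for p, t in zip(pred, labels))
    return tp / (tp + fp), tp / (tp + fn)

for th in (0.3, 0.5, 0.75):   # raising the threshold: P goes up, R goes down
    print(th, pr_at(th))
```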
The P-R curve and F1 (0211)
As the threshold increases, P increases and R decreases.
For a single model, sweep the threshold and record each (R, P) pair on a threshold-P/R diagram (e.g. the point R = 1, P = 0.1 at a low threshold); plotting these points gives the P-R curve, which shows the inverse relationship directly.
But how do we choose the threshold where the model performs best?
Determining the optimal threshold:
- Method one: the break-even point, where R = P
- Method two: the F1 measure, F1 = 2PR / (P + R)
- Method three: the Fβ measure, Fβ = (1 + β²)PR / (β²P + R), which weights R by β² relative to P
  - When β > 1, R has the greater influence
  - When β < 1, P has the greater influence
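Fβ as a function; a sketch using the P and R from the earlier confusion-matrix example:

```python
def f_beta(p, r, beta=1.0):
    """F_beta = (1 + beta^2) * P * R / (beta^2 * P + R); beta = 1 gives F1."""
    return (1 + beta ** 2) * p * r / (beta ** 2 * p + r)

p, r = 50 / 70, 50 / 60          # values from the confusion-matrix example
print(f_beta(p, r))              # F1, the harmonic mean of P and R
print(f_beta(p, r, beta=2))      # beta > 1: recall weighs more
print(f_beta(p, r, beta=0.5))    # beta < 1: precision weighs more
```

Since R > P in this example, weighting R more heavily (β = 2) raises the score, while weighting P more heavily (β = 0.5) lowers it.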
macro/micro - P/R (0213)
Implementing multi-class classification (handwritten digit recognition as the example):
- Use an inherently multi-class algorithm directly
- Reduce to binary classifications:
  - One vs One (OvO): one model per pair of classes — 1 vs 2; 1 vs 3; 1 vs 4; …; 2 vs 3; 2 vs 4; …. A total of 10 × 9 / 2 = 45 models is required
  - One vs Rest (OvR): one model per class against all the others — 1 vs rest; 2 vs rest; …. A total of 10 models is required
This produces many binary classifiers, and hence many values of P and R, but together these binary problems solve a single multi-class problem. How do we measure the quality of the overall model?
- Method 1: compute P and R on each confusion matrix first, then average them (macro-P, macro-R, macro-F1)
- Method 2: average the counts TP, FP, TN, FN across matrices first, then compute (micro-P, micro-R, micro-F1)
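The difference between the two averaging orders in code (the two confusion matrices below are hypothetical):

```python
# Two binary confusion matrices as (TP, FP, FN) triples -- made-up counts.
matrices = [(50, 20, 10), (5, 45, 5)]

# Macro-P: compute precision per matrix, then average the precisions.
macro_p = sum(tp / (tp + fp) for tp, fp, _ in matrices) / len(matrices)

# Micro-P: pool the raw counts first, then compute one precision.
tp_sum = sum(tp for tp, _, _ in matrices)
fp_sum = sum(fp for _, fp, _ in matrices)
micro_p = tp_sum / (tp_sum + fp_sum)

print(macro_p, micro_p)  # the two orders generally give different numbers
```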
Above: one training set and multiple learned models.
Next: multiple training sets and one algorithm.
0214 Using the P-R curve to compare different models
At the same recall, the precision of B is greater than that of C, so B is better than C.
A and B, however, are not as easy to compare, since their curves cross. There are three methods:
- Compare the areas under the curves of A and B
- Compare F1
- Compare Fβ
2.3.3 ROC and AUC
ROC curve and AUC (0215)
When the two ROC curves intersect, it is more reasonable to compare the area under the ROC curve, i.e. the AUC (Area Under ROC Curve).
The AUC is the area of the shaded region under the curve; Equation (2.20) computes it with the "micro-element" method, summing small trapezoids:
- one trapezoid: (top + bottom) × height / 2
- AUC = Σᵢ (yᵢ + yᵢ₊₁)(xᵢ₊₁ − xᵢ) / 2
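Equation (2.20) as code; the ROC points below are hypothetical:

```python
def auc_trapezoid(xs, ys):
    """AUC = 1/2 * sum((x_{i+1} - x_i) * (y_i + y_{i+1})) over consecutive ROC points."""
    pts = list(zip(xs, ys))
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(pts, pts[1:]))

# Hypothetical ROC curve from (0, 0) to (1, 1): x = FPR, y = TPR
xs = [0.0, 0.2, 0.5, 1.0]
ys = [0.0, 0.6, 0.8, 1.0]
print(auc_trapezoid(xs, ys))  # 0.06 + 0.21 + 0.45 = 0.72
```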
Rank loss ℓ_rank (0216)
- +: positive example; −: negative example
- D+: the set of all positive examples; D−: the set of all negative examples
- m+: the number of positive examples; m−: the number of negative examples
Take the handwritten digits in the figure as an example:
- +: the digit is 5; −: the digit is not 5
- D+: the set of all "= 5" pictures; D−: the set of all "≠ 5" pictures
- m+: the number of positive examples, 6
- m−: the number of negative examples, 6
The horizontal axis is the score assigned to each picture; further to the right means a higher score. Number the positives x+1 … x+6 and the negatives x−1 … x−6 in order of increasing score.
Then count, for each positive example, how many negative examples score higher than it:
- negatives scoring higher than x+1: x−5 and x−6 qualify, so 2
- negatives scoring higher than x+2: x−6 qualifies, so 1
- negatives scoring higher than x+3: x−6 qualifies, so 1
- negatives scoring higher than x+4: 0
- negatives scoring higher than x+5: 0
- negatives scoring higher than x+6: 0
rank-loss = (2 + 1 + 1 + 0 + 0 + 0) / (m+ × m−) = 4 / (6 × 6) = 4/36
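The counting scheme above as a sketch; the scores are invented so that exactly the 2 + 1 + 1 mis-ranked pairs of the example occur (ties, which the book counts with weight 1/2, are ignored here since none occur):

```python
def rank_loss(pos_scores, neg_scores):
    """Fraction of (positive, negative) pairs where the negative scores higher."""
    bad_pairs = sum(1 for p in pos_scores for n in neg_scores if n > p)
    return bad_pairs / (len(pos_scores) * len(neg_scores))

pos_scores = [2, 3, 3.5, 5, 7, 9]       # m+ = 6 positives, sorted by score
neg_scores = [0, 0.5, 1, 1.5, 2.5, 4]   # m- = 6 negatives; the top two outrank some positives
print(rank_loss(pos_scores, neg_scores))  # (2 + 1 + 1) / 36 = 4/36
```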
AUC and rank-loss (0217)
The two are directly related: AUC = 1 − ℓ_rank (Eq. 2.22).
2.3.4 Cost-sensitive error rate and cost curve (to be continued)
Introducing cost sensitivity (0218)
Different types of errors cause different consequences. To weigh the loss caused by each kind of error, errors can be assigned unequal costs.
The cost matrix for binary classification:
- cost_ij denotes the cost of predicting a sample of class i as class j
- Generally, cost_ii = 0
- If misjudging class 0 as class 1 causes the greater loss, then cost_01 > cost_10
- The greater the difference in the degree of loss, the greater the gap between the values of cost_01 and cost_10
The earlier error rate simply counts the number of errors, without considering that different errors have different consequences.
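A cost-weighted error rate can be sketched directly from the cost matrix (the labels, predictions, and cost values below are illustrative):

```python
# cost[i][j] = cost of predicting class j when the true class is i;
# the diagonal is 0, and cost_01 > cost_10 here by assumption.
cost = [[0, 10],
        [1, 0]]

def cost_sensitive_error(y_true, y_pred, cost):
    """Average cost per sample instead of a plain error count."""
    total = sum(cost[t][p] for t, p in zip(y_true, y_pred))
    return total / len(y_true)

y_true = [0, 0, 1, 1]
y_pred = [1, 0, 0, 1]    # one 0->1 mistake (cost 10), one 1->0 mistake (cost 1)
print(cost_sensitive_error(y_true, y_pred, cost))  # (10 + 1) / 4 = 2.75
```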
Cost-curve ideas (0219)
See Zhihu — Machine Learning (Zhou Zhihua), Section 2.3.4: how to understand the cost curve?
2021/2/19: I didn't finish this part, so I decided to skip it for now.