[Chew the book together] A plain-language reading of the machine learning "watermelon book", Part 02: Model Evaluation and Selection (2.3)

2.3 Performance measures (0207 mean squared error)

Performance measure: to evaluate the generalization performance of a model, we need not only an effective and feasible experimental estimation method, but also an evaluation standard that measures the model's generalization ability.

Performance measures reflect task requirements. When comparing the capabilities of different models, different performance measures often lead to different evaluation results.
This means that the quality of a model is relative: which model counts as good depends not only on the algorithm and the data, but also on the task requirements.
For example, when m = 3:

| D | (x1, y1) | (x2, y2) | (x3, y3) |
| --- | --- | --- | --- |
| (xi, yi) | (2, 3) | (4, 5) | (7, 1) |
| f(xi) | 4 | 5 | 1 |

then

  • Mean squared error = ((4 - 3)² + (5 - 5)² + (1 - 1)²) / 3 = 1/3
  • Formula (2.3) is the version for a data distribution: the squared error is multiplied by the probability density and integrated
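For reference, the two formulas being discussed here are the book's standard definitions of the mean squared error on a finite dataset D (2.2) and on a data distribution with density p(·) (2.3):

$$ E(f;D)=\frac{1}{m}\sum_{i=1}^{m}\big(f(x_i)-y_i\big)^2 \tag{2.2} $$

$$ E(f;\mathcal{D})=\int_{x\sim\mathcal{D}}\big(f(x)-y\big)^2\,p(x)\,\mathrm{d}x \tag{2.3} $$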

2.3.1 Error rate and accuracy (0208)

Error rate: the proportion of the number of samples with incorrect classification to the total number of samples

Accuracy: the proportion of the number of correctly classified samples to the total number of samples

For example, when m=3

| D | (x1, y1) | (x2, y2) | (x3, y3) |
| --- | --- | --- | --- |
| (xi, yi) | (2, 3) | (4, 5) | (7, 1) |
| f(xi) | 4 | 5 | 1 |
| compare | y1 ≠ f(x1) | y2 = f(x2) | y3 = f(x3) |

then

  • error rate = 1/3
  • acc = 2/3 = 1 - error rate
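For reference, the corresponding definitions from the book, where the indicator function I(·) equals 1 when its argument is true and 0 otherwise:

$$ E(f;D)=\frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\big(f(x_i)\neq y_i\big),\qquad acc(f;D)=\frac{1}{m}\sum_{i=1}^{m}\mathbb{I}\big(f(x_i)=y_i\big)=1-E(f;D) $$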


2.3.2 Precision, recall and F1

Precision and recall (0209)

Precision:

  • Of the melons that are picked out, what fraction are actually good melons?
  • Of the information that is retrieved, what fraction is the user actually interested in?

Recall:

  • Of all the good melons, what fraction has been picked out?
  • Of all the information the user is interested in, what fraction has been retrieved?

Suppose m = 100, where y = 1 denotes a positive example and y = 0 a negative example.

| | x1 | x2 | x3 | x4 | x5 |
| --- | --- | --- | --- | --- | --- |
| Actual value y | 1 | 0 | 1 | 0 | 1 |
| Predicted value y' | 1 | 1 | 1 | 1 | 0 |

Suppose that in the ground truth there are 60 positive examples and 40 negative examples, while the predictions contain 70 positive examples and 30 negative examples.

| Ground truth \ Prediction | Predicted positive (70) | Predicted negative (30) |
| --- | --- | --- |
| Positive (60) | TP = 50 (true positives) | FN = 10 (false negatives) |
| Negative (40) | FP = 20 (false positives) | TN = 20 (true negatives) |

Precision P = TP / (TP + FP) = 50/70
Recall R = TP / (TP + FN) = 50/60
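A minimal sketch in plain Python that just redoes the arithmetic above from the four confusion-matrix counts (the variable names are mine, not the book's):

```python
# Confusion-matrix counts from the toy example above
TP, FN = 50, 10   # the 60 actual positives
FP, TN = 20, 20   # the 40 actual negatives

precision = TP / (TP + FP)   # 50/70 ≈ 0.714
recall = TP / (TP + FN)      # 50/60 ≈ 0.833
print(f"P = {precision:.3f}, R = {recall:.3f}")
```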

The inverse relationship between P and R (0210)

Chapter 3 of the hands-on machine learning book referenced in the original post explains the problem with relying on accuracy alone.
The example is handwritten digit recognition: given the ten digit classes, perform a binary classification, that is, decide whether a digit is a 5 or not.

  • A model trained in the usual way reaches a prediction accuracy (= correct predictions / all predictions) of 96.615%.
  • Another model that, whatever digit it sees, always answers "it is not a 5" still reaches an accuracy above 90%.

This explains why accuracy is usually not the preferred performance measure for classifiers, especially when dealing with skewed data sets, that is, when some classes are much more frequent than others.
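A quick sketch of the same point with synthetic labels (for illustration, assume the ten digits are equally likely, so only about 10% of the samples are 5s):

```python
import numpy as np

rng = np.random.default_rng(0)
labels = rng.integers(0, 10, size=10_000)   # synthetic "digit" labels 0..9
is_five = labels == 5                        # the skewed binary target

# A model that always answers "not 5", regardless of the input
predictions = np.zeros_like(is_five)

accuracy = np.mean(predictions == is_five)
print(f"accuracy of the always-not-5 model: {accuracy:.3f}")   # about 0.90
```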
Why do P and R change in opposite directions? (the principle behind the inverse P-R relationship)
When the threshold moves from 2 to 3, P becomes larger and R becomes smaller; when it moves from 2 to 1, P becomes smaller and R becomes larger.

Generally speaking, when precision is high, recall is often low; and when recall is high, precision is often low. For example:

  • If you want to pick out as many of the good melons as possible, you can simply pick more melons; if you pick every watermelon, then all the good melons are certainly included, but the precision will drop.
  • If you want as high a proportion as possible of the picked melons to be good, you can pick only the melons you are most confident about, but then many good melons will inevitably be missed, so the recall will be low.

Usually, only in some fairly simple tasks can both precision and recall be high.

The inverse P-R relationship as a curve, and F1 (0211)

When the threshold value becomes larger, P increases and R decreases

For the same model, the threshold-P/R diagram can be converted into a P-R curve: at each threshold, read off the pair (P, R) (for example P = 0.1 with R = 1 at one end) and plot these points against each other. This yields the inverse-relationship P-R curve.
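A small sketch of this threshold sweep with synthetic scores (NumPy only; the data and the threshold grid are made up for illustration), showing P rising and R falling as the threshold increases:

```python
import numpy as np

rng = np.random.default_rng(1)
y = rng.integers(0, 2, size=200)             # true labels
scores = y + rng.normal(0, 0.8, size=200)    # positives tend to score higher

for thr in np.linspace(scores.min(), scores.max(), 9):
    pred = scores >= thr
    tp = np.sum(pred & (y == 1))
    fp = np.sum(pred & (y == 0))
    fn = np.sum(~pred & (y == 1))
    p = tp / (tp + fp) if tp + fp > 0 else 1.0   # convention when nothing is predicted positive
    r = tp / (tp + fn)
    print(f"threshold={thr:+.2f}  P={p:.2f}  R={r:.2f}")
```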
But how do we then choose the point with the best model performance?
Determination of the optimal threshold:

Method one: the break-even point, where R = P.
Method two: the F1 measure.
Method three: the Fβ measure.

In the harmonic-mean form of Fβ, the 1/P term carries weight 1 and the 1/R term carries weight β² (these are the circled terms in the original figure).
When β > 1, recall R has the greater influence.
When β < 1, precision P has the greater influence.
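For reference, the standard definitions behind methods two and three (the harmonic-mean forms on the left make the weighting of 1/P and 1/R explicit; β = 1 reduces Fβ to F1):

$$ \frac{1}{F_1}=\frac{1}{2}\left(\frac{1}{P}+\frac{1}{R}\right) \quad\Longleftrightarrow\quad F_1=\frac{2\times P\times R}{P+R} $$

$$ \frac{1}{F_\beta}=\frac{1}{1+\beta^{2}}\left(\frac{1}{P}+\frac{\beta^{2}}{R}\right) \quad\Longleftrightarrow\quad F_\beta=\frac{(1+\beta^{2})\times P\times R}{(\beta^{2}\times P)+R} $$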

macro/micro P/R (0213)

Handling multi-class classification (taking handwritten digit recognition as an example)

  1. Use an algorithm that handles multiple classes directly
  2. Decompose the task into binary classifications
    • One vs One (OvO): pair the classes two at a time: 1 vs 2; 1 vs 3; 1 vs 4; …; 2 vs 3; 2 vs 4; …. A total of 10*9/2 = 45 models is required
    • One vs Rest (OvR): 1 vs the rest; 2 vs the rest; …; 10 vs the rest. A total of 10 models is required

This produces many binary classifiers and therefore many values of P and R, but these binary problems are essentially parts of the same task: together they solve one multi-class problem. How do we measure the quality of such a model as a whole?

  • Method 1: compute P and R on each confusion matrix first, then average them (giving macro-P, macro-R, and the macro-F1 computed from them)
  • Method 2: average the confusion-matrix entries (TP, FP, TN, FN) first, then compute P and R from those averages (giving micro-P, micro-R, and micro-F1)
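For reference, with n binary confusion matrices the two methods correspond to the book's macro- and micro-averaged measures (the bars denote averaging TP, FP, TN, FN over the n matrices):

$$ \text{macro-}P=\frac{1}{n}\sum_{i=1}^{n}P_i,\quad \text{macro-}R=\frac{1}{n}\sum_{i=1}^{n}R_i,\quad \text{macro-}F1=\frac{2\times\text{macro-}P\times\text{macro-}R}{\text{macro-}P+\text{macro-}R} $$

$$ \text{micro-}P=\frac{\overline{TP}}{\overline{TP}+\overline{FP}},\quad \text{micro-}R=\frac{\overline{TP}}{\overline{TP}+\overline{FN}},\quad \text{micro-}F1=\frac{2\times\text{micro-}P\times\text{micro-}R}{\text{micro-}P+\text{micro-}R} $$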
What we covered above was one training set with multiple learned models; next comes multiple training sets with one algorithm.

Using the P-R curve to compare different models (0214)
At the same recall, the precision of B is greater than that of C, so B is better than C.
A and B, however, are not easy to compare because their curves cross. There are three approaches:

  1. Compare the areas under the curves of A and B
  2. Compare F1
  3. Compare Fβ

2.3.3 ROC and AUC

ROC curve and AUC (0215)

When the ROC curves of two learners intersect, it is more reasonable to compare the area under the ROC curve, that is, the AUC (Area Under ROC Curve).
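For reference, the ROC curve plots the true positive rate (vertical axis) against the false positive rate (horizontal axis), with the standard definitions from the book:

$$ TPR=\frac{TP}{TP+FN},\qquad FPR=\frac{FP}{TN+FP} $$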
AUC is the area of the shaded region under the ROC curve in the figure. Formula (2.20) computes it with the method of small elements, summing trapezoids of area (upper base + lower base) × height / 2:

$$ AUC=\frac{1}{2}\sum_{i=1}^{m-1}\big(x_{i+1}-x_i\big)\big(y_i+y_{i+1}\big) $$
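A small NumPy sketch (synthetic scores, made up for illustration) that traces the ROC points by sweeping the threshold down the sorted scores and then applies the trapezoid formula above:

```python
import numpy as np

rng = np.random.default_rng(2)
y = rng.integers(0, 2, size=500)             # true labels
scores = y + rng.normal(0, 1.0, size=500)    # positives tend to score higher

# Sweep the threshold from high to low: each sample turns "positive" in turn
order = np.argsort(-scores)
y_sorted = y[order]
tpr = np.concatenate(([0.0], np.cumsum(y_sorted) / y.sum()))
fpr = np.concatenate(([0.0], np.cumsum(1 - y_sorted) / (len(y) - y.sum())))

# Formula (2.20): sum of trapezoids (x_{i+1} - x_i) * (y_i + y_{i+1}) / 2
auc = 0.5 * np.sum((fpr[1:] - fpr[:-1]) * (tpr[1:] + tpr[:-1]))
print(f"AUC ≈ {auc:.3f}")
```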

Ranking loss, rank-loss (0216)

+ : positive example
- : negative example
D+ : the set of all positive examples
D- : the set of all negative examples
m+ : the number of positive examples
m- : the number of negative examples

Take the handwritten numbers in the following figure as an example,

  • + : the digit is 5
  • - : the digit is not 5
  • D+ : the set of all pictures that are a 5
  • D- : the set of all pictures that are not a 5
  • m+ : the number of positive examples, 6
  • m- : the number of negative examples, 6

The horizontal axis is the score assigned to each picture; further to the right means a higher score.
Number the positive examples m+1, m+2, … and the negative examples m-1, m-2, ….
Then go through the positive examples in turn:

  • How many negative examples m-i score higher than m+1? Here m-5 and m-6 qualify, so the count is 2
  • How many score higher than m+2? Only m-6 qualifies, so the count is 1
  • How many score higher than m+3? Only m-6 qualifies, so the count is 1
  • How many score higher than m+4? None, so the count is 0
  • How many score higher than m+5? None, so the count is 0
  • How many score higher than m+6? None, so the count is 0

rank-loss = (2 + 1 + 1 + 0 + 0 + 0) / (m+ × m-) = 4 / (6 × 6) = 4/36
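For reference, the general definition that this counting procedure implements (equation (2.21) in the book; a tie in scores contributes 1/2):

$$ \ell_{rank}=\frac{1}{m^{+}m^{-}}\sum_{x^{+}\in D^{+}}\sum_{x^{-}\in D^{-}}\left(\mathbb{I}\big(f(x^{+})<f(x^{-})\big)+\frac{1}{2}\mathbb{I}\big(f(x^{+})=f(x^{-})\big)\right) $$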

AUC and rank-loss (0217)

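The key relationship between the two quantities, which this part of the book establishes (equation (2.22)), is:

$$ AUC = 1 - \ell_{rank} $$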

2.3.4 Cost-sensitive error rate and cost curve (to be continued)


Introducing cost sensitivity (0218)

Different types of errors have different consequences. To weigh the loss caused by each kind of error, the errors can be assigned unequal costs (unequal cost).
The cost matrix for binary classification:

  • cost_ij denotes the cost of predicting a sample of class i as class j
  • In general, cost_ii = 0: a correct prediction costs nothing
  • If misjudging class 0 as class 1 causes the greater loss, then cost_01 > cost_10
  • The more the two losses differ, the greater the difference between the values of cost_01 and cost_10

The previous error rate is a direct calculation of the number of errors, without considering the different consequences of different errors
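For reference, the cost-sensitive error rate that fixes this (equation (2.23) in the book, for binary classification, where D+ and D- denote the subsets of class-0 and class-1 examples respectively):

$$ E(f;D;cost)=\frac{1}{m}\left(\sum_{x_i\in D^{+}}\mathbb{I}\big(f(x_i)\neq y_i\big)\times cost_{01}+\sum_{x_i\in D^{-}}\mathbb{I}\big(f(x_i)\neq y_i\big)\times cost_{10}\right) $$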

Thoughts on the cost curve (0219)

For this part, you can refer to the Zhihu question "Machine Learning (Zhou Zhihua), Section 2.3.4: how to understand the cost curve?"

2021/2/19 I didn’t finish it, so I decided to skip it.


Source: blog.csdn.net/qq_42713936/article/details/113856246