R language waffle chart case: sensitivity and specificity vs. recall and precision as model selection criteria

Original link: http://tecdat.cn/?p=11159

Precision and recall originate from information retrieval, but they are also used to evaluate machine learning models. In some situations, however, using precision and recall can be problematic. In this article, I discuss the shortcomings of recall and precision and explain why sensitivity and specificity are usually more useful.

Definitions

For a binary classification problem with classes 1 and 0, the resulting confusion matrix has the following structure:

Prediction / Reference   1         0
1                        TP        FP
0                        FN        TN

where TP is the number of true positives (the model correctly predicts the positive class), FP the number of false positives (the model incorrectly predicts the positive class), FN the number of false negatives (the model incorrectly predicts the negative class), and TN the number of true negatives (the model correctly predicts the negative class). Sensitivity (recall), precision (positive predictive value, PPV), and specificity (true negative rate, TNR) are defined as follows:

sensitivity = TP / (TP + FN)
precision   = TP / (TP + FP)
specificity = TN / (TN + FP)

Sensitivity measures the rate at which observations from the positive class are correctly predicted, while precision indicates the rate at which positive predictions are correct. Specificity, on the other hand, is based on the false positives: it measures the rate at which observations from the negative class are correctly predicted.
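To make these definitions concrete, here is a minimal R helper (my own sketch, not code from the original post; the function name is illustrative) that computes the three measures from the cells of a confusion matrix:

# compute sensitivity, precision, and specificity from confusion-matrix counts
class.metrics <- function(tp, fp, fn, tn) {
    c(sensitivity = tp / (tp + fn),  # recall: fraction of positives found
      precision   = tp / (tp + fp),  # PPV: fraction of positive calls that are correct
      specificity = tn / (tn + fp))  # TNR: fraction of negatives found
}

Keeping the counts as explicit arguments makes it easy to recompute the measures for the examples that follow.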

Advantages of sensitivity and specificity

Evaluating a model based on sensitivity and specificity is appropriate for most data sets because these measures consider all entries of the confusion matrix. Sensitivity deals with true positives and false negatives, while specificity deals with true negatives and false positives. This means that the combination of sensitivity and specificity is a holistic measure when both true positives and true negatives should be taken into account.

Sensitivity and specificity can be summarized into a single quantity, the balanced accuracy, which is defined as the mean of the two measures:

balanced accuracy = (sensitivity + specificity) / 2

Balanced accuracy lies in the range [0, 1], where the values 0 and 1 correspond to the worst and the best possible classifier, respectively.
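Continuing the sketch from above, balanced accuracy is straightforward to compute (again an illustrative helper, not code from the original post):

# balanced accuracy: mean of sensitivity and specificity
balanced.accuracy <- function(sensitivity, specificity) {
    (sensitivity + specificity) / 2
}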

Disadvantages of recall and precision

Evaluating a model using recall and precision does not use all cells of the confusion matrix. Recall deals with true positives and false negatives, while precision deals with true positives and false positives. Thus, this pair of performance measures never takes the true negatives into account, so precision and recall should only be used in situations where the correct identification of the negative class does not play a role. Recall that precision is defined as

precision = TP / (TP + FP)

which makes no reference to the number of true negatives.

Precision and recall are usually summarized into a single quantity, the F1 score, which is the harmonic mean of the two measures:

F1 = 2 * precision * recall / (precision + recall)

The F1 score lies in the range [0, 1] and reaches 1 for a classifier that maximizes both precision and recall. Because it is based on the harmonic mean, the F1 score is very sensitive to differing values of precision and recall. Suppose a classifier has a sensitivity of 90% and a precision of 30%. Then the arithmetic mean would be 60%, but the harmonic mean (the F1 score) would be 45%.
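These numbers are easy to verify in R:

precision <- 0.3
recall    <- 0.9
mean(c(precision, recall))                     # arithmetic mean: 0.6
2 * precision * recall / (precision + recall)  # harmonic mean (F1): 0.45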

Examples

Here, I provide two examples. The first studies a problem that can occur when precision is used as a performance measure; the second illustrates a setting in which precision and recall are appropriate.

What is problematic about using precision?

Precision is a particularly bad measure when there are few observations from the negative class. Let us assume a clinical data set in which 90% of the persons are diseased (positive class) and only 10% are healthy (negative class). Further assume that we have developed two tests that classify patients as diseased or healthy. Both tests have an accuracy of 80%, but they make different types of errors.

# The waffle package draws pictogram charts; rendering the glyphs
# requires the Font Awesome font to be installed on the system.
library(waffle)

# colour palettes for reference / correct / incorrect cells
# (placeholder values; the original palette definitions were not shown)
ref.colors   <- c("#c68958", "#73d216")
true.colors  <- c("#c68958", "#73d216")
false.colors <- c("#cc0000", "#3465a4")

iron(
    waffle(c("Diseased" = 90, "Healthy" = 10), rows = 5, use_glyph = "child", 
        glyph_size = 5, title = "Reference", colors = ref.colors),
    waffle(c("Diseased (TP)" = 80, "Healthy (FN)" = 10, "Diseased (FP)" = 10), 
        rows = 5, use_glyph = "child", 
        glyph_size = 5, title = "Clinical Test 1", colors = c(true.colors[1], false.colors[2], false.colors[1])) 
)

 

Confusion matrix of the first test

Prediction / Reference   Diseased   Healthy
Diseased                 TP = 80    FP = 10
Healthy                  FN = 10    TN = 0

Confusion matrix of the second test

Prediction / Reference   Diseased   Healthy
Diseased                 TP = 70    FP = 0
Healthy                  FN = 20    TN = 10

Comparison of the two tests

Let us compare the performance of the two tests:

Measure                Test 1   Test 2
Sensitivity (recall)   88.9%    77.8%
Specificity            0%       100%
Precision              88.9%    100%

Considering sensitivity and specificity, we would not select the first test: its balanced accuracy is only 44.4%, while that of the second test is 88.9%.

However, using precision and recall, the first test has an F1 score of 88.9%, while the second test has a lower score of 87.5%. Thus, despite its specificity of 0%, we would find the first test superior to the second. If this test were used, all healthy patients would be classified as diseased. This would be a big problem, because all of these misdiagnosed patients would suffer severe psychological stress and expensive treatment. Had we used specificity instead, we would have selected the second model, which produces no false positives at a competitive sensitivity.
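As a sanity check, the figures above can be recomputed with the helpers sketched earlier (the counts are taken from the two confusion matrices):

m1 <- class.metrics(tp = 80, fp = 10, fn = 10, tn = 0)   # Test 1
m2 <- class.metrics(tp = 70, fp = 0,  fn = 20, tn = 10)  # Test 2
balanced.accuracy(m1["sensitivity"], m1["specificity"])  # 0.444
balanced.accuracy(m2["sensitivity"], m2["specificity"])  # 0.889
2 * m1["precision"] * m1["sensitivity"] / (m1["precision"] + m1["sensitivity"])  # F1: 0.889
2 * m2["precision"] * m2["sensitivity"] / (m2["precision"] + m2["sensitivity"])  # F1: 0.875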

 

Let us now consider an example from information retrieval to illustrate when precision is a useful criterion. Suppose we want to compare two document retrieval algorithms, both of which have an accuracy of 80%.

 
# second example: document retrieval (reuses the library and palettes loaded above)
iron(
    waffle(c("Relevant" = 30, "Irrelevant" = 70), rows = 5, use_glyph = "file", 
        glyph_size = 5, title = "Reference", colors = ref.colors),
    waffle(c("Relevant (TP)" = 25, "Irrelevant (FN)" = 5, "Relevant (FP)" = 15, "Irrelevant (TN)" = 55), 
        rows = 5, use_glyph = "file", 
        glyph_size = 5, title = "Retrieval Algorithm 1", colors = c(true.colors[1], false.colors[2], false.colors[1], true.colors[2])) 
)

 

Confusion matrix of the first algorithm

Prediction / Reference   Relevant   Irrelevant
Relevant                 TP = 25    FP = 15
Irrelevant               FN = 5     TN = 55

Confusion matrix of the second algorithm

Prediction / Reference   Relevant   Irrelevant
Relevant                 TP = 20    FP = 10
Irrelevant               FN = 10    TN = 60

Comparison of the two algorithms

Let us calculate the performance of the two algorithms from their confusion matrices:

Measure                Algorithm 1   Algorithm 2
Sensitivity (recall)   83.3%         66.7%
Specificity            78.6%         85.7%
Precision              62.5%         66.7%
Balanced accuracy      81.0%         76.2%
F1 score               71.4%         66.7%

In this example, both balanced accuracy and the F1 score would lead us to prefer the first algorithm over the second. Note that the reported balanced accuracies are considerably larger than the F1 scores. This is because both algorithms have high specificity, driven by the large number of discarded observations from the negative class. Since the F1 score does not consider the rate of true negatives, precision and recall are more suitable than sensitivity and specificity for this task.
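The same helpers reproduce these values (a sketch; the counts come from the confusion matrices above):

a1 <- class.metrics(tp = 25, fp = 15, fn = 5,  tn = 55)  # Algorithm 1
a2 <- class.metrics(tp = 20, fp = 10, fn = 10, tn = 60)  # Algorithm 2
balanced.accuracy(a1["sensitivity"], a1["specificity"])  # 0.810
balanced.accuracy(a2["sensitivity"], a2["specificity"])  # 0.762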

Summary

In this article, we have seen that performance measures should be chosen carefully. While sensitivity and specificity generally perform well, precision and recall should only be used in situations where the true negative rate does not play a role.

 

If you have any questions, please leave a comment below. 

 

 
