6. Assessment of classification models

Training, tuning, and building models are important parts of the overall analytics life cycle, but it is even more important to know how well these models perform. The performance of a classification model is generally judged by how well it predicts outcomes for new data points. This performance is usually measured against a test or hold-out dataset, consisting of data points that were not used to train the classifier or influence it in any way. The test dataset typically contains a number of observations together with their corresponding labels.

Features are extracted from the test data using the same method that was used when training the model. These features are then fed to the trained model to obtain a prediction for each data point. Finally, the predictions are matched against the actual labels to see how accurately, and in which situations, the model predicted the correct outcome.
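
To make this concrete, below is a minimal sketch of the evaluate-on-held-out-data workflow, assuming scikit-learn is available. The synthetic dataset, the LogisticRegression model and the variable names are purely illustrative and are not part of the spam example that follows.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn import metrics

# Build a small synthetic dataset purely for illustration.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Hold out a test set that plays no part in training the classifier.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Train on the training split only.
clf = LogisticRegression()
clf.fit(X_train, y_train)

# Predict on the held-out data and compare the predictions with the true labels.
y_pred = clf.predict(X_test)
print('Held-out accuracy:', metrics.accuracy_score(y_test, y_pred))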

There are several metrics that can be used to determine the predictive performance of a model, but we will focus on the following:

  • Accuracy.
  • Precision.
  • Recall.
  • F1 score.

Let's look at a practical example and see how these metrics are calculated. Consider a binary classification problem in which emails are classified as either "spam" or "ham" (normal email). Suppose we have 20 emails, each with an actual, manually assigned label, and that these emails have already been fed to a classifier, which has predicted a label for each one. We now want to measure the performance of the classifier by comparing the predicted labels with the actual labels. The following code initializes the dependencies and the actual and predicted labels.

from sklearn import metrics
import numpy as np
import pandas as pd
from collections import Counter

actual_labels = ['spam', 'ham', 'spam', 'spam', 'spam',
                 'ham', 'ham', 'spam', 'ham', 'spam',
                 'spam', 'ham', 'ham', 'ham', 'spam',
                 'ham', 'ham', 'spam', 'spam', 'ham']

predicted_labels = ['spam', 'spam', 'spam', 'ham', 'spam',
                    'spam', 'ham', 'ham', 'spam', 'spam',
                    'ham', 'ham', 'spam', 'ham', 'ham',
                    'ham', 'spam', 'ham', 'spam', 'spam']

ac = Counter(actual_labels)
pc = Counter(predicted_labels)

Now let's use the following code to look at the total number of emails belonging to the "spam" and "ham" categories in both the actual and the predicted labels:

In [4]: print('Actual counts:', ac.most_common())
   ...: print('Predicted counts:', pc.most_common())
Actual counts: [('ham', 10), ('spam', 10)]
Predicted counts: [('spam', 11), ('ham', 9)]

We can see that a total of 10 emails are actually "spam" and 10 are "ham", whereas the classifier predicted 11 emails as "spam" and 9 as "ham". But how many of the emails predicted as "spam" are truly "spam", and how were the rest classified? A confusion matrix is a good way to measure the performance of a binary classifier. The confusion matrix is a tabular structure that helps visualize the performance of a classifier. Each column of the matrix represents instances of a predicted class, and each row represents instances of an actual class (if desired, the two can be swapped). Typically we define one class label as the positive class; this is the class we are interested in. The following table shows a typical confusion matrix for binary classification, where p denotes the positive class and n denotes the negative class.

 

                 p' (predicted)         n' (predicted)
p (actual)       True positive (TP)     False negative (FN)
n (actual)       False positive (FP)    True negative (TN)
The table above introduces the terms used in the matrix. True positive (TP) is the number of hits, that is, instances correctly predicted as the positive class. False negative (FN) is the number of positive instances that were incorrectly predicted as the negative class. False positive (FP) is the number of instances incorrectly predicted as the positive class when they are actually negative. True negative (TN) is the number of instances correctly predicted as the negative class.
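
Purely as an illustration of these four definitions (not part of the original example), the cells can also be tallied directly from the (actual, predicted) label pairs defined earlier, treating 'spam' as the positive class:

# Tally TP, FN, FP and TN directly from the (actual, predicted) pairs,
# treating 'spam' as the positive class.
tp = fn = fp = tn = 0
for actual, predicted in zip(actual_labels, predicted_labels):
    if actual == 'spam' and predicted == 'spam':
        tp += 1    # positive instance correctly flagged
    elif actual == 'spam' and predicted == 'ham':
        fn += 1    # positive instance missed
    elif actual == 'ham' and predicted == 'spam':
        fp += 1    # negative instance incorrectly flagged
    else:
        tn += 1    # negative instance correctly passed
print('TP:', tp, 'FN:', fn, 'FP:', fp, 'TN:', tn)

These counts match the confusion matrix built next.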

The following code builds a confusion matrix for our data. (Note that in newer versions of pandas, the labels parameter of pd.MultiIndex has been renamed to codes.)

In [7]: cm = metrics.confusion_matrix(y_true=actual_labels,
   ...:                               y_pred=predicted_labels,
   ...:                               labels=['spam', 'ham'])
   ...:

In [8]: print(pd.DataFrame(
   ...:           data=cm,
   ...:           columns=pd.MultiIndex(levels=[['Predicted:'],
   ...:                                         ['spam', 'ham']],
   ...:                                 labels=[[0, 0], [0, 1]]),
   ...:           index=pd.MultiIndex(levels=[['Actual:'],
   ...:                                       ['spam', 'ham']],
   ...:                               labels=[[0, 0], [0, 1]])))
   ...:
             Predicted:
                   spam  ham
Actual: spam          5    5
        ham           6    4
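
As an optional aside (not in the original text), the same matrix can be rendered graphically with scikit-learn's ConfusionMatrixDisplay, assuming scikit-learn 0.22 or later and matplotlib are installed:

# Visualize the confusion matrix computed above (requires matplotlib).
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay

disp = ConfusionMatrixDisplay(confusion_matrix=cm,
                              display_labels=['spam', 'ham'])
disp.plot(cmap=plt.cm.Blues)
plt.show()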

We now have a confusion matrix similar to the table shown earlier. Assuming "spam" is the positive class, we can define the quantities introduced above with the following code:

positive_class = 'spam'

true_positive = 5.
false_positive = 6.
false_negative = 5.
true_negative = 4.
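
Instead of typing these values in by hand, they can also be read directly from the cm array computed above. The short sketch below is an aside that relies on the row and column order produced by labels=['spam', 'ham'] with 'spam' as the positive class:

# With labels=['spam', 'ham'] and 'spam' as the positive class, cm is laid out as
# [[TP, FN],
#  [FP, TN]],
# so flattening it row by row yields the four cells in that order.
tp, fn, fp, tn = [float(value) for value in cm.ravel()]
print('TP:', tp, 'FN:', fn, 'FP:', fp, 'TN:', tn)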

Having obtained the necessary values from the confusion matrix, we can compute the four performance metrics one by one. To make the division straightforward, the values are stored as floating-point numbers. We will use scikit-learn's metrics module, which provides very powerful and useful functions for computing these metrics, and we will also define and compute each metric manually so that you can clearly understand them and see what the corresponding functions in the metrics module do internally.

Accuracy is defined as the overall accuracy of the model, that is, the proportion of correct predictions, and is given by the following formula:

    Accuracy = (TP + TN) / (TP + TN + FP + FN)

Here the number of correct predictions is in the numerator and the total number of predictions is in the denominator. The following code shows how to compute accuracy:

In [10]: accuracy = np.round(
    ...:                metrics.accuracy_score(y_true=actual_labels,
    ...:                                       y_pred=predicted_labels), 2)
    ...: accuracy_manual = np.round(
    ...:                       (true_positive + true_negative) /
    ...:                       (true_positive + true_negative +
    ...:                        false_negative + false_positive), 2)
    ...: print('Accuracy:', accuracy)
    ...: print('Manually computed accuracy:', accuracy_manual)
    ...:
Accuracy: 0.45
Manually computed accuracy: 0.45

Precision is defined as the number of correct positive predictions divided by the total number of instances predicted as the positive class, which can be described by the following formula:

    Precision = TP / (TP + FP)

Here the number of correctly predicted positive instances is divided by the total number of instances predicted as positive, which also includes the false positives. The following code shows how to compute precision:

In [11]: precision = np.round(
    ...:                 metrics.precision_score(y_true=actual_labels,
    ...:                                         y_pred=predicted_labels,
    ...:                                         pos_label=positive_class), 2)
    ...: precision_manual = np.round(
    ...:                        true_positive /
    ...:                        (true_positive + false_positive), 2)
    ...: print('Precision:', precision)
    ...: print('Manually computed precision:', precision_manual)
    ...:
Precision: 0.45
Manually computed precision: 0.45

Recall is defined as the proportion of positive-class instances that are correctly predicted; it is also known as the hit rate, coverage, or sensitivity, and can be described by the following formula:

    Recall = TP / (TP + FN)

Here the number of correctly predicted positive instances is divided by the sum of the correctly and incorrectly predicted positive instances, which gives the hit rate. The following code shows how to compute recall:

In [12]: recall = np.round(
    ...:              metrics.recall_score(y_true=actual_labels,
    ...:                                   y_pred=predicted_labels,
    ...:                                   pos_label=positive_class), 2)
    ...: recall_manual = np.round(
    ...:                     true_positive /
    ...:                     (true_positive + false_negative), 2)
    ...: print('Recall:', recall)
    ...: print('Manually computed recall:', recall_manual)
    ...:
Recall: 0.5
Manually computed recall: 0.5

The F1 score is another popular metric, obtained by computing the harmonic mean of precision and recall; it is expressed by the following formula:

    F1 score = (2 × Precision × Recall) / (Precision + Recall)

The F1 score can be computed with the following code:

In [13]: f1_score = np.round(
    ...:                metrics.f1_score(y_true=actual_labels,
    ...:                                 y_pred=predicted_labels,
    ...:                                 pos_label=positive_class), 2)
    ...: f1_score_manual = np.round(
    ...:                       (2 * precision * recall) /
    ...:                       (precision + recall), 2)
    ...: print('F1 score:', f1_score)
    ...: print('Manually computed F1 score:', f1_score_manual)
    ...:
F1 score: 0.48
Manually computed F1 score: 0.47
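
For convenience, scikit-learn can also summarize precision, recall and F1 score for every class in a single call with metrics.classification_report; a short sketch using the labels from above:

# Report precision, recall and F1 score for both classes at once.
report = metrics.classification_report(y_true=actual_labels,
                                        y_pred=predicted_labels,
                                        labels=['spam', 'ham'])
print(report)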

These are some of the most commonly used metrics for evaluating classification models, and we will use these performance metrics again when evaluating models more formally in later applications.
