Introduction to the Matthews Correlation Coefficient (MCC)

When evaluating the performance of machine learning models, the F1 score is often the preferred metric. In this article, we introduce an alternative metric that deserves more attention and recognition: the Matthews Correlation Coefficient (MCC).

The F1 score is computed by reconciling precision and recall, aiming to strike a balance between the two. But suppose we have a dataset with the following confusion matrix:

  • True positives (TP) = 25
  • False positives (FP) = 10
  • False negatives (FN) = 5
  • True negatives (TN) = 9000

In this case, the dataset represents a medical test for a rare disease with only a small number of positive cases. The confusion matrix shows that the model handles the overwhelming number of negatives very well (a high TN count) but still misses some of the few positive cases. Here are the calculations for precision, recall, and F1 score:

  • Precision = TP / (TP + FP) = 25 / (25 + 10) ≈ 0.714
  • Recall = TP / (TP + FN) = 25 / (25 + 5) = 0.833
  • F1 Score = 2 * (Precision * Recall) / (Precision + Recall) ≈ 0.769

The F1 score is about 0.769, which looks like reasonable performance. But the F1 score does not take true negatives into account at all, and in a medical setting even a small number of missed positives can have a significant real-world impact.
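
To make the arithmetic concrete, here is a minimal Python sketch that reproduces these numbers from the counts above (TP = 25, FP = 10, FN = 5, TN = 9000):

    # Confusion-matrix counts from the example above
    TP, FP, FN, TN = 25, 10, 5, 9000

    precision = TP / (TP + FP)                          # 25 / 35 ≈ 0.714
    recall = TP / (TP + FN)                             # 25 / 30 ≈ 0.833
    f1 = 2 * precision * recall / (precision + recall)  # ≈ 0.769

    print(f"precision={precision:.3f}, recall={recall:.3f}, f1={f1:.3f}")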

So let's introduce a different metric: the Matthews Correlation Coefficient (MCC).

Matthews Correlation Coefficient (MCC)

The Matthews Correlation Coefficient (MCC) is a metric for evaluating the performance of binary classification models, and it is especially suitable for imbalanced datasets. It takes into account true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN), providing a single value that summarizes the quality of the classification.

The value range of MCC is between -1 and +1, where:

  • +1 means perfect prediction
  • 0 means random prediction
  • -1 means the predictions completely disagree with the actual observations

The MCC is calculated as:

 MCC = (TP * TN - FP * FN) / sqrt((TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))

In this formula:

  • TP: true positives (positives correctly predicted as positive)
  • TN: true negatives (negatives correctly predicted as negative)
  • FP: false positives (negatives incorrectly predicted as positive)
  • FN: false negatives (positives incorrectly predicted as negative)
  • sqrt: square root
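
As a sketch, this formula translates directly into a small Python function (the name mcc_from_counts is just for illustration):

    from math import sqrt

    def mcc_from_counts(tp, tn, fp, fn):
        """Matthews Correlation Coefficient from the four confusion-matrix counts."""
        numerator = tp * tn - fp * fn
        denominator = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
        # By convention, MCC is taken as 0 when the denominator is 0
        return 0.0 if denominator == 0 else numerator / denominator

    # A classifier that makes no mistakes gets the ideal score of +1
    print(mcc_from_counts(tp=50, tn=50, fp=0, fn=0))  # 1.0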

MCC considers all four values (TP, TN, FP, FN) and is therefore well suited to datasets with class imbalance, where one class is much more common than the other. MCC is especially useful when you want to evaluate model performance independently of the class distribution.

For the example above, the MCC works out to:

 MCC = (25 * 9000 - 10 * 5) / sqrt((25 + 10) * (25 + 5) * (9000 + 10) * (9000 + 5))
 MCC = 224950 / sqrt(85191802500)
 MCC ≈ 0.771

The MCC value is about 0.771.
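
Assuming scikit-learn is available, the same numbers can be cross-checked with sklearn.metrics by rebuilding label arrays from the four counts:

    from sklearn.metrics import f1_score, matthews_corrcoef

    # Rebuild labels from the counts: TP = 25, FP = 10, FN = 5, TN = 9000
    y_true = [1] * 25 + [0] * 10 + [1] * 5 + [0] * 9000
    y_pred = [1] * 25 + [1] * 10 + [0] * 5 + [0] * 9000

    print(f1_score(y_true, y_pred))           # ≈ 0.769
    print(matthews_corrcoef(y_true, y_pred))  # ≈ 0.771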

In practice, higher MCC values indicate better performance, and +1 is the ideal score. Typically, values greater than 0.5 are considered good, values around 0 indicate performance no better than random, and negative values indicate a model that performs worse than random guessing.

Differences from the F1 score

  1. Definition and calculation: MCC is a comprehensive performance metric computed from all four confusion-matrix cells: true positives, true negatives, false positives, and false negatives. The F1 score is the harmonic mean of precision and recall, reflecting how well the model balances the correctness and coverage of its positive predictions.
  2. Handling imbalanced datasets: MCC gives a more reliable performance assessment on imbalanced datasets because it considers all four outcomes, including true negatives. The F1 score can also be used on imbalanced datasets, but it mainly captures the trade-off between precision and recall and ignores true negatives.
  3. Strengths and applicable scenarios: MCC is more advantageous when classes are imbalanced or the sample is small, because taking all four outcomes into account makes the evaluation less sensitive to chance. The F1 score is appropriate when the main concern is how well the model identifies positive examples and precision and recall need to be balanced.
  4. Interpretability: MCC ranges from -1 to +1 and is easy to interpret: +1 means perfect prediction, -1 means complete disagreement between prediction and observation, and 0 means random prediction. The F1 score ranges from 0 to 1 and is also easy to interpret: a score of 1 means both precision and recall are perfect.

Metric selection

Both the Matthews Correlation Coefficient (MCC) and the F1 score are metrics for evaluating the performance of binary classification models, but they look at a model's predictions from different perspectives.

If the dataset is severely imbalanced and you want a more comprehensive performance measure, MCC is likely the more appropriate choice. If you mainly care about the balance between precision and recall and are less concerned with how the model handles true negatives, the F1 score may be more suitable.
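
A small made-up example illustrates the difference (again assuming scikit-learn): on a heavily imbalanced dataset, a degenerate model that always predicts the majority (positive) class still earns a high F1 score, while MCC reports that it has no discriminative power at all:

    from sklearn.metrics import f1_score, matthews_corrcoef

    # Hypothetical labels: 95 positives, 5 negatives
    y_true = [1] * 95 + [0] * 5
    # A "model" that predicts the majority class for every sample
    y_pred = [1] * 100

    print(f1_score(y_true, y_pred))           # ≈ 0.974 -- looks strong
    print(matthews_corrcoef(y_true, y_pred))  # 0.0 -- scikit-learn returns 0 when MCC is undefined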

https://avoid.overfit.cn/post/935db4fa639d4fbfbfe9ef425ce73fbc

Origin blog.csdn.net/m0_46510245/article/details/132354548