Evaluation Metrics for Machine Learning Classification Models

Introduction

Machine learning is a branch of artificial intelligence that enables computer systems to improve their performance by learning from data. Systems are trained to recognize patterns and regularities so that they can make predictions and decisions on new data. For a classification model, a sound evaluation of model performance is indispensable: the gap between evaluation metrics on the training set and the test set indicates whether the model is overfitting and how well it generalizes.

Common Machine Learning Classification Models

Machine learning can be divided into the following main categories:

  1. Supervised Learning: Supervised learning trains a computer system on data with known input-output pairs. The system learns how to map inputs to outputs so that it can make predictions on new data. Typical supervised classification models include support vector machines (SVM), logistic regression (LR), decision trees (DT), random forests (RF), XGBoost, gradient boosting, and BP neural networks (a training sketch follows this list).

  2. Semi-supervised Learning: Semi-supervised learning is a machine learning method that sits between supervised and unsupervised learning: part of the data is labeled and the rest is unlabeled. The labeled data is used to train the model to recognize patterns and regularities, while the unlabeled data helps the model better capture the structure of the data, improving its performance and generalization ability.

  3. Unsupervised Learning: Unsupervised learning trains a computer system on unlabeled data. The system must discover patterns and structure in the data on its own so that new data can be classified or clustered. Typical unsupervised learning models include K-Means clustering, Gaussian mixture models (GMM), and BIRCH clustering. (KNN, by contrast, is a supervised method: it classifies a sample by the labels of its nearest labeled neighbors.)
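
As an illustration of the supervised setting, here is a minimal sketch (assuming scikit-learn is installed; the synthetic dataset and all parameters are arbitrary demonstration choices) that trains two of the models named above and compares training and test accuracy, the gap discussed in the introduction:

```python
# A minimal sketch: train two supervised classifiers and compare the
# train/test accuracy gap mentioned in the introduction.
# Assumes scikit-learn; dataset and parameters are arbitrary.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary classification data.
X, y = make_classification(n_samples=1000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

for model in (LogisticRegression(max_iter=1000),
              RandomForestClassifier(n_estimators=100, random_state=42)):
    model.fit(X_train, y_train)
    # A large train/test gap would suggest overfitting.
    print(type(model).__name__,
          "train acc:", round(model.score(X_train, y_train), 3),
          "test acc:", round(model.score(X_test, y_test), 3))
```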

Evaluation Metrics for Classification Models

Common evaluation metrics for classification models are the accuracy (ACC), precision (PRE), recall (REC), F1-Score, and AUC, all of which are computed from the confusion matrix. A typical 2×2 confusion matrix is shown in Table 1.

Table 1 Example of a 2×2 confusion matrix

                       Predicted value = 1    Predicted value = 0
  Actual value = 1              TP                     FN
  Actual value = 0              FP                     TN
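
This layout can be reproduced with scikit-learn's confusion_matrix, as in the sketch below (y_true and y_pred are made-up toy data); passing labels=[1, 0] orders the rows and columns as in Table 1:

```python
# Reproducing Table 1 with scikit-learn (y_true / y_pred are toy data).
from sklearn.metrics import confusion_matrix

y_true = [1, 1, 1, 0, 0, 0, 1, 0]  # actual values
y_pred = [1, 0, 1, 0, 1, 0, 1, 0]  # predicted values

# Rows are actual values, columns are predicted values.
# labels=[1, 0] puts class 1 first, matching Table 1.
(tp, fn), (fp, tn) = confusion_matrix(y_true, y_pred, labels=[1, 0])
print("TP, FN, FP, TN =", tp, fn, fp, tn)  # -> 3 1 1 3
```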

In multi-class problems, we usually first reduce the task to a set of one-vs-rest (One-vs-All) binary classification problems. For each category, we treat that category as the positive class and all other categories as the negative class; precision, recall, F1-Score, and AUC can then be calculated per class, as in the sketch below.
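
A short sketch of per-class (one-vs-rest) metrics with scikit-learn; the three-class labels are made-up toy data:

```python
# Per-class (one-vs-rest) precision / recall / F1 for a multi-class problem.
# Assumes scikit-learn; the labels are toy data.
from sklearn.metrics import precision_recall_fscore_support

y_true = [0, 1, 2, 2, 1, 0, 2, 1]
y_pred = [0, 2, 2, 2, 1, 0, 1, 1]

# average=None returns one value per class, i.e. each class is treated
# in turn as the positive class against all the others.
prec, rec, f1, support = precision_recall_fscore_support(
    y_true, y_pred, average=None, labels=[0, 1, 2])
for c in (0, 1, 2):
    print(f"class {c}: precision={prec[c]:.2f} "
          f"recall={rec[c]:.2f} F1={f1[c]:.2f}")
```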

  1. Accuracy (ACC): Accuracy is the proportion of correctly predicted samples (both true positives and true negatives) among all samples: ACC = (TP + TN) / (TP + FP + FN + TN).

  2. Precision (PRE): Precision is the proportion of samples predicted as positive that are actually positive: PRE = TP / (TP + FP).

  3. Recall (REC): Recall is the proportion of actual positive samples that are correctly predicted as positive, also called sensitivity: REC = TP / (TP + FN).

  4. F1-Score: The F1-Score is the harmonic mean of precision and recall, used to evaluate model performance comprehensively: F1 = 2 × PRE × REC / (PRE + REC).

  5. ROC and AUC: AUC (Area Under Curve) is the area under the ROC (Receiver Operating Characteristic) curve; it measures the model's ranking performance, i.e. the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one.

In the ROC curve, the horizontal axis is the FPR (False Positive Rate) and the vertical axis is the TPR (True Positive Rate). They are calculated as:

FPR = FP / (FP + TN)
TPR = TP / (TP + FN)

Here, TP (True Positive) is the number of samples predicted positive that are actually positive; FP (False Positive) is the number predicted positive but actually negative; FN (False Negative) is the number predicted negative but actually positive; and TN (True Negative) is the number predicted negative that are actually negative.
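
Putting the definitions together, here is a minimal sketch (pure Python for metrics 1-4 and FPR/TPR; scikit-learn for the AUC; all counts and scores are made up for illustration):

```python
# Metrics 1-4 plus FPR/TPR from the four confusion-matrix counts
# (the counts are made up for illustration).
tp, fn, fp, tn = 40, 10, 5, 45

acc = (tp + tn) / (tp + fp + fn + tn)  # 1. accuracy
pre = tp / (tp + fp)                   # 2. precision
rec = tp / (tp + fn)                   # 3. recall (= TPR)
f1 = 2 * pre * rec / (pre + rec)       # 4. F1-Score
fpr = fp / (fp + tn)                   # x-axis of the ROC curve
tpr = tp / (tp + fn)                   # y-axis of the ROC curve
print(f"ACC={acc:.3f} PRE={pre:.3f} REC={rec:.3f} F1={f1:.3f}")
print(f"FPR={fpr:.3f} TPR={tpr:.3f}")

# 5. AUC needs continuous scores rather than hard labels
# (y_true / y_score are made up).
from sklearn.metrics import roc_auc_score, roc_curve
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_score = [0.9, 0.8, 0.4, 0.35, 0.6, 0.2, 0.7, 0.1]
fprs, tprs, thresholds = roc_curve(y_true, y_score)  # ROC curve points
print("AUC =", roc_auc_score(y_true, y_score))
```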
