Foreword
Model evaluation is the process of measuring how well a machine learning model performs, and it is one of the core steps in machine learning. We train the model on a training set and measure its performance on a held-out test set. The goal is to quantify the model's predictive ability with suitable evaluation metrics and to determine which model best fits a particular dataset. This article introduces the basics of model evaluation and presents several commonly used metrics with corresponding Python code examples.
1. Training set and test set
In machine learning, we use the training set to train the model and the test set to verify its performance. A common training/test split is 80/20 or 70/30.
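To make the 80/20 idea concrete, here is a minimal pure-Python sketch of what a split does. The helper name split_train_test and its parameters are assumptions made up for this illustration; in practice scikit-learn's train_test_split (used later in this article) does this job.

```python
import random


def split_train_test(samples, test_ratio=0.2, seed=42):
    """Shuffle a copy of `samples` and split it into (train, test).

    Hypothetical helper for illustration only, not scikit-learn's implementation.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(round(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


data = list(range(10))
train, test = split_train_test(data, test_ratio=0.2)
print(len(train), len(test))  # 8 2
```

Shuffling before splitting matters when the data is stored in some order (for example sorted by class), because otherwise the test set may not look like the training set.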
2. Evaluation metrics
There are many evaluation metrics for machine learning models. The ones below are among the most commonly used for classification.
Before introducing them, we define four basic quantities.
- TP, FP, TN, FN
- TP (true positive): a positive sample is predicted as positive. For example, if the model's task is to recognize men, it correctly recognizes a man as a man.
- FP (false positive): a negative sample is predicted as positive. The model is asked to recognize men, and it mistakes a woman for a man.
- TN (true negative): a negative sample is predicted as negative. The model is asked to recognize men, and it correctly identifies a woman as a woman.
- FN (false negative): a positive sample is predicted as negative. The model is asked to recognize men, and it mistakes a man for a woman.
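The four quantities above can be counted directly from two label lists. The sketch below uses made-up labels (1 = positive, 0 = negative), and the helper name confusion_counts is an assumption for illustration.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, TN, FN) for a binary classification problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive and t == positive:
            tp += 1  # positive predicted as positive
        elif p == positive:
            fp += 1  # negative predicted as positive
        elif t == positive:
            fn += 1  # positive predicted as negative
        else:
            tn += 1  # negative predicted as negative
    return tp, fp, tn, fn


# Made-up labels: 4 positives and 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(tp, fp, tn, fn)  # 3 1 5 1
```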
1 Accuracy
Accuracy is the most commonly used evaluation metric in classification problems. It is the proportion of samples the model classifies correctly out of all samples. Higher accuracy generally means better performance, although on heavily imbalanced datasets accuracy alone can be misleading.
- The mathematical formula is
Accuracy=(TP+TN)/(TP+FP+TN+FN)
In Python, we can use the scikit-learn library to calculate accuracy:
from sklearn.metrics import accuracy_score
accuracy = accuracy_score(y_true, y_pred)
Here y_true holds the true labels and y_pred holds the labels predicted by the model.
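One caveat is worth seeing in numbers: accuracy can look good even when a model never finds a single positive sample. The sketch below assumes made-up labels (5 positives, 95 negatives) and a model that always predicts the negative class, and applies the accuracy formula by hand.

```python
# Made-up, heavily imbalanced labels: 5 positives and 95 negatives.
y_true = [1] * 5 + [0] * 95
# A "model" that always predicts the negative class.
y_pred = [0] * 100

correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)  # TP + TN
accuracy = correct / len(y_true)                            # (TP + TN) / total
true_positives = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)

print(accuracy)        # 0.95
print(true_positives)  # 0
```

This is why accuracy is usually read together with precision and recall rather than on its own.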
2 Precision
Precision is the proportion of true positives among all samples the model predicts as positive. It measures how trustworthy the model's positive predictions are.
- Mathematical formula
Precision=TP/(TP+FP)
- Python code
from sklearn.metrics import precision_score
precision = precision_score(y_true, y_pred)
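To check the formula by hand, the following sketch computes precision from made-up labels. The function name precision_from_labels is an assumption for this example, and returning 0.0 when nothing is predicted positive is a choice made here, not a statement about scikit-learn's behavior.

```python
def precision_from_labels(y_true, y_pred, positive=1):
    """Precision = TP / (TP + FP), computed from label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if p == positive and t != positive)
    if tp + fp == 0:
        return 0.0  # assumption of this sketch: no positive predictions -> 0.0
    return tp / (tp + fp)


# Made-up labels: the model predicts 4 positives, 3 of them correctly.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision = precision_from_labels(y_true, y_pred)
print(precision)  # 0.75
```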
3 Recall
Recall is the proportion of actual positive samples that the model correctly predicts as positive. It measures how completely the model finds the positive samples.
- Mathematical formula
Recall=TP/(TP+FN)
- Python code
from sklearn.metrics import recall_score
recall = recall_score(y_true, y_pred)
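The sketch below computes recall by hand on made-up labels where the model is cautious: every positive prediction is correct, but half of the real positives are missed. The function name recall_from_labels is an assumption for this example.

```python
def recall_from_labels(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN), computed from label lists."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # assumption of this sketch: no actual positives -> 0.0
    return tp / (tp + fn)


# Made-up labels: 4 actual positives, the model finds only 2 of them.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

recall = recall_from_labels(y_true, y_pred)
print(recall)  # 0.5
```

On these labels precision would be 1.0 (no false positives) while recall is only 0.5, which shows that the two metrics answer different questions.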
4 F1 (F1 score)
F1 is the harmonic mean of precision and recall. It is a composite metric that is high only when both precision and recall are high.
- Mathematical formula
F1=2*(Precision*Recall)/(Precision+Recall)
- Python code
from sklearn.metrics import f1_score
f1 = f1_score(y_true, y_pred)
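A short sketch of why the formula above behaves differently from a plain average: with assumed example values precision = 1.0 and recall = 0.5, F1 is pulled toward the lower of the two.

```python
# Assumed example values, chosen only for illustration.
precision = 1.0
recall = 0.5

f1 = 2 * (precision * recall) / (precision + recall)
arithmetic_mean = (precision + recall) / 2

print(round(f1, 4))      # 0.6667
print(arithmetic_mean)   # 0.75
```

Because F1 sits below the arithmetic mean whenever the two values differ, a model cannot get a high F1 by being strong on only one of precision or recall.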
5 AUC (Area Under the ROC Curve)
AUC is a metric for evaluating binary classification models. It is the area under the ROC curve and measures how well the model ranks positive samples above negative ones: 1.0 means a perfect ranking, and 0.5 is no better than random guessing.
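- Python code
from sklearn.metrics import roc_auc_score
# y_pred_prob holds the predicted probability (or score) of the positive class, not hard 0/1 labels
auc = roc_auc_score(y_true, y_pred_prob)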
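AUC can also be read as the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive/negative pair. The function name pairwise_auc and the scores are made up for illustration, and this quadratic loop is only a teaching aid, not how scikit-learn computes the value.

```python
def pairwise_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Assumes both classes are present.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted scores for the positive class.
y_true = [0, 0, 1, 1]
y_score = [0.1, 0.4, 0.35, 0.8]

auc = pairwise_auc(y_true, y_score)
print(auc)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (only 0.35 versus 0.4 is not), which gives 0.75.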
3. Overall use
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score, roc_auc_score, f1_score, accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
# Generate a binary classification dataset.
# n_samples is the number of samples, n_classes the number of classes, random_state the random seed.
X, y = make_classification(n_samples=2000, n_classes=2, random_state=42)
# Split the data into a training set and a test set (80/20).
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=0.2)
# Train a logistic regression model.
logistic = LogisticRegression(random_state=42)
logistic.fit(X_train, y_train)
# Predict hard labels, plus positive-class probabilities for the AUC.
y_pred = logistic.predict(X_test)
y_pred_prob = logistic.predict_proba(X_test)[:, 1]
# print(y_pred)
# Evaluate the model. Results are stored under new names so that the
# imported metric functions are not overwritten.
accuracy = accuracy_score(y_test, y_pred)
print("accuracy_score:", accuracy)
precision = precision_score(y_test, y_pred)
print("precision_score:", precision)
recall = recall_score(y_test, y_pred)
print("recall_score:", recall)
f1 = f1_score(y_test, y_pred)
print("f1_score:", f1)
# AUC is computed from probabilities, not from the hard 0/1 predictions.
auc = roc_auc_score(y_test, y_pred_prob)
print("roc_auc_score:", auc)
I feel that these indicators are still relatively low.
Summary
This article introduced the commonly used metrics in machine learning model evaluation and how to compute them. In the next article, we will use the handwritten digits dataset built into sklearn to train a model and evaluate it visually.
I hope you will support me. I will keep studying hard and sharing more interesting things.