[Machine Learning] Model Evaluation - Training and Evaluating a Model on the Handwritten Digits Dataset


Foreword

Earlier we introduced the common methods and metrics of model evaluation. Now we will train a model on the handwritten digits dataset and evaluate it with different methods, to deepen our understanding of those evaluation methods and metrics.

1. Dataset loading

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns
import warnings
warnings.filterwarnings('ignore')
#%%
from sklearn.datasets import load_digits
data=load_digits()
print(data.data.shape)

The output is (1797, 64): there are 1797 handwritten digit images, each 8x8 pixels (flattened to 64 features). Let's display the first one.

plt.figure(figsize=(8,6))
plt.imshow(data.images[0], cmap='gray')
plt.show()

The image is only 8x8 pixels, so it is quite coarse, but we can still clearly see that it is a 0.
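As a small extra (my addition, not in the original post), we can also glance at the first ten digits at once with a sketch like this:

fig, axes = plt.subplots(2, 5, figsize=(8, 4))
for ax, image, label in zip(axes.ravel(), data.images[:10], data.target[:10]):
    ax.imshow(image, cmap='gray')   # each image is an 8x8 array
    ax.set_title(label)             # show the true digit above each image
    ax.axis('off')
plt.tight_layout()
plt.show()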

2. Train/test split, shuffling, and binary classification

Here we split the data into training and test sets with a 7:3 ratio.

from sklearn.model_selection import train_test_split
X,y=data.data,data.target
X_train,X_test,y_train,y_test=train_test_split(X,y,test_size=0.3,random_state=42)

Since the dataset has some ordering from 0 to 9 and neighbouring samples are therefore correlated, we shuffle the training data.

shuffle_index=np.random.permutation(len(X_train))  # a random permutation of the indices 0..len(X_train)-1
X_train,y_train=X_train[shuffle_index],y_train[shuffle_index]

Since the dataset contains 10 classes (the digits 0 to 9), we turn it into a binary classification problem: 2 versus not-2.

y_train_2=(y_train==2)
y_test_2=(y_test==2)

3. Training Model and Prediction

We use SGDClassifier, the stochastic gradient descent classifier from sklearn.linear_model.

from sklearn.linear_model import SGDClassifier
sgd_clf=SGDClassifier(max_iter=10,random_state=42)
sgd_clf.fit(X_train,y_train_2)
y_prediction=sgd_clf.predict(X[2].reshape(1,-1))
print(y_prediction)

The prediction returns True; let's print the label at that position to check.

print(y_prediction)
print(y[2])

y[2] is indeed 2, so the prediction is correct.

4. Model evaluation

1. Cross Validation

Cross-validation divides the training set into several folds; each fold in turn serves as the validation set while the remaining folds are used for training, until every fold has been used for validation. This makes it especially friendly to smaller datasets. The cross_val_score function helps us evaluate the model's performance on the dataset and choose the best model. By using cross-validation we also make better use of a limited dataset and reduce overfitting.
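Before calling the helper, here is a rough sketch (my addition, not from the original post) of what this procedure looks like by hand, using StratifiedKFold, which is roughly what cross_val_score does for a classifier:

from sklearn.model_selection import StratifiedKFold
from sklearn.base import clone

skf = StratifiedKFold(n_splits=3)
for train_index, val_index in skf.split(X_train, y_train_2):
    fold_clf = clone(sgd_clf)                                   # fresh copy of the classifier for this fold
    fold_clf.fit(X_train[train_index], y_train_2[train_index])  # train on the other folds
    y_val_pred = fold_clf.predict(X_train[val_index])
    print((y_val_pred == y_train_2[val_index]).mean())          # accuracy on the held-out fold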

from sklearn.model_selection import cross_val_score
cross_val_scores=cross_val_score(sgd_clf,X_train,y_train_2,cv=3,scoring='accuracy')  # split into 3 folds
print('Cross-validation scores: ',cross_val_scores)
print('Cross-validation mean: ',cross_val_scores.mean())
print('Cross-validation std: ',cross_val_scores.std())


The average score is quite high, indicating that the model performs well.
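One caveat worth adding (my note, not from the original post): only about 10% of the samples are 2s, so accuracy alone can be flattering here; a dummy classifier that always predicts "not 2" already gets around 90% accuracy. A quick check:

from sklearn.dummy import DummyClassifier

never_2 = DummyClassifier(strategy='most_frequent')  # always predicts the majority class, i.e. "not 2"
print(cross_val_score(never_2, X_train, y_train_2, cv=3, scoring='accuracy'))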

To analyze the model's performance in more detail, we can use cross_val_predict, a Scikit-learn function that performs cross-validation and returns the model's predictions.
Specifically, cross_val_predict takes a machine learning model, the training data, the target variable and the cross-validation parameters, and returns an array containing the prediction made for each sample during cross-validation. These predictions can be used to evaluate the model on the dataset and for further analysis and processing.
Unlike cross_val_score, cross_val_predict returns a prediction for every sample rather than a score for each fold, which gives us more detailed information about model performance for deeper analysis and tuning.

from sklearn.model_selection import cross_val_predict
cross_val_predicts=cross_val_predict(sgd_clf,X_train,y_train_2,cv=3)
print('cross_val_predicts: ',cross_val_predicts)


2. Confusion matrix

These indicators were covered in the previous article, so we will not explain them again in detail.

We already defined TP, FP, TN, and FN; now let's compute them.

from sklearn.metrics import confusion_matrix

confusion_matrixs=confusion_matrix(y_train_2,cross_val_predicts)
print(confusion_matrixs)

In scikit-learn the confusion matrix is laid out as

[[TN FP]
[FN TP]]

Interpretation: we correctly identified 126 digits as 2, and mistakenly identified 4 non-2 digits as 2.
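To make that layout explicit, we can also unpack the four cells directly (a small sketch I added):

tn, fp, fn, tp = confusion_matrix(y_train_2, cross_val_predicts).ravel()  # [[tn, fp], [fn, tp]]
print('TN:', tn, 'FP:', fp, 'FN:', fn, 'TP:', tp)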

2.1 Precision, recall and F1 score

from sklearn.metrics import precision_score,recall_score,f1_score
# use separate variable names so we do not shadow the sklearn functions
precision=precision_score(y_train_2,cross_val_predicts)
recall=recall_score(y_train_2,cross_val_predicts)
f1=f1_score(y_train_2,cross_val_predicts)
print('Precision:',precision)
print('Recall:',recall)
print('F1 score:',f1)
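These scores can be reproduced by hand from the confusion-matrix cells (a sketch assuming the tn, fp, fn, tp variables unpacked above):

precision_manual = tp / (tp + fp)                  # of everything predicted as 2, how much really is a 2
recall_manual = tp / (tp + fn)                     # of all real 2s, how many were found
f1_manual = 2 * precision_manual * recall_manual / (precision_manual + recall_manual)  # harmonic mean
print(precision_manual, recall_manual, f1_manual)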


2.2 The ROC curve and the effect of the threshold on the results

Suppose the model assigns the decision scores [12, 22, 33, 42, 54, 63, 74, 80], from left to right, to eight instances whose true labels are [5, 2, 4, 3, 2, 2, 2, 2]. The higher the decision score, the more confident the model is that the instance is a 2. If we set the threshold at a score of 40, everything to the left is classified as not-2 and everything to the right as 2: any actual 2 on the left becomes a false negative (FN), any actual 2 on the right a true positive (TP), and any non-2 on the right a false positive (FP). Through their respective formulas this changes our precision and recall scores, so choosing an appropriate threshold can meaningfully change our model evaluation.
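To make this concrete, here is a tiny sketch (my own, using the hypothetical toy numbers above, not real model output) that computes precision and recall for this example at a threshold of 40:

import numpy as np
from sklearn.metrics import precision_score, recall_score

toy_true = np.array([5, 2, 4, 3, 2, 2, 2, 2]) == 2        # True where the instance really is a 2
toy_scores = np.array([12, 22, 33, 42, 54, 63, 74, 80])   # decision scores, left to right
toy_pred = toy_scores > 40                                 # classify with a threshold of 40
print('precision:', precision_score(toy_true, toy_pred))  # 4 of the 5 predicted 2s are correct -> 0.8
print('recall:', recall_score(toy_true, toy_pred))         # 4 of the 5 actual 2s are found -> 0.8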

Above we saw that X[2] is a 2, i.e. True. Let's look at its decision score. Scikit-Learn does not let us set the threshold directly, but it does give us access to the decision scores through the classifier's decision_function() method.

y_score=sgd_clf.decision_function(X[2].reshape(1,-1))
print(y_score)


We can also get the decision scores for all the training instances at once.

y_score=cross_val_predict(sgd_clf,X_train,y_train_2,cv=3,method='decision_function')
print(y_score[0:15])


Next, let's get all the thresholds.
precision_recall_curve is a Scikit-learn function that computes the precision and recall of a classification model over a range of decision thresholds.
Specifically, precision_recall_curve takes the true labels and the scores (decision-function values or predicted probabilities) of a binary classifier, and returns the precision and recall at a series of thresholds, together with the thresholds themselves. These values can be used to draw a precision-recall curve or to compute metrics such as the model's average precision.

from sklearn.metrics import precision_recall_curve
precisions,recalls,thresholds=precision_recall_curve(y_train_2,y_score)

Let's draw a line chart of precision and recall against the threshold and see what it looks like.

sns.set_theme(style="darkgrid")
# precisions and recalls have one more entry than thresholds, so trim them to match
data_line=pd.DataFrame({'precisions':precisions[:len(thresholds)],
                        'recalls':recalls[:len(thresholds)],
                        'thresholds':thresholds})
sns.lineplot(x='thresholds',y='precisions',data=data_line,label='precision')
sns.lineplot(x='thresholds',y='recalls',data=data_line,label='recall')
plt.show()


Near a threshold of 0, precision and recall are both high; the two curves cross close to 0 and are roughly symmetric around that point. We can also see that changing the threshold has a large effect on the results, so we need to be cautious when setting it.
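As a side note (my addition, not in the original post), a common way to use these curves is to pick the lowest threshold that reaches a target precision, say 90%. A rough sketch, assuming 90% precision is actually reachable on this data:

# precisions and recalls have one more entry than thresholds, so align them first
idx_90 = np.argmax(precisions[:-1] >= 0.90)   # first index whose precision reaches 90%
threshold_90 = thresholds[idx_90]
y_pred_90 = (y_score >= threshold_90)
print('threshold for ~90% precision:', threshold_90)
print('precision:', precision_score(y_train_2, y_pred_90))
print('recall:', recall_score(y_train_2, y_pred_90))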

  • AUC (Area Under the ROC Curve)

To find the optimal threshold, we can also compute the AUC. The roc_curve function in scikit-learn computes the points of the ROC curve (Receiver Operating Characteristic curve), which is used to evaluate the performance of binary classification models. The ROC curve shows the relationship between the true positive rate (tpr) and the false positive rate (fpr), and can help us choose the best threshold for the classifier.

  • We compute fpr, tpr and the thresholds, then plot the ROC curve:

from sklearn.metrics import roc_curve
fpr,tpr,thresholds=roc_curve(y_train_2,y_score)
line_data=pd.DataFrame({'fpr':fpr,'tpr':tpr,'thresholds':thresholds})
sns.lineplot(data=line_data,x='fpr',y='tpr')
sns.lineplot(x=[0,1],y=[0,1],linestyle='--')  # purely random classifier (diagonal)
plt.show()


The dashed line represents the ROC curve for a purely random classifier; a good classifier is as far away from this line as possible (towards the upper left), and it is clear that our model classifier is quite good.

We can also calculate our AUC value with roc_auc_score

from sklearn.metrics import roc_auc_score
auc_value=roc_auc_score(y_train_2,y_score)
print("AUC:",auc_value)

An excellent score. I really hope the AUC of all my future models is this high.

  • Finding the optimal threshold usually requires combining the actual application scenario with the performance indicators of the classification model.

A common approach is to determine the optimal threshold from the points on the ROC curve: for each threshold we can compute the corresponding FPR and TPR, and then pick the optimal threshold according to indicators such as the following:

  • Maximize TPR: when we care most about the model's recall (the true positive rate), we can choose the threshold that maximizes TPR.

  • Minimize FPR: when we care most about avoiding false alarms (keeping the false positive rate low), we can choose the threshold that minimizes FPR.

  • Balance the two: when we want to take both into account, a common choice is the threshold that maximizes TPR - FPR (Youden's J statistic); see the sketch below.
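For that balanced option, here is a minimal sketch (my addition, not from the original post) that picks the threshold maximizing TPR - FPR from the roc_curve outputs computed above:

youden_index = np.argmax(tpr - fpr)          # Youden's J statistic: TPR - FPR
youden_threshold = thresholds[youden_index]
print('Threshold maximizing TPR - FPR:', youden_threshold)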

  • Example: since we care more about the model's recall here, we can continue to use the fpr, tpr and thresholds values obtained above.

from sklearn.metrics import roc_curve
fpr,tpr,thresholds=roc_curve(y_train_2,y_score)
# find the threshold that maximizes TPR
good_threshold = thresholds[np.argmax(tpr)]
print('Optimal threshold:',good_threshold)

We apply this threshold and compute the recall.

recall_before=recall_score(y_train_2,cross_val_predicts)
print('Recall without the optimized threshold:',recall_before)
y_pred=(y_score>good_threshold)   # apply the threshold to the decision scores
recall_after=recall_score(y_train_2,y_pred)
print('Recall with the optimized threshold:',recall_after)

The result is rather embarrassing: as the figure above shows, our classifier was already quite good, so there may simply not be much room left to optimize. Maybe. Running it again gives the same result.

  • Or we can use the most basic method: test every threshold.
from sklearn.metrics import recall_score

baseline_recall = recall_score(y_train_2, cross_val_predicts)  # recall before threshold tuning

best_threshold = None
best_recall = -1.0
for threshold in thresholds:
    y_pred = (y_score > threshold)          # apply the candidate threshold to the decision scores
    candidate_recall = recall_score(y_train_2, y_pred)
    if candidate_recall > best_recall:
        best_recall = candidate_recall
        best_threshold = threshold

print('Optimal threshold:', best_threshold)
print('Recall before optimization:', baseline_recall)
print('Recall after optimization:', best_recall)


After threshold optimization, the recall is significantly improved.

5. Summary

In this section we walked through several model evaluation methods in detail and saw how to obtain an optimal threshold, which helps us evaluate and optimize our models better.
I hope you will keep supporting me; I will work harder, keep learning and share more interesting things.


Origin blog.csdn.net/qq_61260911/article/details/129959869