Python sklearn.metrics: Introduction and Application Examples of Machine Learning Evaluation Metrics

When using the many machine learning algorithms available in Python, you will frequently rely on the sklearn (scikit-learn) module/library.

Whether you apply machine learning to regression, classification, or clustering, evaluation, i.e. quantitatively measuring how well a trained model performs, is an unavoidable and very important issue. This post therefore briefly introduces the common evaluation metrics, their implementation, and their application, combining the introduction on the scikit-learn home page with material collected from other online write-ups.

1. Installing scikit-learn

There are many tutorials online, so the steps are not repeated here; see, for example:
https://www.cnblogs.com/zhangqunshi/p/6646987.html
In addition, if you installed Anaconda, you can add scikit-learn directly from Anaconda Navigator (open Environments, search for the package, and add it). Alternatively, install it with pip:
pip install -U scikit-learn
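
After installing, a quick sanity check is to print the library version (a minimal sketch; any reasonably recent version works for the examples below):

import sklearn
print(sklearn.__version__)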

2. Importing and calling sklearn.metrics

There are two ways to import the evaluation functions:

Method one:

from sklearn.metrics import <evaluation function name>

For example:

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

Calling convention: call the imported function by name directly.
Calculate the mean squared error (MSE):

mse = mean_squared_error(y_test, y_pre)

Calculate the regression coefficient of determination R2:

R2 = r2_score(y_test, y_pre)
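
Putting method one together, here is a minimal runnable sketch with made-up values (y_test and y_pre below are hypothetical placeholders for your own data):

from sklearn.metrics import mean_squared_error
from sklearn.metrics import r2_score

y_test = [3.0, -0.5, 2.0, 7.0]   # hypothetical true values
y_pre = [2.5, 0.0, 2.0, 8.0]     # hypothetical predictions

mse = mean_squared_error(y_test, y_pre)
R2 = r2_score(y_test, y_pre)
print("MSE: %.4f, R2: %.4f" % (mse, R2))
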
Method two:

from sklearn import metrics

Calling convention: metrics.<evaluation function name>(arguments).

For example:
Calculate the mean squared error (MSE):

mse = metrics.mean_squared_error(y_test, y_pre)

Calculate the regression coefficient of determination R2:

R2 = metrics.r2_score(y_test, y_pre)
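
The same sketch in the style of method two, calling the functions through the metrics module (values again hypothetical):

from sklearn import metrics

y_test = [3.0, -0.5, 2.0, 7.0]   # hypothetical true values
y_pre = [2.5, 0.0, 2.0, 8.0]     # hypothetical predictions

mse = metrics.mean_squared_error(y_test, y_pre)
R2 = metrics.r2_score(y_test, y_pre)
print("MSE: %.4f, R2: %.4f" % (mse, R2))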

3. Overview of the metrics in sklearn.metrics

For a brief overview, see:
https://www.cnblogs.com/mdevelopment/p/9456486.html
For details, see:
https://www.cnblogs.com/harvey888/p/6964741.html
Official documentation:
https://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics

Adapted from the first link, a brief overview follows:

Regression metrics (a runnable sketch follows this list)

  1. explained_variance_score(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): explained variance score of the regression (reflects how well the model accounts for the variance of the dependent variable)

  2. mean_absolute_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): mean absolute error

  3. mean_squared_error(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): mean squared error

  4. median_absolute_error(y_true, y_pred): median absolute error

  5. r2_score(y_true, y_pred, sample_weight=None, multioutput='uniform_average'): R-squared (coefficient of determination)
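
As a sketch, all five regression metrics can be computed on the same pair of arrays (values below are hypothetical):

from sklearn import metrics

y_true = [3.0, -0.5, 2.0, 7.0]   # hypothetical true values
y_pred = [2.5, 0.0, 2.0, 8.0]    # hypothetical predictions

print(metrics.explained_variance_score(y_true, y_pred))
print(metrics.mean_absolute_error(y_true, y_pred))
print(metrics.mean_squared_error(y_true, y_pred))
print(metrics.median_absolute_error(y_true, y_pred))
print(metrics.r2_score(y_true, y_pred))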

Classification metrics (a runnable sketch follows this list)

  1. accuracy_score(y_true, y_pred): accuracy

  2. auc(x, y, reorder=False): area under the ROC curve; a larger AUC indicates better performance

  3. average_precision_score(y_true, y_score, average='macro', sample_weight=None): average precision (AP) computed from prediction scores

  4. brier_score_loss(y_true, y_prob, sample_weight=None, pos_label=None): Brier score; the smaller, the better

  5. confusion_matrix(y_true, y_pred, labels=None, sample_weight=None): returns the confusion matrix, used to evaluate classification accuracy

  6. f1_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None): F1 score
      F1 = 2 * (precision * recall) / (precision + recall), where precision = TP / (TP + FP) and recall = TP / (TP + FN)

  7. log_loss(y_true, y_pred, eps=1e-15, normalize=True, sample_weight=None, labels=None): log loss, also known as logistic loss or cross-entropy loss

  8. precision_score(y_true, y_pred, labels=None, pos_label=1, average='binary'): precision; precision = TP / (TP + FP)

  9. recall_score(y_true, y_pred, labels=None, pos_label=1, average='binary', sample_weight=None): recall; recall = TP / (TP + FN)

  10. roc_auc_score(y_true, y_score, average='macro', sample_weight=None): computes the AUC, i.e. the area under the ROC curve; the larger, the better

  11. roc_curve(y_true, y_score, pos_label=None, sample_weight=None, drop_intermediate=True): computes the coordinates of the ROC curve, i.e. FPR and TPR
      TPR = TP / (TP + FN) (true positive rate, sensitivity; identical to recall); FPR = FP / (FP + TN) (false positive rate, equal to 1 - specificity)
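
A minimal sketch of the most common classification metrics, using hypothetical binary labels and scores:

from sklearn import metrics

y_true = [0, 0, 1, 1, 1, 0]               # hypothetical true labels
y_pred = [0, 1, 1, 1, 0, 0]               # hypothetical predicted labels
y_score = [0.1, 0.6, 0.8, 0.9, 0.4, 0.3]  # hypothetical predicted probabilities

print(metrics.accuracy_score(y_true, y_pred))
print(metrics.confusion_matrix(y_true, y_pred))
print(metrics.precision_score(y_true, y_pred))
print(metrics.recall_score(y_true, y_pred))
print(metrics.f1_score(y_true, y_pred))
print(metrics.roc_auc_score(y_true, y_score))
fpr, tpr, thresholds = metrics.roc_curve(y_true, y_score)  # ROC curve coordinates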

4. An application example

Following the example on the official scikit-learn website, but with my own data, here is a complete application:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn import ensemble
from sklearn import metrics

##############################################################################
# Load data
data = pd.read_csv('Data for train_0.003D.csv')
y = data.iloc[:,0]
X = data.iloc[:,1:]
offset = int(X.shape[0] * 0.9)
X_train, y_train = X[:offset], y[:offset]
X_test, y_test = X[offset:], y[offset:]

##############################################################################
# Fit regression model
params = {'n_estimators': 500, 'max_depth': 4, 'min_samples_split': 2,
          'learning_rate': 0.01, 'loss': 'squared_error'}  # this loss was named 'ls' in scikit-learn < 1.0
clf = ensemble.GradientBoostingRegressor(**params)

clf.fit(X_train, y_train)
y_pre = clf.predict(X_test)

# Calculate metrics
mse = metrics.mean_squared_error(y_test, y_pre)
print("MSE: %.4f" % mse)

mae = metrics.mean_absolute_error(y_test, y_pre)
print("MAE: %.4f" % mae)

R2 = metrics.r2_score(y_test, y_pre)
print("R2: %.4f" % R2)

##############################################################################
# Plot training deviance

# compute test set deviance
test_score = np.zeros((params['n_estimators'],), dtype=np.float64)

for i, y_pred in enumerate(clf.staged_predict(X_test)):
    # clf.loss_ was removed in recent scikit-learn; with squared-error loss,
    # the test deviance is simply the MSE of the staged predictions
    test_score[i] = metrics.mean_squared_error(y_test, y_pred)

plt.figure(figsize=(12, 6))
plt.subplot(1, 2, 1)
plt.title('Deviance')
plt.plot(np.arange(params['n_estimators']) + 1, clf.train_score_, 'b-',
         label='Training Set Deviance')
plt.plot(np.arange(params['n_estimators']) + 1, test_score, 'r-',
         label='Test Set Deviance')
plt.legend(loc='upper right')
plt.xlabel('Boosting Iterations')
plt.ylabel('Deviance')

##############################################################################
# Plot feature importance
feature_importance = clf.feature_importances_
# make importances relative to max importance
feature_importance = 100.0 * (feature_importance / feature_importance.max())
sorted_idx = np.argsort(feature_importance)
pos = np.arange(sorted_idx.shape[0]) + .5
plt.subplot(1, 2, 2)
plt.barh(pos, feature_importance[sorted_idx], align='center')
plt.yticks(pos, X.columns[sorted_idx])

plt.xlabel('Relative Importance')
plt.title('Variable Importance')
plt.show()

The final run prints the MSE, MAE, and R2 values and produces a figure with two panels: the training/test deviance curves and the relative variable importance bar chart. [figure omitted]
