Chapter 3 classification

Preface

Reference book

"Hands-On Machine Learning with Scikit-Learn and TensorFlow"

Tools

Python 3.5.1, Jupyter Notebook, PyCharm

Problems solved

  1. Fetching the MNIST dataset: [WinError 10060] A connection attempt failed because the connected party did not properly respond after a period of time, or the established connection failed because the connected host has failed to respond.

    Reference link: a fix for scikit-learn's fetch_mldata failing to download the MNIST dataset
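
    A possible workaround (my own suggestion, not necessarily the fix from the linked reference): newer versions of scikit-learn drop fetch_mldata() and can fetch MNIST from OpenML instead.

    from sklearn.datasets import fetch_openml

    # 'mnist_784' is the OpenML name of the classic 70,000-image MNIST dataset
    mnist = fetch_openml('mnist_784', version=1)
    X, y = mnist["data"], mnist["target"]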

StratifiedKFold

  • Compared with a cross-validation function such as cross_val_score(), StratifiedKFold gives you more control: you can implement the cross-validation loop yourself (see the sketch below).
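
  • A minimal sketch of such a hand-rolled loop (assuming sgd_clf, X_train, and y_train_5 from the book's "is it a 5?" MNIST example):

    from sklearn.model_selection import StratifiedKFold
    from sklearn.base import clone

    skfolds = StratifiedKFold(n_splits=3, shuffle=True, random_state=42)
    for train_index, test_index in skfolds.split(X_train, y_train_5):
        clone_clf = clone(sgd_clf)                 # fresh, untrained copy for each fold
        X_train_folds = X_train[train_index]
        y_train_folds = y_train_5[train_index]
        X_test_fold = X_train[test_index]
        y_test_fold = y_train_5[test_index]
        clone_clf.fit(X_train_folds, y_train_folds)
        y_pred = clone_clf.predict(X_test_fold)
        n_correct = sum(y_pred == y_test_fold)
        print(n_correct / len(y_pred))             # accuracy on this fold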

cross_val_predict

  • Instead of returning evaluation scores, it returns the predictions made on each fold.
  • The result is a one-dimensional array: with K-fold splitting, every sample appears in exactly one validation set, so you get exactly one predicted label per original training sample.
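
  • A minimal sketch (assuming sgd_clf, X_train, and y_train_5 as above):

    from sklearn.model_selection import cross_val_predict

    # each training instance is predicted by a model that never saw it during training
    y_train_pred = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3)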

confusion_matrix

  • Confusion matrix
  • Rows of the confusion matrix represent the actual classes; columns represent the predicted classes.
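
  • A minimal sketch using the out-of-fold predictions from cross_val_predict() (assuming the y_train_pred computed above):

    from sklearn.metrics import confusion_matrix

    # rows = actual class (non-5, 5); columns = predicted class (non-5, 5)
    confusion_matrix(y_train_5, y_train_pred)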

decision_function()

  • This method returns a decision score for each instance; you can then make predictions from those scores using any threshold you like.

  • Use the cross_val_predict() function to get decision scores for every instance in the training set:

    from sklearn.model_selection import cross_val_predict

    y_scores = cross_val_predict(sgd_clf, X_train, y_train_5, cv=3, method="decision_function")

  • With these scores, you can use the precision_recall_curve() function to compute precision and recall for every possible threshold:

    from sklearn.metrics import precision_recall_curve
    import matplotlib.pyplot as plt

    precisions, recalls, thresholds = precision_recall_curve(y_train_5, y_scores)

    def plot_precision_recall_vs_threshold(precisions, recalls, thresholds):
        # precisions and recalls each have one more entry than thresholds, so drop the last value
        plt.plot(thresholds, precisions[:-1], "b--", label="Precision")
        plt.plot(thresholds, recalls[:-1], "g-", label="Recall")
        plt.xlabel("Threshold")
        plt.legend(loc="upper left")
        plt.ylim([0, 1])

    plot_precision_recall_vs_threshold(precisions, recalls, thresholds)
    plt.show()

roc_curve()

  • Receiver operating characteristic curve

    from sklearn.metrics import roc_curve
    import matplotlib.pyplot as plt

    fpr, tpr, thresholds = roc_curve(y_train_5, y_scores)

    def plot_roc_curve(fpr, tpr, label=None):
        plt.plot(fpr, tpr, linewidth=2, label=label)
        plt.plot([0, 1], [0, 1], 'k--')   # dashed diagonal = purely random classifier
        plt.axis([0, 1, 0, 1])
        plt.xlabel('False Positive Rate')
        plt.ylabel('True Positive Rate')

    plot_roc_curve(fpr, tpr)
    plt.show()

roc_auc_score()

  • One way to compare classifiers is to measure the area under the curve (AUC).
  • A perfect classifier has a ROC AUC equal to 1, while a purely random classifier has a ROC AUC equal to 0.5.
  • from sklearn.metrics import roc_auc_score
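
  • A minimal usage sketch (assuming the y_train_5 labels and the y_scores computed earlier):

    from sklearn.metrics import roc_auc_score

    roc_auc_score(y_train_5, y_scores)   # closer to 1.0 is better; 0.5 means no better than random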

Select ROC curve and PR

  • Since the ROC curve and the precision/recall (PR) curve look very similar, you may wonder how to decide which one to use.
  • A rule of thumb: when the positive class is rare, or when you care more about false positives than false negatives, choose the PR curve; otherwise use the ROC curve. A good PR curve hugs the top-right corner as closely as possible.

Multi-class classifier

  • Scikit-Learn detects when you try to use a binary classification algorithm for a multi-class classification task and automatically runs OvR (except for SVM classifiers, for which it uses OvO).

  • If you want to force Scikit-Learn to use one-versus-one or one-versus-rest, you can use the OneVsOneClassifier or OneVsRestClassifier class:

    from sklearn.multiclass import OneVsOneClassifier
    from sklearn.linear_model import SGDClassifier

    ovo_clf = OneVsOneClassifier(SGDClassifier(random_state=42))
    ovo_clf.fit(X_train, y_train)
    ovo_clf.predict([some_digit])
    len(ovo_clf.estimators_)   # one estimator per pair of classes: N * (N - 1) / 2

Error Analysis

  • cross_val_predict() + confusion_matrix(): get out-of-fold predictions for the whole training set, then inspect the confusion matrix to see which classes the model mixes up (see the sketch below).
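
  • A sketch of that workflow for the full multi-class problem (assuming the multi-class labels y_train; the book also scales X_train first, which is omitted here):

    from sklearn.model_selection import cross_val_predict
    from sklearn.metrics import confusion_matrix
    import matplotlib.pyplot as plt

    y_train_pred = cross_val_predict(sgd_clf, X_train, y_train, cv=3)
    conf_mx = confusion_matrix(y_train, y_train_pred)

    # view the matrix as an image: bright cells off the main diagonal are frequent confusions
    plt.matshow(conf_mx, cmap=plt.cm.gray)
    plt.show()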

My CSDN: https://blog.csdn.net/qq_21579045

My cnblogs: https://www.cnblogs.com/lyjun/

My GitHub: https://github.com/TinyHandsome

Knowledge gained from books always feels shallow; true understanding only comes from putting it into practice ~

Welcome to stop by ~

by Li Yingjun

Original post: www.cnblogs.com/lyjun/p/11350327.html