Study Notes on Machine Learning Crash Course | Google Developers

Machine Learning Crash Course  |  Google Developers

  • https://developers.google.com/machine-learning/crash-course/
  • Google's fast-paced, practical introduction to machine learning

Classification

  • Thresholding
    • Logistic regression returns a probability. You can use the returned probability "as is" (for example, the probability that the user will click on this ad is 0.00023) or convert the returned probability to a binary value (for example, this email is spam).
    • In order to map a logistic regression value to a binary category, you must define a classification threshold (also called the decision threshold). A value above that threshold indicates "spam"; a value below indicates "not spam." It is tempting to assume that the classification threshold should always be 0.5, but thresholds are problem-dependent, and are therefore values that you must tune.
    • Note: "Tuning" a threshold for logistic regression is different from tuning hyperparameters such as learning rate. Part of choosing a threshold is assessing how much you'll suffer for making a mistake. For example, mistakenly labeling a non-spam message as spam is very bad. However, mistakenly labeling a spam message as non-spam is unpleasant, but hardly the end of your job.
  • True vs. False and Positive vs. Negative
    • A true positive is an outcome where the model correctly predicts the positive class. Similarly, a true negative is an outcome where the model correctly predicts the negative class.
    • A false positive is an outcome where the model incorrectly predicts the positive class. And a false negative is an outcome where the model incorrectly predicts the negative class. (Counting these four outcomes in code is sketched below.)
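
The four outcomes can be counted directly from pairs of actual labels and thresholded predictions; a minimal NumPy sketch with illustrative arrays:

```python
import numpy as np

y_true = np.array([1, 0, 1, 1, 0, 0])  # actual labels
y_pred = np.array([1, 0, 0, 1, 1, 0])  # model's binary predictions

tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
print(tp, tn, fp, fn)  # 2 2 1 1
```
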
  • Accuracy
    • Accuracy is one metric for evaluating classification models. Informally, accuracy is the fraction of predictions our model got right. Formally, accuracy has the following definition:
      • Accuracy = Number of correct predictions / Total number of predictions
    • For binary classification, accuracy can also be calculated in terms of positives and negatives as follows:
      • Accuracy = (TP+TN) / (TP+TN+FP+FN)
      • Where TP = True Positives, TN = True Negatives, FP = False Positives, and FN = False Negatives.
    • Accuracy alone doesn't tell the full story when you're working with a class-imbalanced data set, where there is a significant disparity between the number of positive and negative labels (for example, a tumor data set in which the overwhelming majority of examples are benign). The sketch below shows how accuracy can mask this failure mode.
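
A sketch of the accuracy formula above, with a class-imbalanced example (the 91/9 split is illustrative) showing why accuracy alone can mislead:

```python
def accuracy(tp, tn, fp, fn):
    # Accuracy = number of correct predictions / total number of predictions.
    return (tp + tn) / (tp + tn + fp + fn)

# Illustrative class-imbalanced case: 91 negatives, 9 positives.
# A model that predicts "negative" for every example scores 91% accuracy
# while identifying zero actual positives.
print(accuracy(tp=0, tn=91, fp=0, fn=9))  # 0.91
```
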
  • Precision and Recall
    • Precision attempts to answer the following question:
      • What proportion of positive identifications was actually correct?
    • Precision is defined as follows:
      • Precision = TP / (TP+FP)
    • Note: A model that produces no false positives has a precision of 1.0.
    • Recall attempts to answer the following question:
      • What proportion of actual positives was identified correctly?
    • Mathematically, recall is defined as follows:
      • Recall = TP / (TP+FN)
    • Note: A model that produces no false negatives has a recall of 1.0.
    • To fully evaluate the effectiveness of a model, you must examine both precision and recall. Unfortunately, precision and recall are often in tension. That is, improving precision typically reduces recall and vice versa.
    • Various metrics have been developed that rely on both precision and recall; for example, see the F1 score, the harmonic mean of precision and recall. (All three metrics are sketched in code below.)
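
The precision, recall, and F1 definitions above translate directly into code; a minimal sketch (the zero-denominator guards and sample counts are my own additions):

```python
def precision(tp, fp):
    # What proportion of positive identifications was actually correct?
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    # What proportion of actual positives was identified correctly?
    return tp / (tp + fn) if (tp + fn) else 0.0

def f1(p, r):
    # Harmonic mean of precision and recall.
    return 2 * p * r / (p + r) if (p + r) else 0.0

p, r = precision(tp=8, fp=2), recall(tp=8, fn=4)  # illustrative counts
print(p, r, f1(p, r))  # 0.8 0.666... 0.727...
```

Note the tension described above: raising the threshold tends to cut false positives (higher precision) while creating more false negatives (lower recall), and vice versa.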

Reposted from www.cnblogs.com/pegasus923/p/10508444.html