Logistic regression learning

3.1 Introduction to logistic regression

Learning objectives

  • Understand the application scenarios of logistic regression
  • Know the principles of logistic regression
  • Master the loss function and optimization plan of logistic regression

Logistic regression is a classification model in machine learning; despite the word "regression" in its name, it is a classification algorithm. Because the algorithm is simple and efficient, it is widely used in practice.

1 Application scenarios of logistic regression

  • Predicting whether an ad will be clicked
  • Deciding whether an email is spam
  • Deciding whether a patient is sick
  • Detecting financial fraud
  • Detecting fake accounts

The examples above share one characteristic: each is a decision between two categories. Logistic regression is a powerful tool for solving binary classification problems.

2 Principles of logistic regression

To master logistic regression, you must master two things:

  • In logistic regression, what are the input values?
  • How is the output of logistic regression interpreted?

2.1 Input

The input of logistic regression is the result of linear regression.

2.2 Activation function

  • sigmoid function
  • Judgment criteria
    • The linear regression result is fed into the sigmoid function
    • Output: a probability value in the interval [0, 1]; the default threshold is 0.5

image-20221114152645861

Logistic regression makes its final decision from the probability that a sample belongs to one particular category. By default that category is labeled 1 (positive example) and the other category is labeled 0 (negative example), which is convenient for the loss calculation.

Explanation of the output (important): suppose there are two categories, A and B, and the probability value is the probability of belonging to category A (1). If a sample produces an output of 0.55, the value exceeds the threshold 0.5, so the training or prediction result is category A (1). Conversely, if the output is 0.3, the result is category B (0).

The threshold of logistic regression can be changed. For example, if the threshold is set to 0.6, an output of 0.55 is assigned to category B.
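The chain from linear output to sigmoid to threshold can be sketched in a few lines of plain Python. This is only an illustration: the weights, bias, and sample values are hypothetical, and the helper names `linear_output` and `classify` are made up for this sketch.

```python
import math

def sigmoid(z):
    """Map any real number z to a value strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def linear_output(weights, bias, features):
    """Linear regression output: w1*x1 + w2*x2 + ... + b."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def classify(probability, threshold=0.5):
    """Return 1 (positive class) if the probability exceeds the threshold, else 0."""
    return 1 if probability > threshold else 0

# Hypothetical weights, bias and sample, chosen only for illustration
weights, bias = [0.8, -0.4], 0.1
sample = [1.0, 1.5]

z = linear_output(weights, bias, sample)   # 0.8 - 0.6 + 0.1, about 0.3
p = sigmoid(z)                             # probability of class 1

print(round(p, 3))
print(classify(p))                  # default threshold 0.5
print(classify(p, threshold=0.6))   # stricter threshold
```

With these values the probability falls between 0.5 and 0.6, so the same sample is assigned to class 1 under the default threshold and to class 0 under the stricter one.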

Previously, we used the least squares method to measure the loss of linear regression.

In logistic regression, how do we measure the loss when a prediction is wrong?

Let's look at the picture below (where the threshold is set to 0.6):

image-20221114152719727

So how to measure the difference between the predicted results of logistic regression and the real results?

3 Loss and optimization

3.1 Losses

The loss of logistic regression is called the log-likelihood loss, and the formula is as follows:

  • Split by class:

image-20221114152742195

where y is the true value and $h_\theta(x)$ is the predicted value.
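Written out in its standard form, the per-class log-likelihood loss looks like the following. This is a reconstruction from the two cases discussed in this section, so the function name cost is an assumption rather than the original notation.

```latex
\mathrm{cost}(h_\theta(x), y) =
\begin{cases}
  -\log\bigl(h_\theta(x)\bigr)     & \text{if } y = 1 \\
  -\log\bigl(1 - h_\theta(x)\bigr) & \text{if } y = 0
\end{cases}
```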

How should each expression be understood? The shape of the log function helps:

image-20221114152756953

In every case, we want the loss to be as small as possible.

Looking at the two cases and their loss values:

  • When y=1, we want $h_\theta(x)$ to be as large as possible;

  • When y=0, we want $h_\theta(x)$ to be as small as possible.

  • Complete (combined) loss function

image-20221114152822546
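A sketch of the combined form in LaTeX, assuming m samples indexed by i (the index notation is an assumption). When $y_i = 1$ only the first term survives, and when $y_i = 0$ only the second, so the single expression covers both cases.

```latex
\mathrm{cost}(h_\theta(x), y) = \sum_{i=1}^{m} \Bigl[ -y_i \log\bigl(h_\theta(x_i)\bigr) - (1 - y_i) \log\bigl(1 - h_\theta(x_i)\bigr) \Bigr]
```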

Next, we apply the formula to the example above and work through the calculation to see what it means.

image-20221114152842220

We already know that for -log(P), the larger P is, the smaller the result, so we can use this to analyze the loss formula.
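To make the calculation concrete, here is a small sketch that evaluates the log-likelihood loss sample by sample and in total. The labels and probabilities are hypothetical values for illustration only, and the function names are made up for this sketch.

```python
import math

def log_loss_single(y_true, p):
    """Log-likelihood loss of one sample; p is the predicted probability of class 1."""
    return -math.log(p) if y_true == 1 else -math.log(1.0 - p)

def log_loss_total(y_true, y_prob):
    """Combined form: sum of -y*log(p) - (1-y)*log(1-p) over all samples."""
    return sum(-y * math.log(p) - (1 - y) * math.log(1.0 - p)
               for y, p in zip(y_true, y_prob))

# Hypothetical true labels and predicted probabilities of class 1
y_true = [1, 0, 0, 1, 1]
y_prob = [0.4, 0.68, 0.41, 0.55, 0.71]

for y, p in zip(y_true, y_prob):
    print(y, p, round(log_loss_single(y, p), 3))
print("total:", round(log_loss_total(y_true, y_prob), 3))
```

A sample with y=1 contributes less loss the closer its probability is to 1, and a sample with y=0 contributes less the closer its probability is to 0.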

3.2 Optimization

Gradient descent is again used as the optimization algorithm to reduce the value of the loss function. The weight parameters of the linear regression that feeds the logistic regression are updated so that the predicted probability rises for samples that truly belong to category 1 and falls for samples that truly belong to category 0.
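A minimal sketch of this optimization for a model with one feature, written in plain Python. The data set, learning rate, and iteration count are hypothetical choices for illustration; real libraries use more elaborate solvers.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def mean_log_loss(w, b, xs, ys):
    """Average log-likelihood loss of the model sigmoid(w*x + b)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -y * math.log(p) - (1 - y) * math.log(1.0 - p)
    return total / len(xs)

def gradient_step(w, b, xs, ys, learning_rate):
    """One gradient descent update.

    For the mean log loss, the gradient is mean((p - y) * x) with respect
    to w and mean(p - y) with respect to b, where p = sigmoid(w*x + b).
    """
    n = len(xs)
    grad_w = sum((sigmoid(w * x + b) - y) * x for x, y in zip(xs, ys)) / n
    grad_b = sum((sigmoid(w * x + b) - y) for x, y in zip(xs, ys)) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Hypothetical one-feature data set: small x -> class 0, large x -> class 1
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]

w, b = 0.0, 0.0
loss_before = mean_log_loss(w, b, xs, ys)
for _ in range(200):
    w, b = gradient_step(w, b, xs, ys, learning_rate=0.5)
loss_after = mean_log_loss(w, b, xs, ys)

print(loss_before, loss_after)
print(sigmoid(w * 2.0 + b), sigmoid(w * -2.0 + b))
```

After training, the probability for a clearly positive input moves toward 1 and the probability for a clearly negative input moves toward 0, which is the effect described above.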


4 Summary

  • Logistic regression concept [Know]
    • It solves binary classification problems
    • The input of logistic regression is the output of linear regression
  • Principles of logistic regression [Master]
    • Input:
      • The output of linear regression
    • Activation function
      • sigmoid function
      • Maps any value into [0, 1]
      • A threshold is then set to make the classification decision
  • Loss and optimization of logistic regression [Master]
    • Loss
      • Log-likelihood loss
      • Understood with the help of the log function
      • The true value is split into two cases: 0 and 1
    • Optimization
      • Increase the predicted probability for samples that truly belong to category 1 and reduce it for samples that truly belong to category 0

3.2 Introduction to logistic regression API

Learning objectives

  • Know how to use the logistic regression API

  • sklearn.linear_model.LogisticRegression(solver='liblinear', penalty='l2', C=1.0)
    • solver: one of {'liblinear', 'sag', 'saga', 'newton-cg', 'lbfgs'}
    • penalty: type of regularization
    • C: inverse of the regularization strength; smaller values mean stronger regularization
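A short usage sketch of the estimator on a toy data set. The data is hypothetical, and `penalty` is left at its default ('l2') so that only `solver` and `C` are set explicitly.

```python
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, two classes
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 0, 1, 1, 1]

# liblinear solver with the default L2 penalty; smaller C means stronger regularization
model = LogisticRegression(solver="liblinear", C=1.0)
model.fit(X, y)

print(model.classes_)                  # class labels, in the column order of predict_proba
print(model.predict([[1.5], [-1.5]]))  # hard class predictions (threshold 0.5)
print(model.predict_proba([[1.5]]))    # one row: [P(class 0), P(class 1)]
```

`predict` applies the default 0.5 threshold, while `predict_proba` exposes the probabilities so a different threshold can be applied by hand.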

3.3 Case: Cancer Classification Prediction-Benign/Malignant Breast Cancer Tumor Prediction

Learning objectives

  • Through the tumor prediction case, learn how to use logistic regression to train the model

1 Background introduction

Download address of original data: https://archive.ics.uci.edu/ml/machine-learning-databases/

Data description

(1) 699 samples with 11 columns in total. The first column is the sample ID used for lookup, the next 9 columns are medical features related to the tumor, and the last column is the tumor class: 2 for benign, 4 for malignant.

(2) Contains 16 missing values, marked with "?"

2 Case analysis

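1. Load the data
2. Basic data processing
2.1 Handle missing values
2.2 Select the feature values and the target value
2.3 Split the data
3. Feature engineering (standardization)
4. Machine learning (logistic regression)
5. Model evaluation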

3 Code implementation

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# 1. Load the data
names = ['Sample code number', 'Clump Thickness', 'Uniformity of Cell Size', 'Uniformity of Cell Shape',
                   'Marginal Adhesion', 'Single Epithelial Cell Size', 'Bare Nuclei', 'Bland Chromatin',
                   'Normal Nucleoli', 'Mitoses', 'Class']

data = pd.read_csv("data/breast-cancer-wisconsin.data", names=names)
data.head()
# 2. Basic data processing
# 2.1 Handle missing values: "?" marks a missing entry
data = data.replace(to_replace="?", value=np.nan)
data = data.dropna()
# 2.2 Select the feature columns and the target column
x = data.iloc[:, 1:10]
x.head()
y = data["Class"]
y.head()
# 2.3 Split into training and test sets
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=22)
# 3. Feature engineering (standardization)
transfer = StandardScaler()
x_train = transfer.fit_transform(x_train)
x_test = transfer.transform(x_test)
# 4. Machine learning (logistic regression)
estimator = LogisticRegression()
estimator.fit(x_train, y_train)

# Optional: classify with a custom threshold instead of the default 0.5
# (column 1 of predict_proba is the probability of class 4, malignant)
# THRESHOLD = 0.25
# y_predict = np.where(estimator.predict_proba(x_test)[:, 1] > THRESHOLD, 4, 2)
# 5. Model evaluation
y_predict = estimator.predict(x_test)
print("Predictions:\n", y_predict)
result = estimator.score(x_test, y_test)
print("Accuracy:\n", result)

In many classification scenarios, prediction accuracy is not the only thing we care about.

Take this cancer case as an example: what matters most is not overall accuracy but whether every actual cancer patient in the sample has been detected.


4 Summary

  • Tumor prediction case implementation [Know]
    • If there are missing values in the data, be sure to deal with them
    • Accuracy is not the only criterion for measuring correct classification

3.4 Classification evaluation method

Learning objectives

  • Understand what a confusion matrix is
  • Know precision and recall in classification evaluation
  • Know the ROC curve and the AUC indicator

1. Classification evaluation method

1.1 Precision rate and recall rate

1.1.1 Confusion matrix

In a classification task, the predicted result (Predicted Condition) and the correct label (True Condition) can combine in four different ways, which together form the confusion matrix (the same idea extends to multi-class problems).

image-20221114153032865

1.1.2 Precision and Recall
  • Precision: among the samples predicted as positive, the proportion that are actually positive
  • Precision = TP/(TP+FP)

image-20221114153051222

  • Recall: among the samples that are actually positive, the proportion predicted as positive (how completely the positives are found, that is, the ability to identify positive samples)
  • Recall = TP/(TP+FN)

image-20221114153110066

1.2 F1-score

Another evaluation criterion is the F1-score, the harmonic mean of precision and recall, which reflects the robustness of the model.

image-20221114153128099
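The three metrics can be computed directly from the confusion matrix counts. The sketch below uses hypothetical labels in the same 2/4 coding as the tumor data, with 4 (malignant) treated as the positive class; the function names are made up for this illustration.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count TP, FP, FN, TN, treating `positive` as the positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive and truth != positive:
            fp += 1
        elif pred != positive and truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def precision_recall_f1(tp, fp, fn):
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = 2*P*R/(P+R)."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical labels: 4 = malignant (positive), 2 = benign (negative)
y_true = [4, 4, 4, 2, 2, 2, 2, 4]
y_pred = [4, 4, 2, 2, 2, 4, 2, 4]

tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive=4)
print(tp, fp, fn, tn)
print(precision_recall_f1(tp, fp, fn))
```

Here one malignant sample is missed (a false negative) and one benign sample is flagged (a false positive), so precision and recall are both 3/4.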


1.3 Classification evaluation report API

  • sklearn.metrics.classification_report(y_true, y_pred, labels=[], target_names=None )
    • y_true: true target value
    • y_pred: The estimator predicts the target value
    • labels: Number corresponding to the specified category
    • target_names: target category names
    • return: precision and recall for each category
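from sklearn.metrics import classification_report
# Labels 2 and 4 are reported under the names "benign" and "malignant"
ret = classification_report(y_test, y_predict, labels=(2,4), target_names=("benign", "malignant"))
print(ret)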

Consider this situation: 99 samples are cancer and 1 sample is non-cancer. If the model simply predicts every sample as positive (cancer is the positive class by default), accuracy is 99%, yet the model is useless because it cannot identify the negative sample. This is the problem of evaluation under sample imbalance.

Question: how do we evaluate a model when the samples are imbalanced?

2 ROC curve and AUC index

2.1 TPR and FPR

  • TPR = TP / (TP + FN): True Positive Rate (TPR), another name for recall
    • The proportion of samples predicted as class 1 among all samples whose true class is 1
  • FPR = FP / (FP + TN): False Positive Rate (FPR)
    • The proportion of samples predicted as class 1 among all samples whose true class is 0
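As a small illustration, the sketch below computes accuracy, TPR, and FPR for a classifier that predicts every sample as positive on an imbalanced set of 99 positives and 1 negative. The function name `rates` is made up for this example.

```python
def rates(y_true, y_pred):
    """Return (accuracy, TPR, FPR) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    tpr = tp / (tp + fn)   # share of true class 1 predicted as 1
    fpr = fp / (fp + tn)   # share of true class 0 predicted as 1
    return accuracy, tpr, fpr

# 99 positive samples and 1 negative sample; the classifier always predicts 1
y_true = [1] * 99 + [0]
y_pred = [1] * 100

accuracy, tpr, fpr = rates(y_true, y_pred)
print(accuracy, tpr, fpr)
```

Accuracy looks excellent, but FPR equals TPR, which shows that the classifier treats positive and negative samples identically.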

2.2 ROC curve

  • The ROC curve plots TPR against FPR at different classification thresholds. Lowering the threshold classifies more samples as positive, which increases both FP and TP. The figure below shows a typical ROC curve.
  • The horizontal axis of the ROC curve is FPR and the vertical axis is TPR. When the two are equal, a sample is equally likely to be predicted as 1 whether its true class is 1 or 0. In that case the AUC is 0.5.

ROC

2.3 AUC indicator

  • Probabilistic meaning of AUC: randomly pick one positive and one negative sample; AUC is the probability that the positive sample receives a higher score than the negative one.
  • AUC lies in [0, 1]. The closer to 1 the better; a value close to 0.5 means the model is no better than random guessing.
  • The AUC value can be used to measure the quality of a model.

2.4 AUC calculation API

  • from sklearn.metrics import roc_auc_score
    • sklearn.metrics.roc_auc_score(y_true, y_score)
      • Calculate the ROC curve area, that is, the AUC value
      • y_true: The true category of each sample.
      • y_score: prediction score, which can be the estimated probability of the positive class, a confidence value, or the non-thresholded decision value returned by the classifier's decision_function
# Optionally convert the labels to 0/1 first (malignant, 4, becomes the positive class 1)
# y_test = np.where(y_test > 2.5, 1, 0)
print("AUC score:", roc_auc_score(y_test, y_predict))
  • AUC as computed here applies only to binary classification
  • AUC is well suited to evaluating classifier performance when the samples are imbalanced
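Because `y_score` is meant to be a score rather than a hard label, passing probabilities usually gives a more informative AUC than passing thresholded predictions. A small sketch with hypothetical labels and scores:

```python
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted probabilities of the positive class
y_true = [1, 0, 1, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4]

# Hard 0/1 predictions obtained by thresholding the scores at 0.5
y_label = [1 if s > 0.5 else 0 for s in y_score]

auc_from_scores = roc_auc_score(y_true, y_score)
auc_from_labels = roc_auc_score(y_true, y_label)
print(auc_from_scores, auc_from_labels)
```

The two values differ because thresholding throws away the ranking information inside each predicted group. With a fitted estimator, the scores would typically come from `predict_proba(...)[:, 1]`.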

3 Summary

  • Confusion Matrix [Understand]
    • True positive (TP)
    • False negative (FN)
    • False positive (FP)
    • True negative (TN)
  • Precision and Recall [Know]
    • Accuracy: how many predictions are correct overall
      • (TP+TN)/(TP+TN+FN+FP)
    • Precision: how many of the predicted positives are truly positive
      • TP/(TP+FP)
    • Recall: how many of the true positives are found
      • TP/(TP+FN)
    • F1-score
      • Reflects the robustness of the model
  • ROC curve and AUC indicator [Know]
    • ROC curve
      • Plot TPR against FPR; the area under the curve gives the AUC indicator
    • AUC
      • The closer to 1, the better the model
      • The closer to 0.5, the closer to random guessing
    • Note:
      • This indicator is mainly used to evaluate imbalanced binary classification problems.

3.5 Drawing of ROC curve

Learning objectives

  • Know how to draw the ROC curve

The following example illustrates how the ROC curve is drawn.

Assume there are 6 ad displays, two of which were clicked, giving the display sequence (1:1,2:0,3:1,4:0,5:0,6:0). In each pair, the first number is the sequence number and the second indicates a click (1) or no click (0).

For these 6 displays, the model computes a sequence of click probabilities.

Let's look at three cases.

1 Curve drawing

1.1 Suppose the probability sequence is (1:0.9,2:0.7,3:0.8,4:0.6,5:0.5,6:0.4).

Pairing each probability with its true label and ranking from high probability to low gives:

1 1 0 0 0 0
0.9 0.8 0.7 0.6 0.5 0.4

The steps for drawing are:

1) Sort the probability sequence from high to low to get the order (1:0.9,3:0.8,2:0.7,4:0.6,5:0.5,6:0.4);

2) Take the sample with the highest probability, sample 1, as predicted positive and calculate TPR=0.5, FPR=0.0;

3) Add the next sample, sample 3, to the predicted positives and calculate TPR=1.0, FPR=0.0;

4) Add the next sample, sample 2, and calculate TPR=1.0, FPR=0.25;

5) Continuing in the same way gives 6 pairs of TPR and FPR.

Written as (FPR, TPR), these 6 pairs form the points (0,0.5), (0,1.0), (0.25,1), (0.5,1), (0.75,1), (1.0,1.0).

These 6 points can be plotted in a two-dimensional coordinate system.

image-20190406170931355

The resulting plot is the ROC curve.
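The drawing steps can be sketched as a short function that sorts the samples by score and records one (FPR, TPR) point each time a sample is added to the predicted positives. The function name `roc_points` is made up for this illustration, and the sketch assumes no tied scores.

```python
def roc_points(labels, scores):
    """Return the (FPR, TPR) point reached after each sample, highest score first."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    points = []
    # Walk through the samples from the highest score to the lowest
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1   # a positive sample moves the curve up
        else:
            fp += 1   # a negative sample moves the curve right
        points.append((fp / n_neg, tp / n_pos))
    return points

# Click labels for displays 1..6 and the probabilities from case 1.1
labels = [1, 0, 1, 0, 0, 0]
scores_case_1 = [0.9, 0.7, 0.8, 0.6, 0.5, 0.4]

print(roc_points(labels, scores_case_1))
```

Plotting the returned points, optionally preceded by (0, 0), gives the ROC curve for this case.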

1.2 Suppose the probability sequence is (1:0.9,2:0.8,3:0.7,4:0.6,5:0.5,6:0.4).

Pairing each probability with its true label and ranking from high probability to low gives:

1 0 1 0 0 0
0.9 0.8 0.7 0.6 0.5 0.4

The steps for drawing are:

1) Sort the probability sequence from high to low to get the order (1:0.9,2:0.8,3:0.7,4:0.6,5:0.5,6:0.4);

2) Take the sample with the highest probability, sample 1, as predicted positive and calculate TPR=0.5, FPR=0.0;

3) Add the next sample, sample 2, to the predicted positives and calculate TPR=0.5, FPR=0.25;

4) Add the next sample, sample 3, and calculate TPR=1.0, FPR=0.25;

5) Continuing in the same way gives 6 pairs of TPR and FPR.

Written as (FPR, TPR), these 6 pairs form the points (0,0.5), (0.25,0.5), (0.25,1), (0.5,1), (0.75,1), (1.0,1.0).

These 6 points can be plotted in a two-dimensional coordinate system.

image-20190406171018456

The resulting plot is the ROC curve.

1.3 Suppose the probability sequence is (1:0.4,2:0.6,3:0.5,4:0.7,5:0.8,6:0.9).

Pairing each probability with its true label and ranking from high probability to low gives:

0 0 0 0 1 1
0.9 0.8 0.7 0.6 0.5 0.4

The steps for drawing are:

1) Sort the probability sequence from high to low to get the order (6:0.9,5:0.8,4:0.7,2:0.6,3:0.5,1:0.4);

2) Take the sample with the highest probability, sample 6, as predicted positive and calculate TPR=0.0, FPR=0.25;

3) Add the next sample, sample 5, to the predicted positives and calculate TPR=0.0, FPR=0.5;

4) Add the next sample, sample 4, and calculate TPR=0.0, FPR=0.75;

5) Continuing in the same way gives 6 pairs of TPR and FPR.

Written as (FPR, TPR), these 6 pairs form the points (0.25,0.0), (0.5,0.0), (0.75,0.0), (1.0,0.0), (1.0,0.5), (1.0,1.0).

These 6 points can be plotted in a two-dimensional coordinate system.

image-20190406171135154

The resulting plot is the ROC curve.

2 Explanation of meaning

In the example above there are 6 samples in total: 2 positive and 4 negative. Choosing one positive and one negative sample can be done in 8 ways.

In the first case, whichever pair is chosen, the positive sample always has a higher probability than the negative one, so the probability of a correctly ordered pair is 1 and AUC=1. The integral of that ROC curve is also 1: the integral of the ROC curve equals the AUC.

In the second case, the pair made of samples 2 and 3 is ordered wrongly, while the other 7 pairs are ordered correctly, so the probability of a correctly ordered pair is 7/8 = 0.875 and AUC=0.875. The integral of the ROC curve is also 0.875.

In the third case, every pair is ordered wrongly, so the probability of a correctly ordered pair is 0 and AUC=0.0. The integral of the ROC curve is also 0.0.

In fact, AUC stands for Area Under the ROC Curve: it is the integral of the ROC curve, that is, the area beneath it.

The point of drawing the ROC curve is to subtract, step by step, the pairs that are ordered wrongly. Moving down from the sample with the highest probability, every negative sample is ranked above all the positive samples below it, so the share of positive samples still below it (1-TPR) must be deducted. Once the whole ROC curve is drawn, the AUC is determined, and the probability of a correctly ordered pair follows from it.
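Both views of AUC can be checked numerically on the three cases. The sketch below computes AUC once by counting correctly ordered positive/negative pairs and once as the area under the ROC curve using the trapezoid rule; the function names are made up for this illustration, and the area version assumes no tied scores.

```python
def auc_by_pairs(labels, scores):
    """Fraction of (positive, negative) pairs where the positive scores higher; ties count as half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_by_area(labels, scores):
    """Area under the ROC curve by the trapezoid rule, starting from (0, 0)."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    tp = fp = 0
    prev_fpr = prev_tpr = 0.0
    area = 0.0
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        fpr, tpr = fp / n_neg, tp / n_pos
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area

labels = [1, 0, 1, 0, 0, 0]              # click labels for displays 1..6
cases = {
    "1.1": [0.9, 0.7, 0.8, 0.6, 0.5, 0.4],
    "1.2": [0.9, 0.8, 0.7, 0.6, 0.5, 0.4],
    "1.3": [0.4, 0.6, 0.5, 0.7, 0.8, 0.9],
}
for name, scores in cases.items():
    print(name, auc_by_pairs(labels, scores), auc_by_area(labels, scores))
```

For each case the pair-counting result and the area result agree, matching the values 1, 0.875, and 0 derived above.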


3 Summary

  • Drawing of ROC curve [Know]
    • 1. Build a model and sort the probability values of the model from large to small.
    • 2. Start from the sample with the highest probability and keep calculating TPR and FPR until the whole curve is built.
    • 3. In fact, it is solving the integral (area)

Origin blog.csdn.net/weixin_52733693/article/details/127848729