Lecture 6: Logistic Regression

Table of contents

1 Common Datasets

1.1 MNIST dataset

1.2 CIFAR-10 Dataset

2 Lecture content

2.1 The difference between regression tasks and classification tasks

2.2 Why Logistic Regression is used

2.3 What is logistic regression

2.4 The Sigmoid function and the concept of saturating functions

2.5 Logistic regression model

2.6 Logistic regression loss function

2.6.1 Binary cross-entropy loss function

2.6.2 Mini-batch binary classification loss function

3 Code implementation


1 Common Datasets

1.1 MNIST dataset

MNIST is a dataset of images of handwritten digits that is primarily used for training and testing machine learning models. It consists of 60,000 training images and 10,000 test images; each is a 28x28-pixel grayscale image of a handwritten digit. The MNIST dataset has become a benchmark dataset for many machine learning algorithms, especially for image classification and digit recognition tasks.

Figure 1 MNIST dataset

Download method

import torchvision
train_set = torchvision.datasets.MNIST(root='../dataset/mnist', train=True,  download=True)
test_set  = torchvision.datasets.MNIST(root='../dataset/mnist', train=False, download=True)

This call takes two main parameters, root and train, plus the optional download. root specifies the directory where the dataset is stored; train selects which split to load (train=True loads the training set, train=False loads the test set); download=True downloads the dataset if it is not already present locally. The two splits are saved in the variables train_set and test_set respectively.
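As a quick check, the dataset objects can be indexed directly; each item is an (image, label) pair. A minimal sketch continuing the snippet above (index 0 is arbitrary; without a transform the image is a PIL Image):

img, label = train_set[0]          # (PIL.Image, int) pair
print(len(train_set), len(test_set))  # 60000 10000
print(img.size, label)             # (28, 28) and the digit's class label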

1.2 CIFAR-10 Dataset

Figure 2 CIFAR-10 dataset

CIFAR-10 is a commonly used image classification dataset consisting of 60,000 32x32 color images in 10 categories: airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. The dataset is divided into a training set of 50,000 images and a test set of 10,000 images. CIFAR-10 is widely used in computer vision and deep learning research, especially for developing and testing image classification algorithms.

import torchvision
train_set = torchvision.datasets.CIFAR10(…)
test_set  = torchvision.datasets.CIFAR10(…)
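The parameters mirror the MNIST call above. A typical invocation might look like the following sketch (the root path is an assumption; adjust it to your own directory layout):

import torchvision
train_set = torchvision.datasets.CIFAR10(root='../dataset/cifar10', train=True,  download=True)
test_set  = torchvision.datasets.CIFAR10(root='../dataset/cifar10', train=False, download=True)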

2 Lecture content

2.1 The difference between regression tasks and classification tasks

Regression tasks and classification tasks are two important types of tasks in machine learning; the main difference between them lies in the prediction target. A classification task aims to assign the input data to one of a set of predefined categories. For example, in handwritten digit classification, the goal is to assign each digit image to one of the ten digits 0-9. A regression task aims to predict a continuous value. For example, in house price prediction, the goal is to predict the selling price of a house, which is a continuous value.

  Therefore, the main difference between regression tasks and classification tasks is the type of prediction target: the goal of classification tasks is to predict a discrete category, while the goal of regression tasks is to predict a continuous value.

2.2 Why Logistic Regression is used

For classification problems, logistic regression is used because the target values are discrete. In a binary classification problem, we need to determine which category the input data belongs to. For example, in the figure below, the result is either pass or fail, a typical binary classification problem.

Figure 3 Binary classification problem

The binary classification problem is one of the most common problems in machine learning, with a wide range of applications, such as judging whether an email is spam or whether a patient has a certain disease.

2.3 What is logistic regression

In a binary classification problem, the model's judgment about the category of the input data is generally expressed as a probability: we obtain a probability value for each category and assign the input to whichever category has the larger probability.

Logistic regression is a linear classification model for binary classification problems. Its basic idea is to model the relationship between the input features and the category as a linear function, then map the result to the interval [0, 1] through the Sigmoid function to obtain the probability of the positive class. In other words, the role of the Sigmoid function is to map the data into the [0, 1] interval.

2.4 The Sigmoid function and the concept of saturating functions

In mathematics, a saturating function is one whose output tends toward finite upper and lower limits as the input approaches positive or negative infinity. Saturating functions are often used as activation functions in neural networks; the Sigmoid function is one example.

When the absolute value of the input x is large, the output of the Sigmoid function approaches 0 or 1, which is why it is called a "saturating function".

Figure 4 Sigmoid function

Figure 5 Sigmoid function image
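For reference, the logistic Sigmoid plotted above is defined as $\sigma(x) = \frac{1}{1 + e^{-x}}$, and its derivative is $\sigma'(x) = \sigma(x)\,(1 - \sigma(x))$, which peaks at 0.25 when $x = 0$ and shrinks toward 0 as $|x|$ grows.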

As the input of the Sigmoid function goes to negative infinity the output approaches 0, and as the input goes to positive infinity the output approaches 1, so the prediction can be interpreted as a probability and classified by thresholding.
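A minimal thresholding sketch (the probability values are made up for illustration; 0.5 is the usual cutoff):

import torch

probs = torch.tensor([0.12, 0.58, 0.93])  # predicted probabilities from some model
labels = (probs >= 0.5).long()            # class 1 if probability >= 0.5, else class 0
print(labels)                             # tensor([0, 1, 1])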

In neural networks, saturating functions are smooth and differentiable, which makes them usable as activation functions, and they keep the output within a fixed range, making training more stable. However, since saturating functions can suffer from vanishing gradients during gradient computation, deep neural networks more often use non-saturating activation functions such as ReLU.
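The vanishing-gradient effect is easy to see numerically with autograd; a small sketch (the input values are arbitrary):

import torch

x = torch.tensor([0.0, 2.0, 10.0], requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # roughly [0.25, 0.105, 4.5e-05]: the gradient all but vanishes at x = 10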

Common Sigmoid functions are:

Figure 6 Common Sigmoid function

2.5 Logistic regression model

Compared with the ordinary linear regression model, the model changes as follows after the logistic function is added:

Figure 7 Model difference between linear regression and logistic regression
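In symbols: ordinary linear regression predicts $\hat{y} = \omega x + b$, while logistic regression passes that linear output through the Sigmoid, $\hat{y} = \sigma(\omega x + b)$, so the prediction always lands in [0, 1].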

2.6 Logistic regression loss function

2.6.1 Binary cross-entropy loss function

The binary cross-entropy loss function (Binary Cross Entropy, BCE) also differs from the loss function of linear regression, mainly through the introduction of cross-entropy (Cross Entropy). Cross-entropy measures the difference between a set of true labels and a set of predicted labels.

Binary cross-entropy is the cross-entropy loss function designed specifically for binary classification tasks. For a binary classification problem, the per-sample loss can be expressed as:

Figure 8 Binary classification loss function
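Written out, this is the standard per-sample binary cross-entropy: $loss = -\left(y \log \hat{y} + (1 - y)\log(1 - \hat{y})\right)$.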

Here $y$ is the true label of the sample (0 or 1), and $\hat{y}$ is the predicted probability for the sample (in the range [0, 1]).

The basic idea of the cross-entropy loss function is to compare the probability distributions of the true label and the predicted label and compute the gap between them, which measures how similar the prediction is to the truth. The closer the prediction is to the true label, the smaller the loss; the further away, the larger the loss.

For binary classification problems, the cross-entropy loss can be read as follows: if the true label of the sample is 1, we want the model's predicted $\hat{y}$ to be as close to 1 as possible; if the true label is 0, we want $\hat{y}$ to be as close to 0 as possible. The cross-entropy loss is therefore an important indicator of how well a binary classification model predicts.

What is the difference between BCE and MSE?

BCE is mainly used for binary classification problems. Its loss function has a simple form and directly measures the gap between the probability the model predicts for each sample and the true label. MSE is better suited to regression problems, where it measures the gap between each predicted value and the true value.
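A small numerical sketch of the difference (the values are made up): for a sample whose true label is 1 but which the model scores at 0.01, BCE reacts far more sharply than MSE.

import torch
import torch.nn.functional as F

y_true = torch.tensor([1.0])
y_pred = torch.tensor([0.01])  # confidently wrong prediction

print(F.binary_cross_entropy(y_pred, y_true).item())  # ~4.61, i.e. -log(0.01)
print(F.mse_loss(y_pred, y_true).item())              # ~0.98, i.e. (1 - 0.01)^2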

2.6.2 Mini-batch binary classification loss function

In practice we usually need to train on large-scale datasets. Computing the gradient over the full dataset at once consumes too much memory and compute, making training slow or infeasible. With mini-batch BCELoss, the dataset is read into memory batch by batch, the loss is computed over each mini-batch, and the gradient is then computed, which speeds up training and effectively reduces memory usage and computational cost.

Figure 9 Mini-batch binary classification loss function
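Written out, the mini-batch loss averages the per-sample binary cross-entropy over the $N$ samples of the batch: $loss = -\frac{1}{N}\sum_{n=1}^{N}\left[y_n \log \hat{y}_n + (1 - y_n)\log(1 - \hat{y}_n)\right]$.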

Figure 10 Application of mini-batch binary classification loss function

The closer the model's prediction is to the true value, the smaller the cross-entropy loss. Therefore, by minimizing the cross-entropy loss, we can train a model that classifies the data accurately.

3 Code implementation

1. The following code defines a logistic regression model class LogisticRegressionModel, which inherits from torch.nn.Module. The model applies a single linear layer (torch.nn.Linear) followed by a sigmoid to produce a prediction for an input x.

import torch
import torch.nn.functional as F

class LogisticRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LogisticRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        # sigmoid squashes the linear output into [0, 1], turning it into a
        # probability for binary classification (in recent PyTorch versions,
        # torch.sigmoid is preferred over F.sigmoid)
        y_pred = F.sigmoid(self.linear(x))
        return y_pred

2. Define a binary cross-entropy loss function object and assign it to the variable criterion:

criterion = torch.nn.BCELoss(size_average=False)

In PyTorch, torch.nn.BCELoss implements the binary cross-entropy loss function; it measures the difference between the model output and the true label and returns the loss value. The size_average parameter controls whether the per-sample losses in a batch are averaged. The default is True; setting it to False returns the sum over the batch instead of the mean. (In recent PyTorch versions size_average is deprecated in favor of reduction='sum' or reduction='mean'.) When training neural networks with mini-batch gradient descent, averaging the loss over each mini-batch is the common choice.
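A small sketch of the two reduction modes (the probability values are made up; reduction is the current replacement for size_average):

import torch

pred   = torch.tensor([0.9, 0.2, 0.7])
target = torch.tensor([1.0, 0.0, 1.0])

print(torch.nn.BCELoss(reduction='sum')(pred, target).item())   # ~0.685, like size_average=False
print(torch.nn.BCELoss(reduction='mean')(pred, target).item())  # ~0.228, the default behaviour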

Plotting the output

import numpy as np
import matplotlib.pyplot as plt
import torch

# Generate 200 evenly spaced values between 0 and 10 with NumPy's linspace,
# then convert them to a PyTorch tensor of shape (200, 1) for the model.
x = np.linspace(0, 10, 200)
x_t = torch.Tensor(x).view((200, 1))

y_t = model(x_t)  # 'model' is the trained LogisticRegressionModel from above

# Convert the predictions back to a NumPy array via the tensor's .data
# attribute and plot the curve with matplotlib.
y = y_t.data.numpy()
plt.plot(x, y)

plt.plot([0, 10], [0.5, 0.5], c='r')  # red horizontal line marking the y = 0.5 decision boundary
plt.xlabel('Hours')
plt.ylabel('Probability of Pass')
plt.grid()
plt.show()

Full code

import torch
import torch.nn.functional as F

x_data = torch.Tensor([[1.0], [2.0], [3.0]])
y_data = torch.Tensor([[0], [0], [1]])
#-------------------------------------------------------#
class LogisticRegressionModel(torch.nn.Module):
    def __init__(self):
        super(LogisticRegressionModel, self).__init__()
        self.linear = torch.nn.Linear(1, 1)

    def forward(self, x):
        y_pred = F.sigmoid(self.linear(x))
        return y_pred

model = LogisticRegressionModel()
#-------------------------------------------------------#
criterion = torch.nn.BCELoss(size_average=False)  # sum of per-sample losses (see note above)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
#-------------------------------------------------------#
for epoch in range(1000):
    y_pred = model(x_data)
    loss = criterion(y_pred, y_data)
    print(epoch, loss.item())

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
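After training, the model can be queried for new inputs; for example (4.0 is an arbitrary illustrative value, continuing the study-hours setting of Figure 3):

x_test = torch.Tensor([[4.0]])
print('P(pass) =', model(x_test).item())  # predicted probability of passing after 4 hours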

The overall process of using logistic regression to deal with binary classification problems:

Figure 11 Flowchart

Official document link: https://pytorch.org/docs/stable/nn.html?highlight=bceloss#torch.nn.BCELoss

Original post: blog.csdn.net/m0_56494923/article/details/129144895