Machine Learning 7: Logistic Regression

1. Description

        Logistic regression is one of the most common machine learning models for handling classification problems. Binomial logistic regression is one type of logistic regression model. It handles classification into two classes, where a probability is used to determine a binary outcome, hence the "bi" in "binomial". The result is true or false: 0 or 1.

        An example of binomial logistic regression is predicting the likelihood of COVID-19 in a population. A person either has COVID-19 or does not, and a threshold must be established to differentiate between these results as accurately as possible.

2. Sigmoid Function

        These predictions do not fit a straight line the way linear regression predictions do. Instead, a logistic regression model is fit to the sigmoid function shown below.

        For each x, the resulting y value represents the probability that the result is True. In the COVID-19 example, this represents a doctor's confidence that someone has the virus. In the figure below, negative results are in blue and positive results are in red.

[Figure: sigmoid curve with negative results (blue) and positive results (red). Image source: Author]
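
        To make the curve concrete, here is a minimal NumPy sketch of the sigmoid function (the helper name sigmoid is ours, not from the original walkthrough):

import numpy as np

def sigmoid(z):
    # logistic function: squashes any real number into the interval (0, 1)
    return 1 / (1 + np.exp(-z))

print(sigmoid(0))    # 0.5, the midpoint of the S-curve
print(sigmoid(-5))   # close to 0: a confident negative
print(sigmoid(5))    # close to 1: a confident positive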

3. Process

        To perform a binomial logistic regression, we need to do three things:

  1. Create a training data set.
  2. Create our model using PyTorch.
  3. Fit the model to our data.

        The first step in a logistic regression problem is to create a training data set. Before generating any random data, we should set a seed to ensure repeatability.

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.nn import Linear

torch.manual_seed(42)   # set a random seed

We have to use PyTorch's linear model because we are mapping an input x to an output y through the linear expression weight * x + bias (the sigmoid is applied afterwards). To do this we will use PyTorch's Linear class:

model = Linear(in_features=1, out_features=1) # use a linear model
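
Continuing from the code above, a quick sanity check (ours, not part of the original walkthrough) confirms that this model simply computes weight * x + bias, with both values set by the random initialization:

x_test = torch.tensor([[2.0]])   # a single hypothetical input, shaped (1, 1)
manual = model.weight.item() * 2.0 + model.bias.item()
print(model(x_test).item(), manual)   # the two numbers should match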

Next, we have to generate the blue x and red x data, making sure to reshape them from row vectors to column vectors. The blue values fall between 0 and 7, and the red values fall between 3 and 10. For the y values, the blue dots represent negative COVID-19 tests, so they will all be 0; the red dots represent positive COVID-19 tests, so they will all be 1. Below is the code:
blue_x = (torch.rand(20) * 7).reshape(-1,1)   # random floats between 0 and 7
blue_y = torch.zeros(20).reshape(-1,1)

red_x = (torch.rand(20) * 7 + 3).reshape(-1,1)  # random floats between 3 and 10
red_y = torch.ones(20).reshape(-1,1)

X = torch.vstack([blue_x, red_x])   # matrix of x values
Y = torch.vstack([blue_y, red_y])   # matrix of y values

Our code should now look like this:

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.nn import Linear

torch.manual_seed(42)   # set a random seed

model = Linear(in_features=1, out_features=1) # use a linear model

blue_x = (torch.rand(20) * 7).reshape(-1,1)   # random floats between 0 and 7
blue_y = torch.zeros(20).reshape(-1,1)

red_x = (torch.rand(20) * 7 + 3).reshape(-1,1)  # random floats between 3 and 10
red_y = torch.ones(20).reshape(-1,1)

X = torch.vstack([blue_x, red_x])   # matrix of x values
Y = torch.vstack([blue_y, red_y])   # matrix of y values
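
Before moving on, we can plot the raw points as a quick sanity check. This preview is an optional aside that reuses the matplotlib import from above:

plt.scatter(blue_x, blue_y, color="blue")   # negative tests at y = 0
plt.scatter(red_x, red_y, color="red")      # positive tests at y = 1
plt.xlabel("x")
plt.ylabel("label")
plt.show()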

4. Optimization

        We will use gradient descent to minimize the loss of the sigmoid fit. The loss measures how poorly the curve fits the data, and it is controlled by the slope and intercept of the S-shaped curve. Gradient descent finds the slope and intercept that minimize it.

        We will also use Binary Cross Entropy (BCE), also known as the logarithmic loss function, as our loss. For logistic regression, a loss without the logarithm (such as plain squared error) works poorly; the logarithmic term heavily penalizes confident wrong predictions.
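
        To see exactly what BCE computes, here is a hand-rolled version on made-up probabilities (the yhat and y values below are illustrative, not our training data):

yhat = torch.tensor([0.9, 0.2, 0.7])   # hypothetical predicted probabilities
y = torch.tensor([1.0, 0.0, 1.0])      # hypothetical true labels

# BCE = -mean( y*log(yhat) + (1 - y)*log(1 - yhat) )
manual = -(y * torch.log(yhat) + (1 - y) * torch.log(1 - yhat)).mean()
builtin = nn.BCELoss()(yhat, y)
print(manual.item(), builtin.item())   # the two numbers should agree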

        To implement BCE as our loss function, we set it as our criterion and use stochastic gradient descent (SGD) as our means of minimizing it. Since the optimizer updates the model, we need to pass in the model parameters and a learning rate.

epochs = 2000   # run 2000 iterations
criterion = nn.BCELoss()    # implement binary cross entropy loss function

optimizer = torch.optim.SGD(model.parameters(), lr = .1) # stochastic gradient descent

        Now, we are ready to run gradient descent to optimize our loss. On each iteration we zero out the gradients, compute the y-hat values by passing our data through the model and the sigmoid function, calculate the loss, and compute the gradient of the loss function. We then take a step, which stores the updated slope and intercept for the next iteration.

optimizer.zero_grad()
Yhat = torch.sigmoid(model(X)) 
loss = criterion(Yhat,Y)
loss.backward()
optimizer.step() 

5. Closing

        To find the optimal slope and intercept, we are essentially training our model. We have to apply gradient descent over multiple iterations, or epochs. In this example, we will use 2,000 epochs for demonstration.

epochs = 2000   # run 2000 iterations
criterion = nn.BCELoss()    # implement binary cross entropy loss function

optimizer = torch.optim.SGD(model.parameters(), lr = .1) # stochastic gradient descent

for i in range(epochs):
    optimizer.zero_grad()
    Yhat = torch.sigmoid(model(X))
    loss = criterion(Yhat,Y)
    loss.backward()
    optimizer.step()

    print(f"epoch: {i+1}")
    print(f"loss: {loss: .5f}")
    print(f"slope: {model.weight.item(): .5f}")
    print(f"intercept: {model.bias.item(): .5f}")
    print()

Putting all the code snippets together, we should get the following code:

import numpy as np
import matplotlib.pyplot as plt
import torch
import torch.nn as nn
from torch.nn import Linear

torch.manual_seed(42)   # set a random seed

model = Linear(in_features=1, out_features=1) # use a linear model

blue_x = (torch.rand(20) * 7).reshape(-1,1)   # random floats between 0 and 7
blue_y = torch.zeros(20).reshape(-1,1)

red_x = (torch.rand(20) * 7+3).reshape(-1,1)  # random floats between 3 and 10
red_y = torch.ones(20).reshape(-1,1)

X = torch.vstack([blue_x, red_x])   # matrix of x values
Y = torch.vstack([blue_y, red_y])   # matrix of y values

epochs = 2000   # run 2000 iterations
criterion = nn.BCELoss()    # implement binary cross entropy loss function

optimizer = torch.optim.SGD(model.parameters(), lr = .1) # stochastic gradient descent

for i in range(epochs):
    optimizer.zero_grad()
    Yhat = torch.sigmoid(model(X))
    loss = criterion(Yhat,Y)
    loss.backward()
    optimizer.step()

    print(f"epoch: {i+1}")
    print(f"loss: {loss: .5f}")
    print(f"slope: {model.weight.item(): .5f}")
    print(f"intercept: {model.bias.item(): .5f}")
    print()
Final output after two thousand epochs:

epoch: 2000
loss:  0.53861
slope:  0.61276
intercept: -3.17314 
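
        With training finished, we can query the model for new inputs; a minimal sketch (the test values 2.0 and 8.0 are arbitrary):

with torch.no_grad():                      # no gradients needed for inference
    x_new = torch.tensor([[2.0], [8.0]])   # two hypothetical x values
    probs = torch.sigmoid(model(x_new))
print(probs)   # low probability for x = 2, much higher for x = 8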

6. Visualization

        Finally, we can plot the data with the sigmoid function to obtain the following visualization:

x = np.arange(0, 10, .1)                        # x values for the curve
y = model.weight.item()*x + model.bias.item()   # linear part: weight*x + bias

plt.plot(x, 1/(1 + np.exp(-y)), color="green")  # sigmoid of the linear part

plt.xlim(0, 10)
plt.scatter(blue_x, blue_y, color="blue")       # negative tests
plt.scatter(red_x, red_y, color="red")          # positive tests

plt.show()

[Figure: fitted sigmoid curve plotted over the blue and red data points. Image source: Author]

7. Limitations

        One of the biggest problems with binary classification is the need for a threshold. In the case of logistic regression, this threshold should be the x value where y is 50%. The question we are trying to answer is: where should the threshold be placed?
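
        Since the sigmoid crosses 50% exactly where weight * x + bias = 0, the learned threshold can be computed directly from the trained model; a short sketch:

threshold = -model.bias.item() / model.weight.item()   # x where the curve hits 0.5
print(f"decision threshold: x = {threshold:.2f}")      # roughly 5.2 with the weights above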

        In the case of COVID-19 testing, the original example illustrates this dilemma. If we set the threshold at x = 5, we can clearly see blue points that land on the positive side of it and red points that land on the negative side.

        The stranded blue dots are called false positives: cases where the model incorrectly predicts the positive class. The stranded red dots are called false negatives: cases where the model incorrectly predicts the negative class.
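
        We can also count both error types on the training data; a rough sketch that reuses X, Y, and the threshold computed above:

pred = (X > threshold).float()                      # predict positive to the right of the threshold
false_pos = ((pred == 1) & (Y == 0)).sum().item()   # actual negatives predicted positive
false_neg = ((pred == 0) & (Y == 1)).sum().item()   # actual positives predicted negative
print(f"false positives: {false_pos}, false negatives: {false_neg}")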

8. Conclusion

        A successful binomial logistic regression model will reduce the number of false negatives, as these often cause the greatest danger. Having COVID-19 but testing negative poses a serious risk to the health and safety of others.

        By using binomial logistic regression on the available data, we can determine the best place to set the threshold, helping to reduce uncertainty and make more informed decisions.
