Multinomial logistic regression using Python

1. Description

        Multinomial logistic regression is a statistical method used to predict classification outcomes for more than two categories. It is particularly useful when the dependent variable is categorical rather than continuous.

2. Classification prediction

        In multinomial logistic regression, the model predicts the probability that an observation belongs to each category of the dependent variable. The predicted class is usually the one with the highest probability, making this a categorical prediction rather than a continuous one.
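
        For instance, given a vector of predicted probabilities, the categorical prediction is simply the index of the largest entry. Here is a minimal sketch (the probabilities below are made up for illustration):

import numpy as np

probs = np.array([0.2, 0.5, 0.3])    # hypothetical predicted probabilities for one observation over 3 classes
predicted_class = np.argmax(probs)   # the class with the highest probability
print(predicted_class)               # 1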

In contrast, standard logistic regression, also known as binary logistic regression (a special case of binomial logistic regression), is used when the dependent variable has only two categories. It predicts the probability that an observation belongs to one category versus the other, so its predictions are continuous probabilities between 0 and 1.
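
        As a quick contrast, here is a minimal sketch of the binary case, where the sigmoid function maps a linear predictor to a probability between 0 and 1 (the coefficient and input below are made up):

import math

def sigmoid(z):
    return 1 / (1 + math.exp(-z))

z = 0.8 * 1.5        # hypothetical linear predictor: beta * x
print(sigmoid(z))    # ~0.77, the predicted probability of the positive class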

3. The underlying math

        Here is the math behind multinomial logistic regression. These equations represent a set of log-linear models in which the logarithm of the ratio of the probabilities for each category is linearly related to the predictor variable $X_i$ via a coefficient (or slope) parameter, denoted by $\beta_1, \beta_2, \ldots, \beta_{K-1}$. The symbol $\text{P}(Y_i = k)$ represents the probability that an observation $i$ belongs to a class $k$, where $k$ is an integer representation of each class, starting from 1:

$$\ln \frac{\text{P}(Y_i = 1)}{\text{P}(Y_i = K)} = \beta_1 \cdot X_i$$

$$\ln \frac{\text{P}(Y_i = 2)}{\text{P}(Y_i = K)} = \beta_2 \cdot X_i$$

$$\vdots$$

$$\ln \frac{\text{P}(Y_i = K-1)}{\text{P}(Y_i = K)} = \beta_{K-1} \cdot X_i$$

        From these equations, we can derive the following alternative forms, which arise from the fact that the probabilities of all $K$ classes must sum to 1. In these equations, the $K$-th category is used as the reference, and the probabilities of the other $K-1$ categories are expressed exponentially relative to it, as a function of the predictor $X_i$ and the corresponding coefficients $\beta_k$:

$$\text{P}(Y_i = k) = \frac{e^{\beta_k \cdot X_i}}{1 + \sum_{j=1}^{K-1} e^{\beta_j \cdot X_i}} \quad \text{for } k = 1, \ldots, K-1$$

$$\text{P}(Y_i = K) = \frac{1}{1 + \sum_{j=1}^{K-1} e^{\beta_j \cdot X_i}}$$

        The exponential function in these equations converts each linear predictor $\beta_k \cdot X_i$ into a positive quantity, and dividing by the normalizing sum ensures that every probability lies between 0 and 1. These forms are obtained by exponentiating both sides of the log-ratio equations above and solving for the probabilities under the sum-to-one constraint.

        These equations are typically used in multinomial logistic regression, where the goal is to predict the probability of an observation belonging to each of K categories based on one or more predictor variables.
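
        To make these formulas concrete, here is a small sketch that computes the class probabilities for a single observation using the reference-category form above (the coefficients and predictor values are made up):

import numpy as np

x = np.array([0.5, 1.2])             # hypothetical predictor values for one observation
betas = np.array([[0.4, -0.3],       # hypothetical coefficients for the K-1 non-reference classes
                  [0.1,  0.8]])

scores = np.exp(betas @ x)           # e^(beta_k . X_i) for each non-reference class
denom = 1 + scores.sum()             # the reference class contributes e^0 = 1
probs = np.append(scores, 1) / denom # probabilities for all K classes (reference last)
print(probs, probs.sum())            # the probabilities always sum to 1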

4. About the data

        In this example, we will use the UC Irvine abalone dataset to predict the sex of abalone. The dataset is often used with multiple linear regression to predict age (from physical measurements and the number of shell rings), but here we predict the sex of a given abalone based on several different characteristics.

        Using the Python pandas package, we can see the shape of the data as well as the first few rows.

import pandas as pd

df = pd.read_csv("https://raw.githubusercontent.com/s-lasch/CIS-280/main/abalone.csv")   # read the csv
df.shape   # get the shape
(4177, 9)
df.head() # show first 5 rows
| sex | length | diameter | height | whole_weight | shucked_weight | viscera_weight | shell_weight | rings |
|-----|--------|----------|--------|--------------|----------------|----------------|--------------|-------|
| M   | 0.455  | 0.365    | 0.095  | 0.5140       | 0.2245         | 0.1010         | 0.150        | 15    |
| M   | 0.350  | 0.265    | 0.090  | 0.2255       | 0.0995         | 0.0485         | 0.070        | 7     |
| F   | 0.530  | 0.420    | 0.135  | 0.6770       | 0.2565         | 0.1415         | 0.210        | 9     |
| M   | 0.440  | 0.365    | 0.125  | 0.5160       | 0.2155         | 0.1140         | 0.155        | 10    |
| I   | 0.330  | 0.255    | 0.080  | 0.2050       | 0.0895         | 0.0395         | 0.055        | 7     |

5. Processing the data

        We can see that there are three distinct categories in the sex column: M, F, and I, which represent males, females, and infants, respectively. These are the classes that our model will predict based on the other columns in the dataset.

df['sex'].value_counts().sort_values(ascending=False)   # count occurrences of each class
M    1528
I    1342
F    1307
Name: sex, dtype: int64

        This means that our y data will be the sex column, and our X data will be all columns except sex.

X = df.drop(['sex'], axis=1) 
y = df['sex']

Now we are ready to split the data into training and test sets. We can do this using the scikit-learn package, as follows:

from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.20, random_state = 5)

print(X_train.shape)
print(X_test.shape)
print(y_train.shape)
print(y_test.shape)
(3341, 8)
(836, 8)
(3341,)
(836,)

6. Label encoding

        Now that we have training and test data, it is important to remember that any regression model requires integer or floating-point inputs. Since sex is a categorical column, we need to encode it before we can use it for regression. To do this, we will apply label encoding, which assigns a distinct integer to each class.

y_train = y_train.apply(lambda x: 0 if x == "M" else 1 if x == "F" else 2)
y_test = y_test.apply(lambda x: 0 if x == "M" else 1 if x == "F" else 2)

        Now that the y data is encoded, we must convert each train/test dataset to a torch.tensor. This is crucial for any regression done with PyTorch, as its models only accept tensors.

import torch

X_train_tensor = torch.tensor(X_train.to_numpy()).float()   # features as 32-bit float tensors
X_test_tensor = torch.tensor(X_test.to_numpy()).float()
y_train_tensor = torch.tensor(y_train.to_numpy()).long()    # labels as integer (long) tensors
y_test_tensor = torch.tensor(y_test.to_numpy()).long()

7. The model

        After the data processing is complete, we can start creating the model. To implement it, we will use the PyTorch library. Since there are 8 features used to determine sex, in_features must be set to 8, and since the model predicts one of three possible classes, out_features must be set to 3. Note that nn.CrossEntropyLoss applies the softmax internally, so the linear layer outputs raw scores (logits) rather than probabilities. For more information about PyTorch's torch.nn module, see the documentation.

import torch
import torch.nn as nn
from torch.nn import Linear


torch.manual_seed(348965)                                 # keep random values consistent

model = Linear(in_features=8, out_features=3)             # define the model

# define the loss function and optimizer
criterion = nn.CrossEntropyLoss()                         # use cross-entropy loss for multi-class classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent with a learning rate of 0.01

8. Training the model

num_epochs = 2500    # loop iterations

for epoch in range(num_epochs):
    # forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    # backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # print progress every 100 epochs
    if (epoch+1) % 100 == 0:
        print('Epoch [{}/{}]\tLoss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
Epoch [100/2500]     Loss: 1.1178
Epoch [200/2500]     Loss: 1.1006
Epoch [300/2500]     Loss: 1.0850
Epoch [400/2500]     Loss: 1.0708
Epoch [500/2500]     Loss: 1.0579
Epoch [600/2500]     Loss: 1.0460
Epoch [700/2500]     Loss: 1.0352
Epoch [800/2500]     Loss: 1.0252
Epoch [900/2500]     Loss: 1.0161
Epoch [1000/2500]    Loss: 1.0077
Epoch [1100/2500]    Loss: 0.9999
Epoch [1200/2500]    Loss: 0.9927
Epoch [1300/2500]    Loss: 0.9860
Epoch [1400/2500]    Loss: 0.9799
Epoch [1500/2500]    Loss: 0.9741
Epoch [1600/2500]    Loss: 0.9688
Epoch [1700/2500]    Loss: 0.9638
Epoch [1800/2500]    Loss: 0.9592
Epoch [1900/2500]    Loss: 0.9549
Epoch [2000/2500]    Loss: 0.9509
Epoch [2100/2500]    Loss: 0.9471
Epoch [2200/2500]    Loss: 0.9435
Epoch [2300/2500]    Loss: 0.9402
Epoch [2400/2500]    Loss: 0.9371
Epoch [2500/2500]    Loss: 0.9342

9. Evaluation

Now to check the accuracy we can run the following code:

outputs = model(X_test_tensor)
_, preds = torch.max(outputs, dim=1)
accuracy = torch.mean((preds == y_test_tensor).float())
print('\nAccuracy: {:.2f}%'.format(accuracy.item()*100))
Accuracy: 52.63%
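
For a fuller picture than overall accuracy, we can also look at how the model does on each class separately. Here is a short sketch that wraps inference in torch.no_grad() (good practice during evaluation, since no gradients are needed) and reports per-class accuracy; the indices 0, 1, 2 correspond to M, F, and I from our earlier encoding:

with torch.no_grad():                        # disable gradient tracking for evaluation
    outputs = model(X_test_tensor)
    _, preds = torch.max(outputs, dim=1)

for cls, label in enumerate(["M", "F", "I"]):
    mask = (y_test_tensor == cls)            # test observations whose true class is cls
    cls_acc = (preds[mask] == cls).float().mean()
    print('Accuracy for class {}: {:.2f}%'.format(label, cls_acc.item()*100))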

This means that our model correctly identified the sex of an abalone from the 8 features almost 53% of the time, which is not great: it is only modestly better than the roughly 37% we would get by always predicting the most common class (M).
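
One plausible reason for the mediocre accuracy is that the features sit on different scales and the model is a single linear layer. As a sketch of a common first improvement (an assumption about this dataset, not a guaranteed fix), we could standardize the features with scikit-learn's StandardScaler before converting them to tensors:

from sklearn.preprocessing import StandardScaler

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)   # fit the scaler on training data only
X_test_scaled = scaler.transform(X_test)         # apply the same scaling to the test data

X_train_tensor = torch.tensor(X_train_scaled).float()
X_test_tensor = torch.tensor(X_test_scaled).float()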

10. Complete code

        Here is the complete code:

import pandas as pd
import torch
import torch.nn as nn
from torch.nn import Linear
from sklearn.model_selection import train_test_split


torch.manual_seed(348965)                                 # keep random values consistent

# load the data, split it, and encode the labels
df = pd.read_csv("https://raw.githubusercontent.com/s-lasch/CIS-280/main/abalone.csv")
X = df.drop(['sex'], axis=1)
y = df['sex']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.20, random_state=5)
y_train = y_train.apply(lambda x: 0 if x == "M" else 1 if x == "F" else 2)
y_test = y_test.apply(lambda x: 0 if x == "M" else 1 if x == "F" else 2)

model = Linear(in_features=8, out_features=3)             # define the model

# define the loss function and optimizer
criterion = nn.CrossEntropyLoss()                         # use cross-entropy loss for multi-class classification
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)  # stochastic gradient descent with a learning rate of 0.01

# convert the data to PyTorch tensors
X_train_tensor = torch.tensor(X_train.to_numpy()).float()
X_test_tensor = torch.tensor(X_test.to_numpy()).float()
y_train_tensor = torch.tensor(y_train.to_numpy()).long()
y_test_tensor = torch.tensor(y_test.to_numpy()).long()

# train the model
num_epochs = 2500    # loop iterations

for epoch in range(num_epochs):
    # forward pass
    outputs = model(X_train_tensor)
    loss = criterion(outputs, y_train_tensor)

    # backward and optimize
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    # print progress every 100 epochs
    if (epoch+1) % 100 == 0:
        print('Epoch [{}/{}]\tLoss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))


outputs = model(X_test_tensor)
_, preds = torch.max(outputs, dim=1)
accuracy = torch.mean((preds == y_test_tensor).float())
print('\nAccuracy: {:.2f}%'.format(accuracy.item()*100))

Originally published at https://s-lasch.github.io on May 1, 2023.
