Fast Food Classification Using Transfer Learning in PyTorch

Introduction

Fast food classification has become an important task in automated food delivery systems. With the growth of fast food chains, the need for accurate and efficient food identification has made machine learning a popular approach to the problem.

In this blog, we will explore transfer learning for fast food classification using PyTorch. Transfer learning is a technique that utilizes pretrained models to solve new tasks with limited data.


We will discuss how to fine-tune a pre-trained model for fast food classification and the results obtained from this method.

Learning Objectives

  • Understanding PyTorch for deep learning

  • How to use transfer learning in PyTorch

  • Data augmentation

  • Model visualization

Table of contents

  1. What is transfer learning?

  2. The Dataset

  3. Code Implementation

What is transfer learning?

Transfer learning is a technique that utilizes pre-trained weights of deep learning models to perform new tasks with limited data.

In the context of ResNet18 (which I will be using in this project), transfer learning would involve taking a pretrained ResNet18 model and fine-tuning its weights for the specific fast food classification task. This approach aims to leverage the knowledge learned by pre-trained models on large datasets to solve new tasks with less data and computing resources. The fine-tuning process usually involves retraining the last few layers of the ResNet18 model to adapt it to the new task.
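
To make the idea concrete, below is a minimal sketch of the lighter-weight variant, often called feature extraction, in which the pretrained backbone is frozen and only a new classification head is trained. This is for illustration only; later in this post we instead leave all parameters trainable and fine-tune the whole network.

import torch.nn as nn
from torchvision import models

model = models.resnet18(pretrained=True)

# Freeze every pretrained parameter so only the new head is trained
for param in model.parameters():
    param.requires_grad = False

# Replace the final fully connected layer with a fresh 10-class head;
# its parameters are created with requires_grad=True by default
model.fc = nn.Linear(model.fc.in_features, 10)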

Below is the ResNet-18 architecture diagram.

[Figure: ResNet-18 architecture diagram]

You can see that the model consists of 17 convolutional layers (an initial 7×7 convolution followed by 3×3 convolutions) and a fully connected layer, with a softmax at the end for multi-class image classification. Note that torchvision's implementation returns raw logits; the softmax is applied implicitly by nn.CrossEntropyLoss during training.
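
If you want to verify the layer structure yourself, printing the torchvision model enumerates every block:

from torchvision import models

# Printing the model lists conv1 (7x7), layer1-layer4 (3x3 BasicBlocks),
# avgpool, and the final fc layer
print(models.resnet18())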

The Dataset

Dataset: https://www.kaggle.com/datasets/utkarshsaxenadn/fast-food-classification-dataset

The dataset contains 10 categories of fast food images:

  • Burger

  • Donut

  • Hot Dog

  • Pizza

  • Sandwich

  • Baked Potato

  • Crispy Chicken

  • Fries

  • Taco

  • Taquito
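
The ImageFolder loader used below expects one subdirectory per class inside each split. The layout should look roughly like this (folder names are taken from the class list above; verify them against your own download):

Fast Food Classification V2/
├── Train/
│   ├── Baked Potato/
│   ├── Burger/
│   ├── ...
│   └── Taquito/
└── Valid/
    ├── Baked Potato/
    ├── ...
    └── Taquito/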

Code Implementation

Step 1: Import all necessary libraries

from __future__ import print_function, division

import torch
import torch.nn as nn
import torch.optim as optim
from torch.optim import lr_scheduler
import torch.backends.cudnn as cudnn
import numpy as np
import torchvision
from torchvision import datasets, models, transforms
import matplotlib.pyplot as plt
import time
import os
import copy

Step 2: Set the Dataset Path and Device

PATH = "../data/Fast Food Classification V2/"

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# make sure my GPU is detected.
print(device)
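
If a GPU is available, this prints cuda:0. As an optional extra check (not in the original code), you can also print the name of the card PyTorch found:

# Optional sanity check: print the name of the detected GPU
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))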

Step 3: Data Augmentation and Normalization

Data augmentation is a key technique in deep learning for increasing the effective size of a training dataset and preventing overfitting. It can improve the performance and robustness of deep learning models, especially in data-limited scenarios. Note that the normalization statistics below are the ImageNet channel means and standard deviations, which is what the pretrained ResNet-18 expects.

data_transforms = {
    'Train': transforms.Compose([
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
    'Valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ]),
}
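
As a quick sanity check, either pipeline can be applied directly to a single PIL image; the file path below is a hypothetical example, so substitute any image from your copy of the dataset:

from PIL import Image

# Hypothetical example path; use any image from your dataset
img = Image.open(PATH + 'Train/Pizza/example.jpg').convert('RGB')
x = data_transforms['Train'](img)
print(x.shape)  # torch.Size([3, 224, 224])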

Step 4: Load the Dataset and Create DataLoader Objects

image_datasets = {
    x: datasets.ImageFolder(os.path.join(PATH, x),
                            data_transforms[x]) for x in ['Train', 'Valid']
}
dataloaders = {
    x: torch.utils.data.DataLoader(image_datasets[x],
                                   batch_size=32,
                                   shuffle=True)
    for x in ['Train', 'Valid']
}
dataset_sizes = {x: len(image_datasets[x]) for x in ['Train', 'Valid']}
class_names = image_datasets['Train'].classes
print(class_names)

>>>
['Baked Potato',
 'Burger',
 'Crispy Chicken',
 'Donut',
 'Fries',
 'Hot Dog',
 'Pizza',
 'Sandwich',
 'Taco',
 'Taquito']

Let's look at some training data. Note that ImageFolder assigns class indices from the sorted folder names, which is why the list above is alphabetical.

# Helper function to display a normalized image tensor

def imshow(inp, title=None):
    # Convert from C x H x W tensor to H x W x C numpy array
    inp = inp.numpy().transpose((1, 2, 0))
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    # Undo the ImageNet normalization applied in data_transforms
    inp = std * inp + mean
    inp = np.clip(inp, 0, 1)
    plt.imshow(inp)
    if title is not None:
        plt.title(title)
    plt.pause(0.001)
# Get a batch of training data
inputs, classes = next(iter(dataloaders['Train']))

# Make a grid from batch
out = torchvision.utils.make_grid(inputs)

imshow(out)
[Figure: grid of sample training images]

Step 5: Create the training function

The function takes the following inputs:

  1. Model : The deep learning model to train.

  2. Criterion : The loss function used to evaluate the performance of the model.

  3. Optimizer : The optimization algorithm used to update the parameters of the model during training.

  4. Scheduler : The learning rate scheduler, used to adjust the learning rate during training.

  5. num_epochs : Number of training epochs (default = 25).

The function trains the model for num_epochs epochs, alternating between a training and a validation phase. In each epoch, the model's parameters are updated by the optimizer using the loss computed by the criterion.

During the training phase, gradients are computed with backward() and parameters are updated with optimizer.step(). During the validation phase, the model's performance is evaluated without updating any parameters.

After each epoch, the performance metrics (loss and accuracy) are printed, and the best model weights (those with the highest validation accuracy) are saved with copy.deepcopy().

At the end of training, the elapsed time and best validation accuracy are printed, the best weights are restored with model.load_state_dict(), and the trained model is returned.

def train_model(model, criterion, optimizer, scheduler, num_epochs=25):
    since = time.time()

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print(f'Epoch {epoch}/{num_epochs - 1}')
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['Train', 'Valid']:
            if phase == 'Train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'Train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'Train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            if phase == 'Train':
                scheduler.step()

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print(f'{phase} Loss: {epoch_loss:.4f} Acc: {epoch_acc:.4f}')

            # deep copy the model
            if phase == 'Valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print(f'Training complete in {time_elapsed // 60:.0f}m {time_elapsed % 60:.0f}s')
    print(f'Best Valid Acc: {best_acc:4f}')

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

Step 6: Train the Model with Pretrained ResNet-18 Weights

model_1 = models.resnet18(pretrained=True)
num_ftrs = model_1.fc.in_features
# Replace the final fully connected layer so the output size
# matches the number of classes (len(class_names) == 10 here).
model_1.fc = nn.Linear(num_ftrs, len(class_names))

model_1 = model_1.to(device)

criterion = nn.CrossEntropyLoss()

# Observe that all parameters are being optimized (full fine-tuning).
# Two optimizers are defined here; only Adam is actually used below.
optimizer_sgd = optim.SGD(model_1.parameters(), lr=0.001, momentum=0.9)
optimizer_adam = optim.Adam(model_1.parameters(), lr=0.001)

# Decay LR by a factor of 0.1 every 7 epochs
exp_lr_scheduler = lr_scheduler.StepLR(optimizer_adam, step_size=7, gamma=0.1)
model_resnetft = train_model(model_1, criterion, optimizer_adam, exp_lr_scheduler,
                       num_epochs=15)
 
output >>>
Epoch 0/14
----------
Train Loss: 1.3397 Acc: 0.5660
Valid Loss: 1.0503 Acc: 0.6691
     .
     .
     .
     continues
     .
     .
     .
Epoch 14/14
----------
Train Loss: 0.4054 Acc: 0.8709
Valid Loss: 0.4723 Acc: 0.8600

Training complete in 27m 23s
Best Valid Acc: 0.867714

As you can see, training takes a little under 28 minutes on an Nvidia Tesla P100 GPU, and the best validation accuracy is 86.77%.
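
Although the original walkthrough does not show this step, at this point you would typically persist the fine-tuned weights. A minimal sketch, with an assumed filename:

# Save the fine-tuned weights (filename is an arbitrary choice)
torch.save(model_resnetft.state_dict(), 'resnet18_fastfood.pth')

# Reload them later into a model with the same architecture:
# model_1.load_state_dict(torch.load('resnet18_fastfood.pth'))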

Step 7: Visualize Some Results

The code first puts the model into evaluation mode (model.eval()) and initializes a counter, images_so_far, to track how many images have been visualized. A figure is created with plt.figure().

The function then iterates over the validation data with enumerate(dataloaders['Valid']). For each batch, the inputs and labels are moved to the device (inputs.to(device), labels.to(device)) and the model's predictions are computed with model(inputs). The predicted class for each image is obtained with _, preds = torch.max(outputs, 1).

For each input image, the code draws the image with imshow(inputs.cpu().data[j]) and sets the title to the predicted class. The counter images_so_far tracks how many images have been shown, and once it reaches the requested number the function returns.

Finally, the code sets the model back to its original training mode using model.train(mode=was_training) .

def visualize_model(model, num_images=6):
    was_training = model.training
    model.eval()
    images_so_far = 0
    fig = plt.figure()

    with torch.no_grad():
        for i, (inputs, labels) in enumerate(dataloaders['Valid']):
            inputs = inputs.to(device)
            labels = labels.to(device)

            outputs = model(inputs)
            _, preds = torch.max(outputs, 1)

            for j in range(inputs.size()[0]):
                images_so_far += 1
                ax = plt.subplot(num_images//2, 2, images_so_far)
                ax.axis('off')
                ax.set_title(f'predicted: {class_names[preds[j]]}')
                imshow(inputs.cpu().data[j])

                if images_so_far == num_images:
                    model.train(mode=was_training)
                    return
        model.train(mode=was_training)
        
        
# Visualize the model
visualize_model(model_1)
[Figures: validation images with predicted class labels]

Conclusion

This article demonstrates how to use transfer learning to perform fast food classification using the ResNet18 architecture and PyTorch. This implementation shows how to fine-tune a pretrained model on the food dataset and evaluate the model's performance on the validation set.

The results show that transfer learning can effectively utilize the knowledge learned from large-scale datasets to improve the performance of food classification tasks. Overall, transfer learning is a powerful tool for solving computer vision problems and has the potential to revolutionize the field.

Here are some key learnings from this project:

  1. ResNet18 is a commonly used deep learning architecture in computer vision tasks and can be used as a feature extractor in transfer learning.

  2. Transfer learning is a technique in deep learning that fine-tunes a pre-trained model for a specific task.

  3. The code implementation shows how to fine-tune the pretrained ResNet18 model on the food dataset and evaluate the model's performance on the validation set.

  4. Data augmentation techniques increase the size of the training dataset and improve the performance of the model.

  5. The results show that transfer learning using ResNet18 and PyTorch can effectively classify fast food images with high accuracy.

  6. Transfer learning is a powerful tool for solving computer vision problems that has the potential to revolutionize the field.

☆ END ☆
