Artificial intelligence (PyTorch) model building 15: build the MnasNet model by hand and implement training and prediction

Hello everyone, I am Weixue AI. Today I will introduce artificial intelligence (PyTorch) model building 15: building the MnasNet model by hand and implementing its training and prediction. This article explains the principle of the MnasNet model and uses the PyTorch framework to build a MnasNet model for an image classification task, so that everyone can fully understand the model.

The article will be divided into the following sections:

  1. Introduction to the MnasNet model
  2. Implementation of the MnasNet model
  3. Dataset preparation
  4. Model training
  5. Model testing
  6. Conclusion

1. Introduction to the MnasNet model

MnasNet (Mobile Neural Architecture Search Network) is an efficient convolutional neural network discovered through architecture search, first proposed by Google in 2018. The main feature of MnasNet is that it reduces computational complexity and parameter count as much as possible while maintaining high model performance, which makes it well suited to resource-constrained scenarios such as mobile devices.

The core idea of MnasNet is to use Neural Architecture Search (NAS) to find an optimal model structure. The goal of NAS is to automatically search for the best-performing neural network architecture under the constraints of a given task and hardware platform. The search space used by MnasNet mainly includes basic operations such as standard convolutions, depthwise separable convolutions, and inverted residual blocks.
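To make the parameter savings of these operations concrete, here is a minimal sketch (my own illustration, not from the original article) comparing a standard 3x3 convolution with a depthwise separable convolution for a 32-to-64-channel layer:

import torch.nn as nn

standard_conv = nn.Conv2d(32, 64, kernel_size=3, padding=1)              # 32*64*3*3 weights (+64 biases)
depthwise_separable = nn.Sequential(
    nn.Conv2d(32, 32, kernel_size=3, padding=1, groups=32, bias=False),  # depthwise: 32*3*3 weights
    nn.Conv2d(32, 64, kernel_size=1),                                     # pointwise: 32*64 weights (+64 biases)
)

print(sum(p.numel() for p in standard_conv.parameters()))        # 18496
print(sum(p.numel() for p in depthwise_separable.parameters()))  # 2400

The depthwise separable version needs almost 8x fewer parameters for the same input and output shape, which is why this operation dominates the MnasNet search space.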
The mathematical principle of the MnasNet model can be expressed by the following formula:

Suppose the input image is $x \in \mathbb{R}^{H \times W \times C}$, where $H$, $W$, and $C$ denote the height, width, and number of channels of the image, respectively. The MnasNet model can be seen as a function $f(x;\theta)$, where $\theta$ denotes the parameters of the model, including convolution kernels, batch normalization parameters, fully connected layer parameters, and so on.

The core of the MnasNet model is automatic neural architecture search, which finds the best neural network architecture automatically. Assume the best architecture found by the search is $A$; the output of the model can then be expressed as:

$$f(x;\theta_A) = f_A(x;\theta_A)$$

where $f_A(x;\theta_A)$ denotes the model built with architecture $A$, and $\theta_A$ denotes the parameters of model $A$. The parameters of model $A$ consist of two parts, shared parameters and non-shared parameters, which can be written as:

$$\theta_A = \{w, \alpha\}$$

where $w$ denotes the shared parameters and $\alpha$ denotes the non-shared parameters. Shared parameters are shared across different candidate architectures during the search, while non-shared parameters are specific to each architecture.
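For context, the original MnasNet paper formulates the search as a multi-objective optimization that rewards accuracy while penalizing measured latency on the target device. Its objective can be written roughly as:

$$\max_{m} \; \mathrm{ACC}(m) \times \left[ \frac{\mathrm{LAT}(m)}{T} \right]^{w}$$

where $\mathrm{ACC}(m)$ is the accuracy of candidate model $m$, $\mathrm{LAT}(m)$ is its measured latency on the target hardware, $T$ is the target latency, and $w$ is an exponent that controls the accuracy-latency trade-off.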

Thanks to automatic neural architecture search, the MnasNet model can significantly reduce its parameter count and computation while preserving accuracy, which allows it to be deployed effectively on mobile devices.

2. Implementation of the MnasNet model

The following is the code to implement the MnasNet model with PyTorch:

import torch
import torch.nn as nn
import torch.optim as optim
import torchvision.transforms as transforms
import torchvision.datasets as datasets
from torch.utils.data import DataLoader

class _InvertedResidual(nn.Module):
    """Inverted residual block: 1x1 expansion -> depthwise conv -> 1x1 linear projection."""
    def __init__(self, in_ch, out_ch, kernel_size, stride, expansion_factor):
        super(_InvertedResidual, self).__init__()
        hidden_dim = round(in_ch * expansion_factor)
        # A skip connection is only valid when spatial size and channel count are unchanged.
        self.use_residual = in_ch == out_ch and stride == 1
        
        layers = []
        if expansion_factor != 1:
            # 1x1 pointwise convolution that expands the channel dimension
            layers.append(nn.Conv2d(in_ch, hidden_dim, 1, 1, 0, bias=False))
            layers.append(nn.BatchNorm2d(hidden_dim))
            layers.append(nn.ReLU6(inplace=True))
        
        layers.extend([
            # depthwise convolution (groups == channels) for spatial filtering
            nn.Conv2d(hidden_dim, hidden_dim, kernel_size, stride, kernel_size//2, groups=hidden_dim, bias=False),
            nn.BatchNorm2d(hidden_dim),
            nn.ReLU6(inplace=True),
            # 1x1 pointwise convolution projecting back to out_ch (no activation: linear bottleneck)
            nn.Conv2d(hidden_dim, out_ch, 1, 1, 0, bias=False),
            nn.BatchNorm2d(out_ch),
        ])

        self.layers = nn.Sequential(*layers)
    
    def forward(self, x):
        if self.use_residual:
            return x + self.layers(x)
        else:
            return self.layers(x)

class MnasNet(nn.Module):
    def __init__(self, num_classes=1000, alpha=1.0):
        super(MnasNet, self).__init__()
        self.alpha = alpha
        self.num_classes = num_classes

        def conv_dw(in_ch, out_ch, stride):
            # Separable-convolution block: 3x3 depthwise conv, no channel expansion
            return _InvertedResidual(in_ch, out_ch, 3, stride, 1)

        def conv_pw(in_ch, out_ch, stride):
            # MBConv-style block: expansion factor 6 with a 3x3 depthwise convolution
            return _InvertedResidual(in_ch, out_ch, 3, stride, 6)

        def make_layer(in_ch, out_ch, num_blocks, stride):
            # Stack num_blocks inverted residual blocks; only the first block downsamples
            layers = [conv_pw(in_ch, out_ch, stride)]
            for _ in range(num_blocks - 1):
                layers.append(conv_pw(out_ch, out_ch, 1))
            return nn.Sequential(*layers)

        # Build the MnasNet model (feature extractor)
        self.model = nn.Sequential(
            nn.Conv2d(3, int(32 * alpha), 3, 2, 1, bias=False),
            nn.BatchNorm2d(int(32 * alpha)),
            nn.ReLU6(inplace=True),
            conv_dw(int(32 * alpha), int(16 * alpha), 1),
            make_layer(int(16 * alpha), int(24 * alpha), 2, 2),
            make_layer(int(24 * alpha), int(40 * alpha), 3, 2),
            make_layer(int(40 * alpha), int(80 * alpha), 4, 2),
            make_layer(int(80 * alpha), int(96 * alpha), 2, 1),
            make_layer(int(96 * alpha), int(192 * alpha), 4, 2),
            make_layer(int(192 * alpha), int(320 * alpha), 1, 1),
            nn.Conv2d(int(320 * alpha), 1280, 1, 1, 0, bias=False),
            nn.BatchNorm2d(1280),
            nn.ReLU6(inplace=True),
        )

        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Dropout(0.2),
            nn.Linear(1280, self.num_classes),
        )

    def forward(self, x):
        x = self.model(x)
        x = self.classifier(x)
        return x
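Before moving on to the data, it is worth doing a quick sanity check (my own addition, not in the original post) that the model produces the expected output shape for CIFAR-10-sized inputs:

model = MnasNet(num_classes=10, alpha=0.5)
dummy = torch.randn(2, 3, 32, 32)   # a fake batch of two 32x32 RGB images
print(model(dummy).shape)           # expected: torch.Size([2, 10])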

3. Dataset preparation

We will train and test our MnasNet model on the CIFAR-10 dataset. The CIFAR-10 dataset contains 60,000 32x32 color images in 10 categories, with 6,000 images per category. Of these, 50,000 images are used for training and 10,000 for testing.

The code to prepare the dataset is as follows:

transform_train = transforms.Compose([
    transforms.RandomCrop(32, padding=4),
    transforms.RandomHorizontalFlip(),
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

transform_test = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2023, 0.1994, 0.2010)),
])

trainset = datasets.CIFAR10(root='./data', train=True, download=True, transform=transform_train)
trainloader = DataLoader(trainset, batch_size=100, shuffle=True, num_workers=2)

testset = datasets.CIFAR10(root='./data', train=False, download=True, transform=transform_test)
testloader = DataLoader(testset, batch_size=100, shuffle=False, num_workers=2)

4. Model training

Next, we will use the training set to train the MnasNet model, and output the training loss and accuracy after each epoch. The training code is as follows:

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
num_classes = 10
mnasnet = MnasNet(num_classes=num_classes, alpha=0.5)
mnasnet = mnasnet.to(device)

criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(mnasnet.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4)

# Learning rate schedule: drop the learning rate by 10x at epochs 80 and 120
def adjust_learning_rate(optimizer, epoch):
    lr = 0.1
    if epoch >= 80:
        lr = 0.01
    if epoch >= 120:
        lr = 0.001
    for param_group in optimizer.param_groups:
        param_group['lr'] = lr
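# Note (my own addition): PyTorch's built-in MultiStepLR scheduler reproduces the
# same schedule (0.1 -> 0.01 at epoch 80 -> 0.001 at epoch 120). If you prefer it,
# create it once and call scheduler.step() at the end of each epoch instead of
# calling adjust_learning_rate():
# scheduler = optim.lr_scheduler.MultiStepLR(optimizer, milestones=[80, 120], gamma=0.1)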

num_epochs = 150
for epoch in range(num_epochs):
    adjust_learning_rate(optimizer, epoch)
    mnasnet.train()
    train_loss = 0
    correct = 0
    total = 0

    for batch_idx, (inputs, targets) in enumerate(trainloader):
        inputs, targets = inputs.to(device), targets.to(device)
        optimizer.zero_grad()
        outputs = mnasnet(inputs)
        loss = criterion(outputs, targets)
        loss.backward()
        optimizer.step()

        train_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

    print(f'Epoch: {epoch+1}, Loss: {train_loss/(batch_idx+1)}, Acc: {100.*correct/total}')
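After training finishes, it is a good idea to save the learned weights so they can be reloaded later without retraining (a small addition of mine, not part of the original code; the filename is just an example):

torch.save(mnasnet.state_dict(), 'mnasnet_cifar10.pth')
# later: mnasnet.load_state_dict(torch.load('mnasnet_cifar10.pth'))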

5. Model testing

After the training is complete, we use the test set to test the model, and output the test loss and accuracy. The test code is as follows:

mnasnet.eval()
test_loss = 0
correct = 0
total = 0

with torch.no_grad():
    for batch_idx, (inputs, targets) in enumerate(testloader):
        inputs, targets = inputs.to(device), targets.to(device)
        outputs = mnasnet(inputs)
        loss = criterion(outputs, targets)

        test_loss += loss.item()
        _, predicted = outputs.max(1)
        total += targets.size(0)
        correct += predicted.eq(targets).sum().item()

print(f'Test Loss: {test_loss/(batch_idx+1)}, Acc: {100.*correct/total}')
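Since the title also promises prediction, here is a short sketch (my own addition) that classifies a single test image and maps the predicted index to a CIFAR-10 class name:

classes = ('airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck')

image, label = testset[0]                          # one already-normalized test image
with torch.no_grad():
    logits = mnasnet(image.unsqueeze(0).to(device))
    pred = logits.argmax(dim=1).item()
print(f'predicted: {classes[pred]}, actual: {classes[label]}')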

6. Conclusion

This article introduced the construction, training, and testing of the MnasNet model. The distinguishing feature of MnasNet is its use of automatic neural architecture search, which significantly reduces the number of parameters and the amount of computation while preserving accuracy, allowing the model to be deployed effectively on mobile devices.

Summary of the construction and training of the MnasNet model:

Dataset preparation: First, prepare suitable datasets, including training, validation, and test sets. You can use public datasets such as ImageNet or CIFAR-10, or datasets you have collected yourself.

Model architecture search: MnasNet uses neural architecture search technology, which can automatically search for the best neural network architecture. This process requires a lot of computing resources and time, and can be performed on the GPU or in the cloud.

Model building: After obtaining the best neural network architecture, it needs to be built into a trainable model. This process can be implemented using deep learning frameworks, such as TensorFlow, PyTorch, etc.

Model training: To train a model, you need to select an appropriate optimization algorithm and loss function, and set appropriate hyperparameters, such as learning rate and batch size. During the training process, some tricks can be used to improve the performance of the model, such as data augmentation, learning rate adjustment, etc.

Model evaluation: After training, use the validation set or test set to evaluate the model. Metrics such as accuracy, recall, and F1 score can be used to measure its performance, as in the sketch below.
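As an illustration (my own sketch, assuming scikit-learn is installed; it is not used elsewhere in this article), the following computes accuracy, macro-averaged recall, and macro-averaged F1 for the trained model on the CIFAR-10 test set:

from sklearn.metrics import accuracy_score, recall_score, f1_score

all_preds, all_targets = [], []
with torch.no_grad():
    for inputs, targets in testloader:
        preds = mnasnet(inputs.to(device)).argmax(dim=1).cpu()
        all_preds.extend(preds.tolist())
        all_targets.extend(targets.tolist())

print('accuracy:', accuracy_score(all_targets, all_preds))
print('recall  :', recall_score(all_targets, all_preds, average='macro'))
print('f1      :', f1_score(all_targets, all_preds, average='macro'))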
