[youcans hands-on model] SqueezeNet model-CIFAR10 image classification

Welcome to the "youcans hands-on model" series
The content and resources of this column are synchronized to GitHub/youcans



This article uses PyTorch to implement the SqueezeNet network model, and uses the CIFAR10 dataset to train the model for image classification.


1. SqueezeNet convolutional neural network model

Iandola, Han, Moskewicz et al. published the paper "SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size" in 2016, proposing a lightweight deep learning neural network model called SqueezeNet.

SqueezeNet, MobileNet, ShuffleNet and Xception were all made public on arXiv around 2016-2017 and are known as the four lightweight models.

【Paper download link】
SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size

[GitHub address]:
[https://github.com/forresti/SqueezeNet]
[https://github.com/DeepScale/SqueezeNet]


1.1 Model Introduction

The key innovation of SqueezeNet is the Fire module, which consists of a squeeze part and an expand part, reducing both the parameter count and the amount of computation.

  • The squeeze layer applies 1×1 convolution kernels to the feature map of the previous layer to reduce its channel dimensionality;
  • The expand layer uses an Inception-style structure with two branches, 1×1 convolution and 3×3 convolution, whose outputs are concatenated.

The SqueezeNet pretrained model is only about 4.8 MB, and its Top-5 accuracy on the ImageNet dataset is 80.3%, which is very lightweight and efficient.



1.2 Introduction to the paper

【Abstract】

Recent research on deep convolutional neural networks (CNNs) has focused mainly on improving accuracy. For a given accuracy level, there are usually multiple CNN architectures that achieve it, and at equal accuracy a smaller CNN architecture has clear advantages: (1) smaller CNNs require less server communication during distributed training; (2) smaller CNNs require less bandwidth to export new models to autonomous cars; (3) smaller CNNs are more suitable for deployment on FPGAs and other memory-limited hardware.

We propose a small CNN architecture called SqueezeNet. SqueezeNet achieves AlexNet-level accuracy on the ImageNet dataset with 50x fewer parameters. With model compression techniques, SqueezeNet can further be compressed to less than 0.5 MB.


【Background】

Since LeNet established the convolutional neural network, and AlexNet triggered a research boom in deep learning in 2012, models from ZF-Net, VGGNet, GoogLeNet, and ResNet to DenseNet have mainly pursued higher accuracy. The main approaches are deepening the network structure and enhancing the convolution modules, but this makes the network models increasingly complex and greatly increases the required memory and computation. SqueezeNet opened up another direction: reducing the model parameters as much as possible and improving the running speed without reducing the model's accuracy.

One path to simplifying models is to compress existing CNN models, for example with singular value decomposition (SVD), network pruning, Deep Compression, and EIE hardware acceleration.

Lightweight design instead adopts lightweight ideas at model design time, such as lightweight convolution methods (depthwise separable convolution, group convolution), average pooling instead of fully connected layers, and 1×1 convolution for channel dimensionality reduction.


【Main innovation】

To reduce the size and computation of the network model, SqueezeNet follows the following three strategies:

  • Replace most of the 3×3 convolutions with 1×1 convolutions, which significantly reduces the number of parameters and computations.
  • Use the Squeeze layer (pointwise convolution) to reduce the number of input channels (depth) of the 3×3 convolution kernels, further reducing parameters and computation.
  • Delay downsampling: downsampling is performed late in the network, so the convolutional layers operate on larger feature maps, which improves classification accuracy.

The core of SqueezeNet is the Squeeze-Expand structure, called the Fire module, which consists of a Squeeze layer containing only 1×1 convolutions and an Expand layer that concatenates the outputs of 1×1 and 3×3 convolutions.

  • The Squeeze layer applies 1×1 convolution kernels to the feature map of the previous layer to reduce its channel dimensionality.

  • The Expand layer uses an Inception-style structure with two branches, 1×1 convolution and 3×3 convolution, whose outputs are concatenated.

Both branches use stride=1 and "same" padding, so their output feature maps have the same spatial size and can be concatenated. The output depth after concatenation is the sum of the 1×1 convolution depth e1 and the 3×3 convolution depth e3, i.e. (e1 + e3).
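
To make the savings concrete, here is a quick back-of-the-envelope weight count (biases ignored) for the fire2 configuration Fire(96, 16, 64, 64) used in the v1.0 model quoted later, compared with a plain 3×3 convolution of the same output depth:

# Weight counts (biases ignored) for fire2 of SqueezeNet v1.0: Fire(96, 16, 64, 64)
in_ch, s, e1, e3 = 96, 16, 64, 64
fire_params = in_ch * s + s * e1 + s * e3 * 9  # squeeze 1x1 + expand 1x1 + expand 3x3
plain_params = in_ch * (e1 + e3) * 9           # plain 3x3 conv with the same output depth
print(fire_params, plain_params)               # 11776 vs 110592, roughly 9x fewer weights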


Compared with the Xception architecture and depthwise separable convolution (DSC), SqueezeNet can be seen as the real "Extreme Inception", while the Xception model actually relies on DSC.


【Model structure】

The architecture of the SqueezeNet Convolutional Neural Network is as follows.

  • SqueezeNet starts with a standard convolutional layer (conv1), followed by 8 Fire modules (fire2-fire9), and ends with a standard convolutional layer (conv10).
  • The depth (number of feature maps) of the Fire modules gradually increases from the input end to the output end of the network.
  • After the conv1, fire4, fire8 and conv10 layers, there is a max pooling layer with stride=2.

The design details of SqueezeNet are as follows.

  • To make the feature maps output by 1*1 convolution and 3*3 convolution have the same size for concatenation, use padding=1 for 3*3 convolution.

  • Use the ReLU activation function for the squeeze and expand layers.

  • Use Dropout with a ratio of 50% after the fire9 module to reduce overfitting.

  • Inspired by NiN, fully connected layers are not used.

  • The initial learning rate is set to 0.04 and decreases linearly during training.
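
In PyTorch, this linearly decaying schedule could be sketched with LambdaLR (a sketch only; it assumes import torch, an existing optimizer object with initial lr=0.04, and a total epoch count num_epochs):

# A sketch of the paper's linear learning-rate decay (assumes `optimizer`
# with initial lr=0.04 and a total epoch count `num_epochs`)
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda epoch: 1.0 - epoch / num_epochs)
# call scheduler.step() once per epoch, after optimizer.step()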

Because the Caffe framework does not natively support a convolution layer containing multiple filter resolutions, the expand layer is implemented with two separate convolution layers (1×1 and 3×3) whose outputs are concatenated in the channel dimension.



【Model configuration】

The Fire module has convolution kernels of two sizes: 1×1 and 3×3. The paper treats the numbers of these two kinds of kernels as hyperparameters that must be set manually. The specific structural configuration of the SqueezeNet model given in the paper is as follows.
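
For reference, the Fire-module hyperparameters of the v1.0 configuration (transcribed from the torchvision implementation quoted in Section 2.4, consistent with the paper's table) are:

# Fire-module hyperparameters of SqueezeNet v1.0: (squeeze s1x1, expand e1x1, expand e3x3)
fire_cfg = {
    "fire2": (16, 64, 64),   "fire3": (16, 64, 64),
    "fire4": (32, 128, 128), "fire5": (32, 128, 128),
    "fire6": (48, 192, 192), "fire7": (48, 192, 192),
    "fire8": (64, 256, 256), "fire9": (64, 256, 256),
}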



1.3 Analysis and discussion

The purpose of the SqueezeNet model is to simplify the model as much as possible and to improve running speed while achieving a given accuracy.

The direction taken by the SqueezeNet model is to reduce the model's parameter count and computation.

The design strategy of the SqueezeNet model is to reduce the number of channels of the 3×3 convolutions and to replace some 3×3 convolutions with 1×1 convolutions.

The shortcomings of the SqueezeNet model are:

(1) The main concern in embedded applications is real-time performance. SqueezeNet achieves its small parameter count with a deeper network, which reduces the network's parallelism and lengthens inference time.

(2) Although SqueezeNet has 50x fewer parameters than AlexNet, this is mainly because AlexNet's fully connected layers are huge, whereas SqueezeNet uses a global average pooling layer instead; it has little to do with the rest of SqueezeNet's network structure.

(3) The trained SqueezeNet model is about 5 MB (Top-5 accuracy 80.3%); the 0.5 MB figure is obtained only after applying Deep Compression, which again has little to do with SqueezeNet's network structure.

However, whether at 5 MB or 0.5 MB, the SqueezeNet pretrained model reaches a Top-5 accuracy of 80.3% on the ImageNet dataset, which is already very lightweight and efficient.


2. Define the SqueezeNet model class in PyTorch

The SqueezeNet model is a network framework that can be designed with different network structures and hyperparameters for different tasks.

This section first focuses on the image classification problem of the CIFAR10 dataset and introduces the construction of the SqueezeNet model class in detail. Finally, it shows how to load the predefined model class from torchvision.models.

2.1 Define Fire Module

The Fire module is the core of the SqueezeNet architecture, consisting of a Squeeze layer with only 1×1 convolutions and an Expand layer that concatenates two branches of 1×1 and 3×3 convolutions.

The routine that defines the Fire Module is as follows.

import torch
import torch.nn as nn

# Define the Fire module (Squeeze + Expand)
class Fire(nn.Module):
    def __init__(self, in_ch, squeeze_ch, e1_ch, e3_ch):  # declare the Fire module hyperparameters
        super(Fire, self).__init__()
        # Squeeze, 1x1 convolution
        self.squeeze = nn.Conv2d(in_ch, squeeze_ch, kernel_size=1)
        # Expand, 1x1 convolution
        self.expand1 = nn.Conv2d(squeeze_ch, e1_ch, kernel_size=1)
        # Expand, 3x3 convolution
        self.expand3 = nn.Conv2d(squeeze_ch, e3_ch, kernel_size=3, padding=1)
        self.activation = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.activation(self.squeeze(x))
        x = torch.cat([self.activation(self.expand1(x)),
                       self.activation(self.expand3(x))], dim=1)
        return x
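
A quick shape check of the module (a sketch using the fire2 configuration of the simplified model in the next subsection):

# Sanity check: Fire(96, 48, 32, 32) should map 96 channels to e1 + e3 = 64 channels
fire = Fire(96, 48, 32, 32)
x = torch.randn(1, 96, 32, 32)
print(fire(x).shape)  # expected: torch.Size([1, 64, 32, 32])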

2.2 Simplified SqueezeNet model class

The simplified SqueezeNet model class is defined as follows. The model is simplified relative to the original model in the SqueezeNet paper and uses a fully connected layer as the classifier.

For different datasets, some adaptation may be required. For example, the CIFAR10 image classification problem is small in scale and its images are only 32×32, so the SqueezeNet model is simplified accordingly.

# Define the simplified SqueezeNet model class 1
class SqueezeNet1(nn.Module):
    def __init__(self, num_classes=10):
        super(SqueezeNet1, self).__init__()
        self.conv1 = nn.Conv2d(3, 96, kernel_size=3, stride=1, padding=1)  # 3x32x32 -> 96x32x32
        self.relu = nn.ReLU(inplace=True)
        self.fire2 = Fire(96, 48, 32, 32)  # 96x32x32 -> 64x32x32
        self.maxpool1 = nn.MaxPool2d(kernel_size=2, stride=2)  # 64x32x32 -> 64x16x16
        self.fire3 = Fire(64, 32, 64, 64)  # 64x16x16 -> 128x16x16
        self.fire4 = Fire(128, 64, 128, 128)  # 128x16x16 -> 256x16x16
        self.maxpool2 = nn.MaxPool2d(kernel_size=2, stride=2)  # 256x16x16 -> 256x8x8
        self.fire5 = Fire(256, 64, 192, 192)  # 256x8x8 -> 384x8x8
        self.fire6 = Fire(384, 128, 256, 256)  # 384x8x8 -> 512x8x8
        self.maxpool3 = nn.MaxPool2d(kernel_size=2, stride=2)  # 512x8x8 -> 512x4x4
        self.avg_pool = nn.AdaptiveAvgPool2d((1,1))  # 512x4x4 -> 512x1x1
        self.linear = nn.Linear(512, num_classes)  # 512 -> num_classes

    def forward(self, x):
        x = self.conv1(x)
        x = self.relu(x)
        x = self.fire2(x)
        x = self.maxpool1(x)  # torch.Size([1, 64, 16, 16])
        x = self.fire3(x)
        x = self.fire4(x)
        x = self.maxpool2(x)  # torch.Size([1, 256, 8, 8])
        x = self.fire5(x)
        x = self.fire6(x)
        x = self.maxpool3(x)    # torch.Size([1, 512, 4, 4])
        x = self.avg_pool(x)  # torch.Size([1, 512, 1, 1])
        x = x.view(x.size(0), -1)  # torch.Size([1, 512])
        x = self.linear(x)  # torch.Size([1, 10])
        return x
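
A minimal sanity check with a CIFAR10-sized input (a sketch, assuming the Fire module above is defined):

# Verify the output shape of the simplified model on a 32x32 RGB input
model = SqueezeNet1(num_classes=10)
x = torch.randn(1, 3, 32, 32)
print(model(x).shape)  # expected: torch.Size([1, 10])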

2.3 Feature extraction and classifier module encapsulation

PyTorch provides a high-level API through the torch.nn module to build networks from scratch.

Serialized modules can be built with nn.Sequential, which makes the hierarchy of network modules clearer and facilitates constructing large and complex models. Following the paper, the initial convolutional layer and the Fire modules are encapsulated into a feature-extraction module self.features, and a classifier module self.classifier is built around a 1×1 convolution and an average pooling layer; together they define the SqueezeNet model class.

# Define the simplified SqueezeNet model class 2
class SqueezeNet2(nn.Module):
    def __init__(self, num_classes=10):
        super(SqueezeNet2, self).__init__()
        self.num_classes = num_classes
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=1, padding=1),  # 3x32x32 -> 64x32x32
            nn.ReLU(inplace=True),
            Fire(64, 16, 64, 64),  # 64x32x32 -> 128x32x32
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  # 128x32x32 -> 128x16x16
            Fire(128, 32, 64, 64),  # 128x16x16 -> 128x16x16
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  # 128x16x16 -> 128x8x8
            Fire(128, 64, 128, 128),  # 128x8x8 -> 256x8x8
            nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),  # 256x8x8 -> 256x4x4
            Fire(256, 64, 256, 256)  # 256x4x4 -> 512x4x4
        )
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.2),
            nn.Conv2d(512, self.num_classes, kernel_size=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1)),  # 10x4x4 -> 10x1x1
        )

    def forward(self, x):
        x = self.features(x)  # torch.Size([1, 512, 4, 4])
        x = self.classifier(x)  # torch.Size([1, 10, 1, 1])
        x = x.view(x.size(0), -1)  # torch.Size([1, 10])
        return x
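
The same sanity check works here, and counting the parameters shows how compact the model is (a sketch):

# Verify the output shape and count the parameters of the encapsulated model
model = SqueezeNet2(num_classes=10)
print(model(torch.randn(1, 3, 32, 32)).shape)  # expected: torch.Size([1, 10])
print(sum(p.numel() for p in model.parameters()))  # total parameter count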

2.4 Load the official SqueezeNet model class

The SqueezeNet model is provided in the torchvision.models package and via Torch Hub, and its structure is essentially consistent with the original SqueezeNet paper.

torchvision.models provides the SqueezeNet model classes and pretrained models (see SqueezeNet | PyTorch), which can be used directly; for the original implementation, refer to the torchvision source code.

The definition of the SqueezeNet model class in the torchvision.models package is as follows:

torchvision.models.squeezenet1_0(*, weights: Optional[SqueezeNet1_0_Weights] = None, progress: bool = True, **kwargs: Any) → SqueezeNet

Parameter Description:

  • weights (SqueezeNet1_0_Weights, optional): the pretrained weights to load. By default (weights=None), only the model structure is created, without pretrained parameters.

  • progress (bool, optional): whether to display a download progress bar; the default is True.

  • **kwargs: additional parameters passed to the model class.
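
For example, with the weights enum introduced in torchvision 0.13 (a usage sketch; for older versions, the pretrained=True flag shown in Section 4 applies):

from torchvision import models

# structure only, random weights
model = models.squeezenet1_0(weights=None)
# structure plus ImageNet pretrained weights
model = models.squeezenet1_0(weights=models.SqueezeNet1_0_Weights.IMAGENET1K_V1)
model.eval()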

Program description:

The SqueezeNet model class builds the SqueezeNet network by stacking Fire modules; the specific code is as follows.

The network model is divided into two parts: the feature extractor features and the classifier classifier. The feature-extraction part is implemented in two versions, v1.0 and v1.1.

  • Version v1.0: the initial convolution layer conv1 is nn.Conv2d(3, 96, kernel_size=7, stride=2), and nn.MaxPool2d is placed after conv1, Fire4, and Fire8.
  • Version v1.1: the initial convolution layer conv1 is nn.Conv2d(3, 64, kernel_size=3, stride=2), and nn.MaxPool2d is placed after conv1, Fire3, and Fire5.

Note that the input to the model is expected to be a batch of RGB images of shape (N, 3, H, W), where H and W are at least 224.

import torch
import torch.nn as nn
import torch.nn.init as init

class Fire(nn.Module):
    def __init__(self, inplanes: int, squeeze_planes: int, expand1x1_planes: int, expand3x3_planes: int) -> None:
        super().__init__()
        self.inplanes = inplanes
        self.squeeze = nn.Conv2d(inplanes, squeeze_planes, kernel_size=1)
        self.squeeze_activation = nn.ReLU(inplace=True)
        self.expand1x1 = nn.Conv2d(squeeze_planes, expand1x1_planes, kernel_size=1)
        self.expand1x1_activation = nn.ReLU(inplace=True)
        self.expand3x3 = nn.Conv2d(squeeze_planes, expand3x3_planes, kernel_size=3, padding=1)
        self.expand3x3_activation = nn.ReLU(inplace=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.squeeze_activation(self.squeeze(x))
        return torch.cat(
            [self.expand1x1_activation(self.expand1x1(x)), self.expand3x3_activation(self.expand3x3(x))], 1
        )

class SqueezeNet(nn.Module):
    def __init__(
        self,
        version: str = '1_0',
        num_classes: int = 1000
    ) -> None:
        super(SqueezeNet, self).__init__()
        self.num_classes = num_classes
        if version == '1_0':
            self.features = nn.Sequential(
                nn.Conv2d(3, 96, kernel_size=7, stride=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
                Fire(96, 16, 64, 64),
                Fire(128, 16, 64, 64),
                Fire(128, 32, 128, 128),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
                Fire(256, 32, 128, 128),
                Fire(256, 48, 192, 192),
                Fire(384, 48, 192, 192),
                Fire(384, 64, 256, 256),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
                Fire(512, 64, 256, 256),
            )
        elif version == '1_1':
            self.features = nn.Sequential(
                nn.Conv2d(3, 64, kernel_size=3, stride=2),
                nn.ReLU(inplace=True),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
                Fire(64, 16, 64, 64),
                Fire(128, 16, 64, 64),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
                Fire(128, 32, 128, 128),
                Fire(256, 32, 128, 128),
                nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),
                Fire(256, 48, 192, 192),
                Fire(384, 48, 192, 192),
                Fire(384, 64, 256, 256),
                Fire(512, 64, 256, 256),
            )
        else:
            # FIXME: Is this needed? SqueezeNet should only be called from the
            # FIXME: squeezenet1_x() functions
            # FIXME: This checking is not done for the other models
            raise ValueError("Unsupported SqueezeNet version {version}:"
                             "1_0 or 1_1 expected".format(version=version))

        # Final convolution is initialized differently from the rest
        final_conv = nn.Conv2d(512, self.num_classes, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            final_conv,
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d((1, 1))
        )

        for m in self.modules():
            if isinstance(m, nn.Conv2d):
                if m is final_conv:
                    init.normal_(m.weight, mean=0.0, std=0.01)
                else:
                    init.kaiming_uniform_(m.weight)
                if m.bias is not None:
                    init.constant_(m.bias, 0)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        x = self.classifier(x)
        return torch.flatten(x, 1)

All pretrained models expect input images normalized in the same way, i.e. mini-batches of 3-channel RGB images of shape (N, 3, H, W), where H and W are at least 224. The images must be loaded into the range [0, 1] and then normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225]. The usage routine is as follows.

import torch
model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_0', pretrained=True)
# or
# model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)
model.eval()

3. CIFAR10 image classification based on the SqueezeNet model

3.1 The basic steps of building a neural network model in PyTorch

The basic steps to build, train and use a neural network model with PyTorch are as follows.

  1. Prepare dataset: Load the dataset and preprocess the data.
  2. Design the model: instantiate the model class, define the loss function and optimizer, and determine the model structure and training method.
  3. Model training: use the training data set to train the model and determine the model parameters.
  4. Model inference: use the trained model to perform inference and predict outputs for input data.
  5. Model saving/loading: Save the trained model for later use or deployment.

The following steps walk through the routine for the SqueezeNet model.


3.2 Loading the CIFAR10 dataset

Common benchmark datasets have balanced sample structure, efficient information, and a standardized organization that is easy to handle. Using a common dataset to train a neural network not only improves efficiency but also makes it easier to evaluate model performance.

PyTorch provides some commonly used image datasets, preloaded in the torchvision.datasets class. The torchvision module implements the core classes and methods required for computer-vision work, including popular datasets (torchvision.datasets), model architectures, and common image transformation methods.

The CIFAR dataset is a classic small dataset for image classification, with two versions, CIFAR10 and CIFAR100. CIFAR10 has 10 classes and CIFAR100 has 100 classes. Each CIFAR10 image is 32×32, and the 10 categories are airplane, automobile, bird, cat, deer, dog, frog, horse, ship, and truck. CIFAR10 contains 60,000 images in total, 50,000 in the training set and 10,000 in the test set; each category has 6,000 images, so the dataset is balanced.

The method to load and use the CIFAR dataset is:

torchvision.datasets.CIFAR10()
torchvision.datasets.CIFAR100()

The CIFAR dataset can be downloaded from the official website (http://www.cs.toronto.edu/~kriz/cifar.html) and used locally, or it can be loaded automatically through the datasets class (if the file is not found at the local path, it will be downloaded automatically).

When loading the dataset, the predefined transform method is used for preprocessing, including resizing images, converting them to tensors, and normalizing them. The per-channel mean and standard deviation of the CIFAR10 dataset used for normalization are (0.4914, 0.4822, 0.4465) and (0.2470, 0.2435, 0.2616). During training, transform_train adds randomness to improve generalization.

A large training set cannot be loaded into memory all at once, and the DataLoader class can be used to load data automatically. DataLoader is an iterator: given a Dataset object, it yields batches of data according to the batch_size parameter.

The routine for loading the CIFAR-10 dataset using the DataLoader class is as follows.

    # (1) Convert PILImage in [0,1] to a normalized Tensor
    transform_train = transforms.Compose([
        transforms.RandomHorizontalFlip(),  # random horizontal flip
        transforms.RandomRotation(10),  # random rotation
        transforms.RandomAffine(0, shear=10, scale=(0.8, 1.2)),
        transforms.ColorJitter(brightness=0.2, contrast=0.2, saturation=0.2),
        transforms.Resize(32),  # resize the image to (w,h)=(32,32)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])
    # the test set does not need data augmentation
    transform = transforms.Compose([
        transforms.Resize(32),
        transforms.ToTensor(),
        transforms.Normalize((0.4914, 0.4822, 0.4465), (0.2470, 0.2435, 0.2616))])

    # (2) Load the CIFAR10 dataset
    batchsize = 128
    # load the CIFAR10 dataset; if loading from root fails, download automatically
    # load the CIFAR10 training set, 50000 training images
    train_set = torchvision.datasets.CIFAR10(root='../dataset', train=True,
                                            download=True, transform=transform_train)
    # train_loader = torch.utils.data.DataLoader(train_set, batch_size=batchsize)
    train_loader = torch.utils.data.DataLoader(train_set, batch_size=batchsize,
                                              shuffle=True, num_workers=8)
    # load the CIFAR10 validation set, 10000 validation images
    test_set = torchvision.datasets.CIFAR10(root='../dataset', train=False,
                                           download=True, transform=transform)
    test_loader = torch.utils.data.DataLoader(test_set, batch_size=1000,
                                              shuffle=True, num_workers=8)
    # create an iterator and use next to fetch one batch of data
    valid_data_iter = iter(test_loader)  # DataLoader iterator object
    valid_images, valid_labels = next(valid_data_iter)  # images: [batch,3,32,32], labels: [batch]
    valid_size = valid_labels.size(0)  # size of the validation batch
    print(valid_images.shape, valid_labels.shape)

    # define class names: the 10 categories of the CIFAR10 dataset
    classes = ('plane', 'car', 'bird', 'cat', 'deer',
               'dog', 'frog', 'horse', 'ship', 'truck')


3.3 Build the SqueezeNet network model

Building a SqueezeNet network model for training involves three steps:

  • Instantiate the SqueezeNet model object;
  • Set the loss function for training;
  • Set the optimizer for training.

The torch.nn module provides various built-in loss classes; this example uses the cross-entropy loss nn.CrossEntropyLoss.

The torch.optim module provides various optimization methods; this example uses the SGD optimizer with momentum. Note that model.parameters() must be passed to the optimizer object so that the optimizer knows which parameters to update.

    # (3) Construct the SqueezeNet network model
    model = SqueezeNet1(num_classes=10)  # instantiate the SqueezeNet model
    model.to(device)  # move the network to the specified device
    # print(model)

    # Define the loss function and optimizer
    criterion = nn.CrossEntropyLoss()  # cross-entropy loss function
    optimizer = torch.optim.SGD(model.parameters(), momentum=0.9, lr=0.01)  # SGD optimizer

3.4 SqueezeNet model training

The basic steps of PyTorch model training are:

  1. Compute the model output with a forward pass;
  2. Compute the loss function value;
  3. Compute the gradients of the weights and biases by back-propagation;
  4. Update the model parameters according to the gradient values;
  5. Reset the gradients to zero (for the next iteration).

During training, the validation data can be used to evaluate model accuracy in order to monitor the training process. Model validation performs inference on the validation data: the model output is obtained by a forward pass, but no gradients are back-propagated, so the computation should be wrapped in torch.no_grad().

The routine for model training using PyTorch is as follows.

    # (4) Train the SqueezeNet model
    epoch_list = []  # record the training epochs
    loss_list = []  # record the training-set loss
    accu_list = []  # record the validation-set accuracy
    num_epochs = 100  # number of training epochs
    for epoch in range(num_epochs):  # training epoch
        running_loss = 0.0  # reset the accumulated loss at each epoch
        for step, data in enumerate(train_loader, start=0):  # load data with the iterator
            optimizer.zero_grad()  # zero the loss gradients
            inputs, labels = data  # inputs: [batch,3,32,32] labels: [batch]
            outputs = model(inputs.to(device))  # forward pass
            loss = criterion(outputs, labels.to(device))  # compute the loss
            loss.backward()  # back-propagation
            optimizer.step()  # update the parameters

            # accumulate the training loss
            running_loss += loss.item()
            # if step%100==99:  # print training info every 100 steps
            #     print("\t epoch {}, step {}: loss = {:.4f}".format(epoch, step, loss.item()))

        # compute the validation accuracy at each epoch
        with torch.no_grad():  # validation: do not compute gradients
            outputs_valid = model(valid_images.to(device))  # inference on the validation batch, [batch, 10]
        pred_labels = torch.max(outputs_valid, dim=1)[1]  # predicted classes, [batch]
        accuracy = torch.eq(pred_labels, valid_labels.to(device)).sum().item() / valid_size * 100  # compute accuracy
        print("Epoch {}: train loss={:.4f}, accuracy={:.2f}%".format(epoch, running_loss, accuracy))

        # record the statistics of the training process
        epoch_list.append(epoch)  # record the epoch number
        loss_list.append(running_loss)  # record the training loss
        accu_list.append(accuracy)  # record the validation accuracy

The result of the program running is as follows:

Epoch 0: train loss=900.4685, accuracy=8.59%
Epoch 1: train loss=900.4323, accuracy=10.94%
Epoch 2: train loss=900.4668, accuracy=10.55%
Epoch 3: train loss=900.4570, accuracy=10.94%

Epoch 98: train loss=193.8689, accuracy=80.86%
Epoch 99: train loss=192.4832, accuracy=80.86%

Notably, during the first 20 or so epochs, the training loss hardly decreased and the accuracy on the validation images was extremely low, indicating that the model had not yet learned effective features, possibly because of vanishing gradients. After about 30 epochs, the training loss began to drop and the validation accuracy gradually increased, indicating that training had entered a normal regime. After 100 epochs, the validation accuracy reaches about 80%.



3.5 Saving and loading of SqueezeNet model

After the model is trained, save it for later use. There are two main ways to save a model in PyTorch: saving the model weights, or saving the entire model. This example uses model.state_dict() to obtain the model weights as a dictionary and torch.save() to serialize the weight dictionary to disk as a .pth file.

    # (5) Save the SqueezeNet network model
    save_path = "../models/Squeeze_Cifar1"
    model_cpu = model.cpu()  # move the model to the CPU
    model_path = save_path + ".pth"  # model file path
    torch.save(model.state_dict(), model_path)  # save the model weights
    # write the optimization results to a data file
    result_path = save_path + ".csv"  # result file path
    WriteDataFile(epoch_list, loss_list, accu_list, result_path)  # WriteDataFile is a helper defined elsewhere in the full program
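
For comparison, the second approach (saving the entire model) is a one-line sketch; the _full.pth filename here is hypothetical, and recent PyTorch versions may require weights_only=False when unpickling a whole module:

    # Alternative (sketch): save the entire model object (structure + weights);
    # loading it requires the class definition SqueezeNet1 to be importable
    torch.save(model, save_path + "_full.pth")
    model_full = torch.load(save_path + "_full.pth", weights_only=False)
    model_full.eval()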

To use the trained model, first instantiate the model class, and then call the load_state_dict() method to load the weight parameters of the model.

    # The following model loading and inference could be a separate, standalone program
    # (6) Load the SqueezeNet network model for inference
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # detect and select the device
    # load the pretrained SqueezeNet model
    model = SqueezeNet1(num_classes=10)  # instantiate the SqueezeNet model
    model.to(device)  # move the network to the specified device
    model_path = "../models/Squeeze_Cifar1.pth"
    model.load_state_dict(torch.load(model_path))
    model.eval()  # inference mode

Pay special attention to the following:

(1) A .pth file in PyTorch saves only the model's weight parameters, not the model's structure. Therefore, the model object must be instantiated before the parameters are loaded.

(2) The model object must strictly match the model parameters. Note that even for two "SqueezeNet" models, the model class definitions may differ subtly. If the model class definition comes from one source and the parameter file from another, a mismatch between model structure and parameters can easily occur.

(3) Whether the model and parameters are loaded from the PyTorch model zoo, obtained as a pretrained model from another source, or trained by yourself, the loading method is the same, and attention must always be paid to the matching of model structure and parameters.
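
Related to points (2) and (3), a defensive loading sketch: map_location absorbs device differences, and strict=False reports mismatched keys instead of raising an error outright:

    # Sketch: load weights saved on a GPU machine onto a CPU-only machine, and
    # surface any structure/parameter mismatch explicitly
    state = torch.load(model_path, map_location=torch.device("cpu"))
    missing, unexpected = model.load_state_dict(state, strict=False)
    print(missing, unexpected)  # non-empty lists indicate a structure/parameter mismatch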


3.6 Model testing

Using the loaded SqueezeNet model, run inference on the test set and compare the model's predictions with the image labels to check the model's accuracy.

Strictly speaking, the validation set and the test set should not be used interchangeably, but to keep the routine simple, this program does not distinguish between them.

    # (7) Model testing
    correct = 0
    total = 0
    for data in test_loader:  # load the test set with the iterator
        imgs, labels = data  # torch.Size([batch,3,32,32]) torch.Size([batch])
        # print(imgs.shape, labels.shape)
        outputs = model(imgs.to(device))  # forward pass / model inference, [batch, 10]
        labels_pred = torch.max(outputs, dim=1)[1]  # predicted classes, [batch]
        # _, labels_pred = torch.max(outputs.data, 1)
        total += labels.size(0)
        correct += torch.eq(labels_pred, labels.to(device)).sum().item()
    accuracy = 100. * correct / total
    print("Test samples: {}".format(total))
    print("Test accuracy={:.2f}%".format(accuracy))

Using the test set for model inference, the test accuracy of the model is 81.86%.

Test samples: 10000
Test accuracy=81.86%


3.7 Model Inference

Using the loaded SqueezeNet model, input a new image for model inference; the category of the input image can be determined from the model output.

Extract several images from the test set, or read image files, then run model inference to obtain the predicted categories. When extracting images or reading files, pay attention to converting the image format and size properly.

    # (8) Extract images from the test set for model inference
    batch = 8  # batch size
    data_set = torchvision.datasets.CIFAR10(root='../dataset', train=False,
                                           download=False, transform=None)
    plt.figure(figsize=(9, 6))
    for i in range(batch):
        imgPIL = data_set[i][0]  # extract the PIL image
        label = data_set[i][1]  # extract the image label
        # preprocessing / model inference / postprocessing
        imgTrans = transform(imgPIL)  # preprocessing transform, torch.Size([3,32,32])
        imgBatch = torch.unsqueeze(imgTrans, 0)  # add a batch dimension, torch.Size([batch=1,3,32,32])
        outputs = model(imgBatch.to(device))  # model inference, returns [batch=1, 10]
        indexes = torch.max(outputs, dim=1)[1]  # note: shape [batch=1], tensor on `device`
        index = indexes[0].item()  # predicted class, an integer
        # draw the i-th image
        imgNP = np.array(imgPIL)  # PIL -> Numpy
        out_text = "label:{}/model:{}".format(classes[label], classes[index])
        plt.subplot(2, 4, i+1)
        plt.imshow(imgNP)
        plt.title(out_text)
        plt.axis('off')
    plt.tight_layout()
    plt.show()

The result is as follows.


    # (9) Read an image file for model inference
    from PIL import Image
    filePath = "../images/img_car_01.jpg"  # path and filename of the image file
    imgPIL = Image.open(filePath)  # read the image file with PIL, <class 'PIL.Image.Image'>

    # preprocessing / model inference / postprocessing
    imgTrans = transform(imgPIL)  # preprocessing transform, torch.Size([3, 32, 32])
    imgBatch = torch.unsqueeze(imgTrans, 0)  # add a batch dimension, torch.Size([batch=1, 3, 32, 32])
    outputs = model(imgBatch.to(device))  # model inference, returns [batch=1, 10]
    indexes = torch.max(outputs, dim=1)[1]  # note: shape [batch=1], tensor on `device`
    percentages = nn.functional.softmax(outputs, dim=1)[0] * 100
    index = indexes[0].item()  # predicted class, an integer
    percent = percentages[index].item()  # probability of the predicted class, a float

    # draw the image
    imgNP = np.array(imgPIL)  # PIL -> Numpy
    out_text = "Prediction:{}, {}, {:.2f}%".format(index, classes[index], percent)
    print(out_text)
    plt.imshow(imgNP)
    plt.title(out_text)
    plt.axis('off')
    plt.tight_layout()
    plt.show()

The result is as follows.



4. Image classification using the SqueezeNet pretrained model

The torchvision.models package and Torch Hub not only provide the SqueezeNet model class, but also pretrained weights trained on the ImageNet dataset, which can be used directly for image classification or transfer learning.

Points to note:

  1. The SqueezeNet pretrained model is only about 4.8 MB, and its Top-5 accuracy on the ImageNet dataset is 80.3%, which is very lightweight and efficient.

  2. The SqueezeNet model has two versions, v1.0 and v1.1, with similar model size and accuracy; the v1.1 version requires less computation.

  3. The loaded SqueezeNet pretrained model was trained on the ImageNet dataset. The model input is a batch of RGB images of shape (N, 3, H, W), where H and W are at least 224. Images must be loaded into the range [0, 1] and normalized with mean [0.485, 0.456, 0.406] and standard deviation [0.229, 0.224, 0.225].

The complete routine for image classification using the SqueezeNet pretrained model is as follows.

# Begin_Squeeze_3.py
# SqueezeNet model for beginner with PyTorch
# Load the SqueezeNet pretrained model and parameters to classify an image
# Copyright: [email protected]
# Created: Huang Shan, 2023/06/04

# _*_coding:utf-8_*_
import torch
from torchvision import models
import torchvision.transforms as transforms
from matplotlib import pyplot as plt
import numpy as np

if __name__ == '__main__':

    # (1) Load the SqueezeNet/PyTorch pretrained model
    model = models.squeezenet1_0(pretrained=True)  # load via torchvision.models
    # model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_0', pretrained=True)  # load via torch.hub
    # model = torch.hub.load('pytorch/vision:v0.10.0', 'squeezenet1_1', pretrained=True)
    model.eval()

    # (2) Define the preprocessing transforms for the input image
    transform = transforms.Compose([  # compose the image transforms
        transforms.Resize([256,256]),  # resize the image to (w,h)=(256,256)
        transforms.CenterCrop([224,224]),  # center-crop the image to (w,h)=(224,224)
        transforms.ToTensor(),  # convert the image to a Tensor
        transforms.Normalize(  # normalize the image
            mean=[0.485, 0.456, 0.406],  # mean
            std=[0.229, 0.224, 0.225]  # standard deviation
        )])

    # (3) Load the input image and preprocess it
    from PIL import Image
    filePath = "../images/img_car_01.jpg"  # path and filename of the image file
    imgPIL = Image.open(filePath)  # read the image file with PIL, <class 'PIL.Image.Image'>
    # preprocessing / model inference / postprocessing
    imgTrans = transform(imgPIL)  # preprocessing transform, torch.Size([3,224,224])
    input_batch = torch.unsqueeze(imgTrans, 0)  # add a batch dimension, torch.Size([batch=1,3,224,224])

    # (4) Model inference
    with torch.no_grad():
        outputs = model(input_batch)  # confidence scores for all classes, torch.Size([batch, 1000])
    # _, index = torch.max(outputs, 1)  # index of the Top-1 class, tensor([208])
    # print("index: ", index.item())  # 208 : sports car, sport car

    # (5) Postprocess the model output
    # read the ImageNet class-name file (text format)
    with open("../dataset/imagenet_classes.txt") as f:  # class names saved as a txt file
        categories = [line.strip() for line in f.readlines()]
    print(type(categories), len(categories))  # <class 'list'> 1000

    # compute the probabilities of all classes
    probabilities = torch.nn.functional.softmax(outputs[0], dim=0) * 100  # probabilities of all classes, torch.Size([1000])
    # find the Top-5 class indices
    top5_prob, top5_idx = torch.topk(probabilities, 5)  # Top-5 probabilities and indices, torch.Size([5])
    print("Top-5 possible categories:")
    for i in range(top5_prob.size(0)):
        print(top5_idx[i], categories[top5_idx[i]], top5_prob[i].item())

    # (6) Visualize the classification result
    import cv2
    imgCV = cv2.cvtColor(np.asarray(imgPIL), cv2.COLOR_RGB2BGR)  # convert PIL to OpenCV format
    out_text = f"{categories[top5_idx[0]]}, {top5_prob[0].item():.3f}"  # class label + probability
    cv2.putText(imgCV, out_text, (25, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)  # draw the class label on the image
    cv2.imshow('Image classification', imgCV)
    key = cv2.waitKey(0)  # delay=0, do not close the window automatically
    cv2.destroyAllWindows()

    # # Alternatively, draw the image with matplotlib
    # imgNP = np.array(imgPIL)  # PIL -> Numpy
    # out_text = f"{categories[top5_idx[0]]}, {top5_prob[0].item():.3f}"  # class label + probability
    # print(out_text)
    # plt.title(out_text)
    # plt.imshow(imgNP)
    # plt.axis('off')
    # plt.tight_layout()
    # plt.show()

The image classification results are as follows.

Top-5 possible categories:
tensor(817) 817: 'sports car, sport car', 62.149803161621094
tensor(436) 436: 'beach wagon, station wagon, wagon, estate car, beach waggon, station waggon, waggon', 11.559002876281738
tensor(511) 511: 'convertible', 5.815169811248779
tensor(656) 656: 'minivan', 4.306943893432617
tensor(627) 627: 'limousine, limo', 4.196065902709961



References:

  1. Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, Kurt Keutzer. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <0.5MB model size. arXiv:1602.07360, 2016.

【End of this section】


Copyright statement:
Welcome to the "youcans hands-on model" series.
For forwarding, please indicate the original link:
[youcans hands-on model] SqueezeNet model-CIFAR10 image classification
Copyright 2023 youcans, XUPT
Created: 2023-06-27

