SqueezeNet Algorithm Analysis: Bird Recognition in Practice with Paddle


Today I will explain the SqueezeNet algorithm in detail. SqueezeNet is a lightweight, efficient CNN model: it has 50x fewer parameters than AlexNet, yet its accuracy is close to AlexNet's.

This hands-on project is again a classic classification problem: bird classification.

The bird dataset used in this project contains 4 categories, namely Bananaquit, Black Skimmer, Black-throated Bushtit, and Cockatoo, for a total of 565 images.

1. Theoretical basis

1. Introduction

As the name suggests, "squeeze" means compression, so we can guess from the name alone that the algorithm reduces the number of model parameters by compressing the model. Of course, any algorithmic improvement aims either to raise accuracy or to reduce parameters relative to an existing baseline; the main goal of SqueezeNet is to reduce the number of model parameters while maintaining the model's accuracy.

With the development of convolutional neural networks, more and more models have appeared, and deep models such as AlexNet and ResNet have been widely adopted for their accuracy. However, many of these models cannot meet the requirements of real application scenarios, such as autonomous driving, because of their large parameter counts. People therefore began to focus on lightweight models, and SqueezeNet was born to meet this need.

In the SqueezeNet paper, the advantages of the lightweight model are summarized as follows:

● More efficient distributed training. Inter-server communication is the limiting factor in the scalability of distributed CNN training. For distributed data-parallel training, the communication overhead is proportional to the number of parameters in the model. In short, small models train faster because less communication is required.

● Reduce overhead when exporting new models to clients. When it comes to autonomous driving, companies like Tesla regularly copy new models from their servers to customers' cars. This practice is often referred to as an over-the-air update. Consumer Reports found that with recent over-the-air updates, the safety of Tesla's Autopilot semi-autonomous driving feature has gradually improved (Consumer Reports, 2016). However, over-the-air updates of today's typical CNN/DNN models can require massive data transfers. Using AlexNet, this would require 240MB of communication from the server to the car. Smaller models require less communication, making frequent updates more feasible.

● Feasible FPGA and embedded deployment. FPGAs usually have less than 10 MB of on-chip memory and no off-chip memory or storage. For inference, a sufficiently small model can be stored directly on the FPGA without being bottlenecked by memory bandwidth (Qiu et al., 2016), while video frames stream through the FPGA in real time. Furthermore, when CNNs are deployed on application-specific integrated circuits (ASICs), a sufficiently small model can be stored directly on-chip, and smaller models allow the ASIC to fit on a smaller die.

2. Design concept

Ways to reduce model parameters have existed in the past, and a sensible approach is to take an existing CNN model and compress it in a lossy way. A research community has emerged around the topic of model compression, and several approaches have been reported. A fairly straightforward approach by Denton et al. is to apply singular value decomposition (SVD) to a pretrained CNN model. Han et al. developed Network Pruning, which starts from a pretrained model, replaces parameters below a certain threshold with zero to form a sparse matrix, and finally performs several iterations of training on the sparse CNN. Han et al. then extended this work by combining network pruning with quantization (to 8 bits or less) and Huffman coding into a method called Deep Compression, and further designed a hardware accelerator called EIE that runs directly on the compressed model, achieving significant speedups and power savings.

SqueezeNet likewise takes a compression-oriented approach. In terms of model design, a total of three strategies are used, which we describe below.

2.1 CNN Microarchitecture

At the same time, the authors of the SqueezeNet paper observe that as CNNs get deeper, manually selecting the filter dimensions of every layer becomes very cumbersome. To address this, various higher-level building blocks, or modules, composed of multiple convolutional layers in a specific fixed organization have been proposed. For example, the GoogLeNet papers propose the Inception module, which combines filters of several different sizes, usually including 1×1 and 3×3, sometimes adding 5×5, and sometimes 1×3 and 3×1, possibly with additional ad-hoc layers, to form a complete network. In the WideResNet article earlier in this series we explained a similar block, the residual block. The SqueezeNet authors collectively refer to the specific organization and dimensions of the individual modules as the CNN microarchitecture.

2.2 CNN Macroarchitecture

CNN microarchitecture refers to individual layers and modules, while CNN macroarchitecture can be defined as the system-level organization of multiple modules into an end-to-end CNN architecture. Perhaps the most widely studied macroarchitecture topic in the recent literature is the impact of network depth (i.e., the number of layers). For example, VGG networks with 12 to 19 layers produce progressively higher accuracy on the ImageNet-1k dataset. Choosing connections across multiple layers or modules is an emerging area of CNN macroarchitecture research. For example, both residual networks (ResNet) and highway networks recommend connections that skip multiple layers, such as connecting the activations of layer 3 to the activations of layer 6. We call such connections bypass connections. The ResNet authors provide an A/B comparison of a 34-layer CNN with and without bypass connections (i.e., skip connections); the experiments found that adding bypass connections improves top-5 accuracy on ImageNet by 2 percentage points.

2.3 Network design space exploration

The authors of the paper note that neural networks (both deep neural networks, DNNs, and convolutional neural networks, CNNs) have a large design space, with many choices of microarchitecture, macroarchitecture, solvers, and other hyperparameters. Naturally, the community wants to gain intuition about how these factors affect a network's accuracy, i.e., about the shape of the design space. Much of the work on design space exploration (DSE) of neural networks has focused on developing automated methods for finding architectures with higher accuracy. These automated DSE methods include Bayesian optimization (Snoek et al., 2012), simulated annealing (Ludermir et al., 2006), random search (Bergstra & Bengio, 2012), and genetic algorithms (Stanley & Miikkulainen, 2002). To their credit, each of these papers provides a case where the proposed DSE method produces an architecture with higher accuracy than a representative baseline, but none of them attempts to provide intuition about the shape of the design space. The SqueezeNet authors instead eschew automated approaches and refactor CNNs so that principled A/B comparisons can be made to study how architectural decisions affect model size and accuracy.

2.4 Structural Design Strategy

The main goal of SqueezeNet is to build a CNN architecture with few parameters while maintaining accuracy comparable to other models. To achieve this goal, the authors adopt three strategies for designing the CNN architecture, as follows:

Strategy 1: Replace 3×3 convolutions with 1×1 convolutions: since a 1×1 filter has 9 times fewer weights than a 3×3 filter, this step reduces the parameter count of a convolution ninefold;

Strategy 2: Reduce the number of channels feeding the 3×3 convolutions: the parameter count of a 3×3 convolution layer is 3×3×M×N (where M and N are the numbers of channels of the input and output feature maps, respectively). The authors consider this too large and therefore want to reduce M and N as much as possible to cut the number of parameters.

Strategy 3: Downsample late in the network so that convolutional layers have large activation maps. In a convolutional network, each convolutional layer produces an output activation map with a spatial resolution of at least 1×1 and usually much larger. The height and width of these activation maps are determined by (1) the size of the input data (e.g., 256×256 images) and (2) the choice of which layers downsample in the CNN architecture.

Strategies 1 and 2 are about judiciously reducing the number of parameters in a CNN while attempting to preserve accuracy; strategy 3 is about maximizing accuracy on a limited parameter budget. A small numeric sketch of this arithmetic follows below. After that, we describe the Fire module, the building block of the SqueezeNet architecture that enables strategies 1, 2, and 3.
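
To make the arithmetic behind strategies 1 and 2 concrete, here is a tiny sketch in plain Python (the channel counts are illustrative, not taken from the paper):

# Parameter arithmetic for strategies 1 and 2 (weights only, biases ignored).
M, N = 64, 64  # illustrative input/output channel counts

params_3x3 = 3 * 3 * M * N       # 36,864 weights for a 3x3 convolution layer
params_1x1 = 1 * 1 * M * N       #  4,096 weights for the same layer with 1x1 kernels
print(params_3x3 // params_1x1)  # 9 -> strategy 1: a 9x reduction per replaced filter

# Strategy 2: halving both M and N shrinks the 3x3 layer by another factor of 4.
print(3 * 3 * (M // 2) * (N // 2))  # 9,216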

2.5 Fire module

A Fire module consists of a squeeze convolutional layer feeding into an expand layer. The squeeze layer contains only 1×1 convolution filters, while the expand layer contains a mix of 1×1 and 3×3 convolution filters. The module exposes three hyperparameters for adjusting these dimensions:

● s1x1: the number of 1×1 convolution filters in the squeeze layer;

● e1x1: the number of 1×1 convolution filters in the expand layer;

● e3x3: the number of 3×3 convolution filters in the expand layer.
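
As a concrete example, the parameter count of a single Fire module can be computed from these three hyperparameters. The sketch below uses the fire2 configuration from the paper (96 input channels, s1x1=16, e1x1=64, e3x3=64) and ignores biases:

# Weights in one Fire module: squeeze (1x1) + expand (1x1 and 3x3 in parallel).
C_in, s1x1, e1x1, e3x3 = 96, 16, 64, 64   # fire2 configuration

squeeze_w = 1 * 1 * C_in * s1x1   # 1,536
expand1_w = 1 * 1 * s1x1 * e1x1   # 1,024
expand3_w = 3 * 3 * s1x1 * e3x3   # 9,216
print(squeeze_w + expand1_w + expand3_w)  # 11,776 weights in total

# For comparison, a plain 3x3 convolution mapping 96 -> 128 channels
# would need 3 * 3 * 96 * 128 = 110,592 weights, roughly 9x more.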

3. Network structure

The network architecture of SqueezeNet is shown in the figure below:

● Left: SqueezeNet;

● Middle: SqueezeNet with simple bypass;

● Right: SqueezeNet with complex bypasses;

From the left diagram of Figure 1-2, we can see that SqueezeNet starts with a standalone convolutional layer (conv1), passes through 8 Fire modules (fire2-fire9), and ends with a final convolutional layer (conv10). The number of filters per Fire module gradually increases from the beginning to the end of the network. After conv1, fire4, fire8, and conv10, SqueezeNet performs max pooling with a stride of 2; these relatively late pooling placements follow strategy 3 from section 2.4.

The simple bypass architecture adds bypass connections around modules 3, 5, 7, and 9, requiring these modules to learn a residual function between their input and output. As in ResNet, to implement the connection bypassing fire3, the input of fire4 is set to (output of fire2 + output of fire3), where the + operator is element-wise addition. This changes the regularization applied to the parameters of these Fire modules and, per the ResNet results, can improve the final accuracy and/or the ability to train the full model.

While a simple bypass is "just a wire", a complex bypass includes a 1×1 convolutional layer in the bypass path, with the number of filters set equal to the required number of output channels.

Note: in the simple case, the number of input channels and the number of output channels of a bypassed module must be the same. As a result, only half of the Fire modules can receive a simple bypass connection, as shown in the middle of Figure 1-2. When the "same number of channels" requirement cannot be met, a complex bypass connection is used, as shown on the right of Figure 1-2. A complex bypass adds extra parameters to the model, whereas a simple bypass does not, as the sketch below illustrates.
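
As a rough illustration, a simple bypass amounts to nothing more than an element-wise addition in the forward pass. The sketch below uses the Fire class defined in the hands-on section later in this article; it is illustrative, not the full bypass network:

# Simple bypass around fire3 (element-wise addition, no extra parameters).
# fire2 and fire3 must produce the same number of channels (128 here).
fire2 = Fire(96, 16, 64, 64)     # 96 in  -> 64 + 64 = 128 out
fire3 = Fire(128, 16, 64, 64)    # 128 in -> 128 out

def bypass_fire3(x):
    out2 = fire2(x)
    return fire3(out2) + out2    # input of fire4 = fire3 output + fire2 output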

The complete SqueezeNet architecture is shown in the figure below:

4. Evaluation Analysis

The results of comparing SqueezeNet and different model compression methods are as follows:

For the microarchitecture, the experimental comparison charts showing the effect of the squeeze ratio (SR) on model size and accuracy, and the effect of the proportion of 3×3 filters in the expand layers on model size and accuracy, are as follows:

The accuracy comparison chart of the three macroarchitecture variants (the vanilla structure, the simple bypass structure, and the complex bypass structure) is as follows:

2. Hands-on practice


1. Data preprocessing

!unzip /home/aistudio/data/data223822/bird_photos.zip -d /home/aistudio/work/dataset

Since an extra .ipynb_checkpoints folder appears in the dataset directory after unzipping, we need to delete it with the following commands. Remember: be sure to delete it!

%cd /home/aistudio/work
!rm -rf .ipynb_checkpoints
  • Divide the dataset
import os
import random

train_ratio = 0.7
test_ratio = 1-train_ratio

rootdata = "/home/aistudio/work/dataset"

train_list, test_list = [], []
data_list = []

# os.walk visits the root directory first (it contains no image files),
# so the label counter starts at -1 and the first class folder gets label 0
class_flag = -1
for a, b, c in os.walk(rootdata):
    for i in range(len(c)):
        data_list.append(os.path.join(a, c[i]))

    # the first 70% of each class goes to the training set
    for i in range(0, int(len(c) * train_ratio)):
        train_data = os.path.join(a, c[i]) + ' ' + str(class_flag) + '\n'
        train_list.append(train_data)

    # the remaining 30% goes to the test set
    for i in range(int(len(c) * train_ratio), len(c)):
        test_data = os.path.join(a, c[i]) + ' ' + str(class_flag) + '\n'
        test_list.append(test_data)

    class_flag += 1

random.shuffle(train_list)
random.shuffle(test_list)

with open('/home/aistudio/work/train.txt','w',encoding='UTF-8') as f:
    for train_img in train_list:
        f.write(str(train_img))

with open('/home/aistudio/work/test.txt', 'w', encoding='UTF-8') as f:
    for test_img in test_list:
        f.write(test_img)
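
A quick sanity check of the split (optional) can be run right after writing the two files:

# Optional sanity check: the two splits should sum to 565 images.
print("train samples:", len(train_list))
print("test  samples:", len(test_list))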

2. Data reading

  • Import the following required libraries
import paddle
import paddle.nn.functional as F
import numpy as np
import math
import random
import os
from paddle.io import Dataset  # Dataset base class
import paddle.vision.transforms as transforms
import paddle.nn as nn
from PIL import Image
  • Define a data reader by subclassing paddle.io.Dataset
# normalize to [-1, 1]
transform_BZ = transforms.Normalize(
    mean=[0.5, 0.5, 0.5],
    std=[0.5, 0.5, 0.5]
)

class LoadData(Dataset):
    def __init__(self, txt_path, train_flag=True):
        self.imgs_info = self.get_images(txt_path)
        self.train_flag = train_flag

        self.train_tf = transforms.Compose([
            transforms.Resize(224),                  # resize the (square) image to 224x224
            transforms.RandomHorizontalFlip(),       # randomly flip left-right
            transforms.RandomVerticalFlip(),         # randomly flip top-bottom
            transforms.ToTensor(),                   # convert the PIL image to a tensor
            transform_BZ                             # normalize to [-1, 1]
        ])
        self.val_tf = transforms.Compose([
            transforms.Resize(224),                  # resize the (square) image to 224x224
            transforms.ToTensor(),                   # convert the PIL image to a tensor
            transform_BZ                             # normalize to [-1, 1]
        ])

    def get_images(self, txt_path):
        with open(txt_path, 'r', encoding='utf-8') as f:
            imgs_info = f.readlines()
            imgs_info = list(map(lambda x: x.strip().split(' '), imgs_info))
        return imgs_info

    def padding_black(self, img):
        # pad the image to a square on a black background, keeping the aspect
        # ratio; 224 matches the input resolution the network is trained with
        w, h = img.size
        scale = 224. / max(w, h)
        img_fg = img.resize([int(x) for x in [w * scale, h * scale]])
        size_fg = img_fg.size
        size_bg = 224
        img_bg = Image.new("RGB", (size_bg, size_bg))
        img_bg.paste(img_fg, ((size_bg - size_fg[0]) // 2,
                              (size_bg - size_fg[1]) // 2))
        return img_bg

    def __getitem__(self, index):
        img_path, label = self.imgs_info[index]

        img = Image.open(img_path)
        img = img.convert("RGB")
        img = self.padding_black(img)
        if self.train_flag:
            img = self.train_tf(img)
        else:
            img = self.val_tf(img)
        label = int(label)
        return img, label

    def __len__(self):
        return len(self.imgs_info)
  • Load training set and test set
train_data = LoadData("/home/aistudio/work/train.txt", True)
test_data = LoadData("/home/aistudio/work/test.txt", False)  # False -> use the eval transforms

# data loaders
train_loader = paddle.io.DataLoader(train_data, batch_size=32, shuffle=True)
test_loader = paddle.io.DataLoader(test_data, batch_size=32, shuffle=False)
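
Before training, it can be worth pulling a single batch to confirm the tensor shapes match what the network expects (this step is optional):

# Fetch one batch and check its shape.
for x, y in train_loader():
    print(x.shape, y.shape)  # expected: [32, 3, 224, 224] and [32]
    break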

3. Define the model

class Fire(nn.Layer):

    def __init__(self, inplanes, squeeze_planes,
                 expand1x1_planes, expand3x3_planes):
        super(Fire, self).__init__()
        self.inplanes = inplanes
        self.squeeze = nn.Conv2D(inplanes, squeeze_planes, kernel_size=1)
        self.squeeze_activation = nn.ReLU()
        self.expand1x1 = nn.Conv2D(squeeze_planes, expand1x1_planes,
                                   kernel_size=1)
        self.expand1x1_activation = nn.ReLU()
        self.expand3x3 = nn.Conv2D(squeeze_planes, expand3x3_planes,
                                   kernel_size=3, padding=1)
        self.expand3x3_activation = nn.ReLU()

    def forward(self, x):
        x = self.squeeze_activation(self.squeeze(x))
        return paddle.concat([
            self.expand1x1_activation(self.expand1x1(x)),
            self.expand3x3_activation(self.expand3x3(x))
        ], 1)

class SqueezeNet(nn.Layer):

    def __init__(self, version='1_0', num_classes=1000):
        super(SqueezeNet, self).__init__()
        self.num_classes = num_classes
        if version == '1_0':
            self.features = nn.Sequential(
                nn.Conv2D(3, 96, kernel_size=7, stride=2),
                nn.ReLU(),
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True),
                Fire(96, 16, 64, 64),
                Fire(128, 16, 64, 64),
                Fire(128, 32, 128, 128),
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True),
                Fire(256, 32, 128, 128),
                Fire(256, 48, 192, 192),
                Fire(384, 48, 192, 192),
                Fire(384, 64, 256, 256),
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True),
                Fire(512, 64, 256, 256),
            )
        elif version == '1_1':
            self.features = nn.Sequential(
                nn.Conv2D(3, 64, kernel_size=3, stride=2),
                nn.ReLU(),
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True),
                Fire(64, 16, 64, 64),
                Fire(128, 16, 64, 64),
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True),
                Fire(128, 32, 128, 128),
                Fire(256, 32, 128, 128),
                nn.MaxPool2D(kernel_size=3, stride=2, ceil_mode=True),
                Fire(256, 48, 192, 192),
                Fire(384, 48, 192, 192),
                Fire(384, 64, 256, 256),
                Fire(512, 64, 256, 256),
            )
        else:
            # FIXME: Is this needed? SqueezeNet should only be called from the
            # FIXME: squeezenet1_x() functions
            # FIXME: This checking is not done for the other models
            raise ValueError("Unsupported SqueezeNet version {version}: "
                             "1_0 or 1_1 expected".format(version=version))

        # Final convolution is initialized differently from the rest
        final_conv = nn.Conv2D(512, self.num_classes, kernel_size=1)
        self.classifier = nn.Sequential(
            nn.Dropout(p=0.5),
            final_conv,
            nn.ReLU(),
            nn.AdaptiveAvgPool2D((1, 1))
        )



    def forward(self, x):
        x = self.features(x)
        x = self.classifier(x)
        return paddle.flatten(x, 1)
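
The FIXME comments above come from the torchvision implementation this class is ported from, where the class is wrapped by squeezenet1_x() factory functions. A minimal Paddle equivalent of those helpers might look like this (pretrained-weight loading is not wired up here):

def squeezenet1_0(num_classes=1000):
    # SqueezeNet v1.0 as described in the paper
    return SqueezeNet('1_0', num_classes=num_classes)

def squeezenet1_1(num_classes=1000):
    # v1.1 needs roughly 2.4x less computation than v1.0 at similar accuracy
    return SqueezeNet('1_1', num_classes=num_classes)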

4. Print out the parameter information of the model

import paddle
model = SqueezeNet("1_0", num_classes=4)
params_info = paddle.summary(model,(1, 3, 224, 224))
print(params_info)
![](https://img-blog.csdnimg.cn/img_convert/8d8ee9fd5b9b509c234a21fc15d080de.png)

5. Model training

epoch_num = 60          # number of training epochs
learning_rate = 0.0001  # learning rate


val_acc_history = []
val_loss_history = []


def train(model):
    print('start training ... ')
    # turn into training mode
    model.train()

    opt = paddle.optimizer.Adam(learning_rate=learning_rate,
                                parameters=model.parameters())

    for epoch in range(epoch_num):
        acc_train = []
        for batch_id, data in enumerate(train_loader()):
            x_data = data[0]
            y_data = paddle.to_tensor(data[1],dtype="int64")
            y_data = paddle.unsqueeze(y_data, 1)
            logits = model(x_data)
            loss = F.cross_entropy(logits, y_data)
            acc = paddle.metric.accuracy(logits, y_data)
            acc_train.append(acc.numpy())
            if batch_id % 100 == 0:
                print("epoch: {}, batch_id: {}, loss is: {}".format(epoch, batch_id, loss.numpy()))
                avg_acc = np.mean(acc_train)
                print("[train] accuracy: {}".format(avg_acc))
            loss.backward()
            opt.step()
            opt.clear_grad()
        
        # evaluate model after one epoch
        model.eval()
        accuracies = []
        losses = []
        for batch_id, data in enumerate(test_loader()):
            x_data = data[0]
            y_data = paddle.to_tensor(data[1],dtype="int64")
            y_data = paddle.unsqueeze(y_data, 1)

            logits = model(x_data)
            loss = F.cross_entropy(logits, y_data)
            acc = paddle.metric.accuracy(logits, y_data)
            accuracies.append(acc.numpy())
            losses.append(loss.numpy())

        avg_acc, avg_loss = np.mean(accuracies), np.mean(losses)
        print("[test] accuracy/loss: {}/{}".format(avg_acc, avg_loss))
        val_acc_history.append(avg_acc)
        val_loss_history.append(avg_loss)
        model.train()

train(model)
paddle.save(model.state_dict(), "model.pdparams")
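
To reuse the saved weights later, for example in a fresh session, the state dict can be loaded back into a newly constructed model (a minimal sketch):

# Reload the trained weights for inference in a new session.
model = SqueezeNet("1_0", num_classes=4)
model.set_state_dict(paddle.load("model.pdparams"))
model.eval()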

6. Results Visualization

import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings("ignore")  # suppress warning messages

epochs_range = range(epoch_num)

plt.figure(figsize=(12, 3))
plt.subplot(1, 2, 1)

plt.plot(epochs_range, val_acc_history, label='Val Accuracy')
plt.legend(loc='lower right')
plt.title('Val Accuracy')

plt.subplot(1, 2, 2)
plt.plot(epochs_range, val_loss_history, label='Val Loss')
plt.legend(loc='upper right')
plt.title('Val Loss')
plt.show()


7. Displaying a single prediction result

data_transform = transforms.Compose([
    transforms.Resize((224, 224)),  # match the training resolution
    transforms.ToTensor(),
    transforms.Normalize([0.5, 0.5, 0.5], [0.5, 0.5, 0.5])  # same normalization as training
])

img = Image.open("/home/aistudio/work/dataset/Bananaquit/008.jpg")
plt.imshow(img)
image = data_transform(img)
# class names in label order (translated from the original Chinese labels)
name = ['Cockatoo', 'Black Skimmer', 'Black-throated Bushtit', 'Bananaquit']
image = paddle.reshape(image, [1, 3, 224, 224])
model.eval()
predict = model(image)
print(predict.numpy())
plt.title(name[int(predict.argmax(axis=1))])
plt.show()
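
The raw logits printed above are hard to read; applying softmax turns them into class probabilities (optional):

# Convert the logits to class probabilities for readability.
probs = F.softmax(predict, axis=1).numpy()[0]
for n, p in zip(name, probs):
    print(f"{n}: {p:.3f}")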


Summary

SqueezeNet is a lightweight convolutional neural network designed to minimize model size and computational resource consumption while maintaining high accuracy. The following is a summary of SqueezeNet:

  1. Lightweight Design: SqueezeNet adopts a special structure, namely "Fire module", to extract rich features by using fewer parameters. This enables SqueezeNet to have a smaller model size compared to other deep networks.

  2. Parameter compression: SqueezeNet reduces the number of parameters by using a 1x1 convolution kernel, and uses channel compression to reduce the amount of calculation. Such a design makes SqueezeNet perform well in environments with limited computing resources.

  3. High Accuracy: Although SqueezeNet is a lightweight network, it still provides relatively high accuracy while keeping the model small. Through reasonable design, SqueezeNet can effectively extract and utilize the feature information of images.

  4. Applicable scenarios: Due to its small size, SqueezeNet is especially suitable for resource-constrained environments such as mobile devices and embedded systems, making tasks such as image classification, object detection, and image segmentation feasible on such hardware.

In general, SqueezeNet is a lightweight network that minimizes model size and computational resource consumption while maintaining high accuracy. Its design and parameter compression strategy make it a powerful choice for image processing tasks in resource-constrained environments.

Origin: blog.csdn.net/m0_63007797/article/details/131626822