[OUC Deep Learning Introduction] Week 4 Learning Record: MobileNetV1, V2, V3

Part 1 Paper reading and video learning

1 MobileNet V1&V2

1.1 Network structure

Traditional convolutional neural networks require large amounts of memory and computation, which makes them impractical to run on mobile and embedded devices. MobileNet is a lightweight CNN focused on mobile and embedded devices. Compared with traditional convolutional networks, it greatly reduces model parameters and computation at a small cost in accuracy: MobileNetV1's accuracy is only 0.9% lower than VGG16's, while it uses only about 1/32 of VGG's parameters.

The MobileNetV1 network structure is as follows:

V1 network highlights:

  • Depthwise Convolution (DW convolution, which greatly reduces the amount of computation and parameters)

In traditional convolution, the number of channels in each convolution kernel equals the number of input feature matrix channels, and the number of output feature matrix channels equals the number of kernels. In DW convolution, each kernel has a single channel, and the number of input channels = the number of kernels = the number of output channels.

 Depthwise Separable Convolution = DW Convolution + PW Convolution (Pointwise Conv)

PW convolution is an ordinary convolution with a 1*1 kernel. In theory, the computation of an ordinary convolution is about 8 to 9 times that of DW+PW (assuming input and output feature matrices of the same size).
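As a sanity check on this 8~9x figure, here is a minimal PyTorch sketch (my own illustration; the channel sizes 32 and 64 are assumptions, not from the post) comparing the parameter count of a standard 3*3 convolution with its DW+PW replacement:

import torch
from torch import nn

cin, cout = 32, 64
std = nn.Conv2d(cin, cout, kernel_size=3, padding=1, bias=False)
# DW convolution: groups=cin gives one single-channel kernel per input channel
dw = nn.Conv2d(cin, cin, kernel_size=3, padding=1, groups=cin, bias=False)
# PW convolution: an ordinary 1*1 convolution
pw = nn.Conv2d(cin, cout, kernel_size=1, bias=False)

count = lambda m: sum(p.numel() for p in m.parameters())
print(count(std))              # 3*3*32*64 = 18432
print(count(dw) + count(pw))   # 3*3*32 + 32*64 = 2336, roughly 1/8 as many

x = torch.randn(1, cin, 56, 56)
assert std(x).shape == pw(dw(x)).shape  # identical output shape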

  • Added two manually set hyperparameters: α (Width Multiplier), which scales the number of convolution kernels, and ρ (Resolution Multiplier), which scales the size of the input image

It can be seen that by appropriately shrinking the network width or input resolution, the parameter count can be cut sharply with only a small change in accuracy. In practice, however, many DW convolution kernels end up learning nothing (their weights collapse toward zero). MobileNetV2 was proposed to solve this problem; V2 is both more accurate and smaller.

V2 network highlights:

  • Inverted Residuals (inverted residual structure)

The ordinary residual structure first reduces the dimension and then raises it, uses a 3*3 convolution in the middle, and uses ReLU as the activation function. The inverted residual structure instead raises the dimension first and then reduces it, uses DW convolution in the middle, and uses ReLU6: ReLU6(x) = min(max(x, 0), 6). Capping activations at 6 keeps the values in a range that is friendly to mobile devices and avoids precision loss from numerical overflow. The inverted residual structure is shown in the figure: when stride=1 and the input and output feature matrices have the same shape, there is a shortcut connection.

  • Linear Bottlenecks

The last 1*1 convolution of V2's inverted residual structure uses a linear activation function, because ReLU causes a large loss of information in low-dimensional features.

The V2 network structure and parameters are shown in the figure, where t is the expansion factor, c is the number of output feature matrix channels, n is the number of times the bottleneck is repeated, and s is the stride of the first layer of each block sequence (all other layers use stride 1):

1.2 Build MobileNet V2 based on PyTorch

Code link: (colab)MobileNet

import torch
from torch import nn

# V2

def _make_divisible(ch,divisor=8,min_ch=None):
  if min_ch is None:
    min_ch = divisor
  # round ch to the nearest multiple of divisor
  new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
  # make sure rounding down does not drop the channel count by more than 10%
  if new_ch<0.9*ch:
    new_ch+=divisor
  return new_ch
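# Quick sanity check (illustrative values, my addition, not from the original post):
# _make_divisible(37) == 40  -> 37 rounds to the nearest multiple of 8
# _make_divisible(10) == 16  -> rounding down to 8 would lose more than 10% of the channels, so bump up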


class ConvBNReLU(nn.Sequential):
  # groups=1 gives an ordinary convolution; groups=in_channel gives a DW convolution
  def __init__(self,in_channel,out_channel,kernel_size=3,stride=1,groups=1):
    padding = (kernel_size-1)//2
    super(ConvBNReLU,self).__init__(
        nn.Conv2d(in_channel,out_channel,kernel_size,stride,padding,groups=groups,bias=False),
        nn.BatchNorm2d(out_channel),
        nn.ReLU6(inplace=True)
    )

# Inverted residual block
class InvertedResidual(nn.Module):
  # expand_ratio is the expansion factor t
  def __init__(self,in_channel,out_channel,stride,expand_ratio):
    super(InvertedResidual,self).__init__()
    # hidden (expanded) layer
    hidden_channel = in_channel*expand_ratio
    # whether to use the shortcut
    self.use_shortcut = stride==1 and in_channel==out_channel

    layers = []
    if expand_ratio!=1:
      # 1*1 PW
      layers.append(ConvBNReLU(in_channel,hidden_channel,kernel_size=1))
    layers.extend([
      # 3*3 DW
      ConvBNReLU(hidden_channel,hidden_channel,stride=stride,groups=hidden_channel),
      # 1*1 PW(Linear)
      nn.Conv2d(hidden_channel,out_channel,kernel_size=1,bias=False),
      nn.BatchNorm2d(out_channel)
    ])

    self.conv = nn.Sequential(*layers)

  def forward(self,x):
    if self.use_shortcut:
      return x+self.conv(x)
    else:
      return self.conv(x)


class MobileNetV2(nn.Module):
  def __init__(self,num_classes=1000,alpha=1.0,round_nearest=8):
    super(MobileNetV2,self).__init__()
    block = InvertedResidual
    input_channel = _make_divisible(32*alpha,round_nearest)
    last_channel = _make_divisible(1280*alpha,round_nearest)

    inverted_residual_setting=[
      # t,c,n,s
      [1,16,1,1],
      [6,24,2,2],
      [6,32,3,2],
      [6,64,4,2],
      [6,96,3,1],
      [6,160,3,2],
      [6,320,1,1],
    ]

    features = []
    # conv1 layer
    features.append(ConvBNReLU(3,input_channel,stride=2))
    for t,c,n,s in inverted_residual_setting:
      output_channel = _make_divisible(c*alpha,round_nearest)
      for i in range(n):
        stride = s if i==0 else 1
        features.append(block(input_channel,output_channel,stride,expand_ratio=t))
        input_channel = output_channel

    features.append(ConvBNReLU(input_channel,last_channel,1))
    self.features = nn.Sequential(*features)

    # classifier
    self.avgpool = nn.AdaptiveAvgPool2d((1,1))
    self.classifier = nn.Sequential(
      nn.Dropout(0.2),
      nn.Linear(last_channel,num_classes)
    )

    # weight initialization
    for m in self.modules():
      if isinstance(m,nn.Conv2d):
        nn.init.kaiming_normal_(m.weight,mode='fan_out')
        if m.bias is not None:
          nn.init.zeros_(m.bias)
      elif isinstance(m,nn.BatchNorm2d):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
      elif isinstance(m,nn.Linear):
        nn.init.normal_(m.weight,0,0.01)
        nn.init.zeros_(m.bias)

  def forward(self,x):
    x = self.features(x)
    x = self.avgpool(x)
    x = torch.flatten(x,1)
    x = self.classifier(x)
    return x
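A quick smoke test (my addition, assuming the class above) confirms the network builds and produces the expected output shape:

net = MobileNetV2(num_classes=5)
x = torch.randn(1, 3, 224, 224)  # dummy ImageNet-sized batch
print(net(x).shape)              # expected: torch.Size([1, 5])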

2 MobileNet V3

2.1 Network structure

On the ImageNet classification task, V3 is more accurate and more efficient than V2, with faster inference.

V3 network highlights:

  • Updated block (bneck)

V3 modifies the inverted residual structure: it adds an SE module and updates the activation functions. As in V2, when stride=1 and input_c = output_c there is a shortcut connection.

SE is channel attention. It pools each channel of the feature matrix produced by the convolution down to a single value, then passes the result through two fully connected layers to obtain the output vector. The first fully connected layer has channel/4 nodes; the second has channel nodes.

  • Use NAS search parameters (Neural Architecture Search)

NAS is a neural network optimization algorithm: first define a set of "building blocks" suitable for the network, then try combining these blocks in different ways and training the results. Through this trial-and-error search, the NAS algorithm can determine which "building blocks" and which network configuration give the best results.

Related reading: How to design the optimal convolutional neural network architecture? NAS principle analysis (LHZ5388015210's blog, CSDN)

  • Redesign time-consuming layer structure

The number of convolution kernels in the first convolutional layer was reduced from 32 to 16, and the Last Stage was streamlined by removing several of its layers, which maintains accuracy while improving speed.

  • The activation function was redesigned

A commonly used activation function is swish(x) = x*σ(x), where σ(x) is the sigmoid function, but it is expensive to compute and differentiate and is unfriendly to quantization. V3 therefore uses h-swish(x) = x*ReLU6(x+3)/6, where ReLU6(x+3)/6 is called h-sigmoid.
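A small sketch (my addition) verifying this formula against PyTorch's built-in Hardswish (h-swish) module:

import torch
from torch import nn
import torch.nn.functional as F

x = torch.linspace(-8, 8, steps=9)
manual = x * F.relu6(x + 3) / 6         # h-swish by the formula above
builtin = nn.Hardswish()(x)             # PyTorch's built-in implementation
print(torch.allclose(manual, builtin))  # True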

 The V3 network structure is as follows:

 

2.2 Build MobileNet V3 based on PyTorch

Code link: (colab)MobileNet

import torch
from torch import nn,Tensor
from torch.nn import functional as F

from typing import Callable,List,Optional
from functools import partial

# V3

def _make_divisible(ch,divisor=8,min_ch=None):
  if min_ch is None:
    min_ch = divisor
  # round ch to the nearest multiple of divisor
  new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
  # make sure rounding down does not drop the channel count by more than 10%
  if new_ch<0.9*ch:
    new_ch+=divisor
  return new_ch


class ConvBNActivation(nn.Sequential):
  # groups=1 gives an ordinary convolution; groups=in_channel gives a DW convolution
  def __init__(self,
        in_planes:int,
        out_planes:int,
        kernel_size:int=3,
        stride:int=1,
        groups:int=1,
        norm_layer:Optional[Callable[...,nn.Module]]=None,
        activation_layer:Optional[Callable[...,nn.Module]]=None):
    padding = (kernel_size-1)//2
    if norm_layer is None:
      norm_layer = nn.BatchNorm2d
    if activation_layer is None:
      activation_layer = nn.ReLU6
    super(ConvBNActivation,self).__init__(nn.Conv2d(
                      in_channels=in_planes,
                      out_channels=out_planes,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False),
                  norm_layer(out_planes),
                  activation_layer(inplace=True))


# Attention mechanism (SE) module
class SqueezeExcitation(nn.Module):
  def __init__(self,input_c:int,squeeze_factor:int=4):
    super(SqueezeExcitation,self).__init__()
    squeeze_c = _make_divisible(input_c//squeeze_factor,8)
    self.fc1 = nn.Conv2d(input_c,squeeze_c,1)
    self.fc2 = nn.Conv2d(squeeze_c,input_c,1)

  def forward(self,x:Tensor) -> Tensor:
    scale = F.adaptive_avg_pool2d(x,output_size=(1,1))
    scale = self.fc1(scale)
    scale = F.relu(scale,inplace=True)
    scale = self.fc2(scale)
    scale = F.hardsigmoid(scale,inplace=True)
    return scale*x


# Inverted residual block configuration
class InvertedResidualConfig:
  def __init__(self,input_c:int,
          kernel:int,
          expanded_c:int,
          out_c:int,
          use_se:bool,
          activation:str,
          stride:int,
          width_multi:float):
    self.input_c = self.adjust_channels(input_c,width_multi)
    self.kernel = kernel
    self.expanded_c = self.adjust_channels(expanded_c,width_multi)
    self.out_c = self.adjust_channels(out_c,width_multi)
    self.use_se = use_se
    self.use_hs = activation=="HS"
    self.stride = stride

  @staticmethod
  def adjust_channels(channels:int,width_multi:float):
    return _make_divisible(channels*width_multi,8)


# Inverted residual block
class InvertedResidual(nn.Module):
  def __init__(self,cnf:InvertedResidualConfig,
          norm_layer:Callable[...,nn.Module]):
    super(InvertedResidual,self).__init__()
    
    if cnf.stride not in [1,2]:
      raise ValueError("illegal stride value.")

    self.use_res_connect = (cnf.stride==1 and cnf.input_c==cnf.out_c)

    layers:List[nn.Module] = []
    activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU

    if cnf.expanded_c != cnf.input_c:
      # 1*1 PW expansion
      layers.append(ConvBNActivation(cnf.input_c,
                    cnf.expanded_c,
                    kernel_size=1,
                    norm_layer=norm_layer,
                    activation_layer=activation_layer))
    # DW convolution
    layers.append(ConvBNActivation(cnf.expanded_c,
                  cnf.expanded_c,
                  kernel_size=cnf.kernel,
                  stride=cnf.stride,
                  groups=cnf.expanded_c,
                  norm_layer=norm_layer,
                  activation_layer=activation_layer))
    
    if cnf.use_se:
      layers.append(SqueezeExcitation(cnf.expanded_c))

    layers.append(ConvBNActivation(cnf.expanded_c,
                    cnf.out_c,
                    kernel_size=1,
                    norm_layer=norm_layer,
                    activation_layer=nn.Identity))
    
    self.block = nn.Sequential(*layers)
    self.out_channels = cnf.out_c

  def forward(self,x:Tensor) -> Tensor:
    result = self.block(x)
    if self.use_res_connect:
      result += x

    return result


class MobileNetV3(nn.Module):
  def __init__(self,
        inverted_residual_setting:List[InvertedResidualConfig],
        last_channel:int,
        num_classes:int=1000,
        block:Optional[Callable[...,nn.Module]]=None,
        norm_layer:Optional[Callable[...,nn.Module]]=None):
    super(MobileNetV3,self).__init__()

    if not inverted_residual_setting:
      raise ValueError("The inverted_residual_setting should not be empty.")
    elif not (isinstance(inverted_residual_setting,list) and
          all([isinstance(s,InvertedResidualConfig) for s in inverted_residual_setting])):
      raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig].")


    if block is None:
      block = InvertedResidual

    if norm_layer is None:
      norm_layer = partial(nn.BatchNorm2d,eps=0.001,momentum=0.01)

    layers:List[nn.Module] = []
    # first layer
    firstconv_output_c = inverted_residual_setting[0].input_c
    layers.append(ConvBNActivation(3,firstconv_output_c,
                    kernel_size=3,
                    stride=2,
                    norm_layer=norm_layer,
                    activation_layer=nn.Hardswish))
    
    # inverted residual blocks
    for cnf in inverted_residual_setting:
      layers.append(block(cnf,norm_layer))

    # last several layers
    lastconv_input_c = inverted_residual_setting[-1].out_c
    lastconv_output_c = 6*lastconv_input_c
    layers.append(ConvBNActivation(lastconv_input_c,
                  lastconv_output_c,
                  kernel_size=1,
                  norm_layer=norm_layer,
                  activation_layer=nn.Hardswish))
    self.features = nn.Sequential(*layers)

    # classifier
    self.avgpool = nn.AdaptiveAvgPool2d(1)
    self.classifier = nn.Sequential(
      nn.Linear(lastconv_output_c,last_channel),
      nn.Hardswish(inplace=True),
      nn.Dropout(p=0.2,inplace=True),
      nn.Linear(last_channel,num_classes)
    )

    # weight initialization
    for m in self.modules():
      if isinstance(m,nn.Conv2d):
        nn.init.kaiming_normal_(m.weight,mode='fan_out')
        if m.bias is not None:
          nn.init.zeros_(m.bias)
      elif isinstance(m,(nn.BatchNorm2d,nn.GroupNorm)):
        nn.init.ones_(m.weight)
        nn.init.zeros_(m.bias)
      elif isinstance(m,nn.Linear):
        nn.init.normal_(m.weight,0,0.01)
        nn.init.zeros_(m.bias)

  def _forward_impl(self,x:Tensor) -> Tensor:
    x = self.features(x)
    x = self.avgpool(x)
    x = torch.flatten(x,1)
    x = self.classifier(x)
    return x

  def forward(self,x:Tensor) -> Tensor:
    return self._forward_impl(x)


def mobilenet_v3_large(num_classes:int=1000,reduced_tail:bool=False) -> MobileNetV3:
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig,width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c,kernel,expanded_c,out_c,use_se,activation,stride
        bneck_conf(16,3,16,16,False,"RE",1),
        bneck_conf(16,3,64,24,False,"RE",2),  # C1
        bneck_conf(24,3,72,24,False,"RE",1),
        bneck_conf(24,5,72,40,True,"RE",2), # C2
        bneck_conf(40,5,120,40,True,"RE",1),
        bneck_conf(40,5,120,40,True,"RE",1),
        bneck_conf(40,3,240,80,False,"HS",2), # C3
        bneck_conf(80,3,200,80,False,"HS",1),
        bneck_conf(80,3,184,80,False,"HS",1),
        bneck_conf(80,3,184,80,False,"HS",1),
        bneck_conf(80,3,480,112,True,"HS",1),
        bneck_conf(112,3,672,112,True,"HS",1),
        bneck_conf(112,5,672,160//reduce_divider,True,"HS",2),  # C4
        bneck_conf(160//reduce_divider,5,960//reduce_divider,160//reduce_divider,True,"HS",1),
        bneck_conf(160//reduce_divider,5,960//reduce_divider,160//reduce_divider,True,"HS",1)]
    last_channel = adjust_channels(1280//reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
              last_channel=last_channel,
              num_classes=num_classes)


def mobilenet_v3_small(num_classes:int=1000,reduced_tail:bool=False) -> MobileNetV3:
    
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig,width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_multi=width_multi)

    reduce_divider = 2 if reduced_tail else 1

    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16,3,16,16,True,"RE",2), # C1
        bneck_conf(16,3,72,24,False,"RE",2),  # C2
        bneck_conf(24,3,88,24,False,"RE",1),
        bneck_conf(24,5,96,40,True,"HS",2), # C3
        bneck_conf(40,5,240,40,True,"HS",1),
        bneck_conf(40,5,240,40,True,"HS",1),
        bneck_conf(40,5,120,48,True,"HS",1),
        bneck_conf(48,5,144,48,True,"HS",1),
        bneck_conf(48,5,288,96//reduce_divider,True,"HS",2),  # C4
        bneck_conf(96//reduce_divider,5,576//reduce_divider,96//reduce_divider,True,"HS",1),
        bneck_conf(96//reduce_divider,5,576//reduce_divider,96//reduce_divider,True,"HS",1)]
    last_channel = adjust_channels(1024//reduce_divider)  # C5

    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
              last_channel=last_channel,
              num_classes=num_classes)
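A smoke test (my addition) that builds both variants defined above and checks the output shapes:

net_large = mobilenet_v3_large(num_classes=5)
net_small = mobilenet_v3_small(num_classes=5)
x = torch.randn(1, 3, 224, 224)
print(net_large(x).shape)  # expected: torch.Size([1, 5])
print(net_small(x).shape)  # expected: torch.Size([1, 5])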

3 SENet

Related Links: SENet

3.1 Basic principles of network

To improve the performance of a neural network, SENet (Squeeze-and-Excitation Networks) starts from the relationships between feature channels and explicitly models the interdependencies between them. SENet adopts a "feature recalibration" strategy: it learns the importance of each feature channel automatically, then uses that importance to highlight useful features and suppress less useful ones.

SENet has two key operations, one is Squeeze and the other is Excitation, hence the name. The important structure of SENet is the SE module, and the schematic diagram of the SE module is shown in the figure:

The SE module has three operations:

  1. Squeeze operation: compress the features along the spatial dimension, turning each two-dimensional feature channel into a single real number with a global receptive field, while keeping the output dimension equal to the number of input feature channels. This gives even layers close to the input access to a global receptive field.
  2. Excitation operation: similar to the gating mechanism in recurrent neural networks, it generates a weight for each feature channel. The weights are learned and capture the correlations between feature channels.
  3. Reweight operation: apply the learned weights to the original features by channel-wise multiplication, completing the feature recalibration along the channel dimension.

The specific structure of the SE module is shown below. It can be embedded into a wide variety of network structures, is easy to deploy, requires no new functions or layers, and adds only a small amount of extra computation:

Here, global average pooling serves as the Squeeze operation, and two FC layers form a Bottleneck structure for the Excitation. The Excitation first reduces the feature dimension to 1/16 of the input, applies ReLU, and then restores the original dimension through the second FC. Compared with using a single FC, this adds nonlinearity and reduces the number of parameters. A final Sigmoid produces weights in the range 0~1, and the Scale operation then multiplies each channel of the original features by its weight.

3.2 Code

import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim

class BasicBlock(nn.Module):
  def __init__(self,in_channels,out_channels,stride=1):
    super(BasicBlock,self).__init__()
    self.conv1 = nn.Conv2d(in_channels,out_channels,kernel_size=3,stride=stride,padding=1,bias=False)
    self.bn1 = nn.BatchNorm2d(out_channels)
    self.conv2 = nn.Conv2d(out_channels,out_channels,kernel_size=3,stride=1,padding=1,bias=False)
    self.bn2 = nn.BatchNorm2d(out_channels)

    # when the shortcut's dimensions do not match the block output, use a 1*1 convolution to match them
    self.shortcut = nn.Sequential()
    if stride!=1 or in_channels!=out_channels:
      self.shortcut = nn.Sequential(nn.Conv2d(in_channels,out_channels,kernel_size=1,stride=stride,bias=False),nn.BatchNorm2d(out_channels))

    # excitation
    self.fc1 = nn.Conv2d(out_channels,out_channels//16,kernel_size=1) 
    self.fc2 = nn.Conv2d(out_channels//16,out_channels,kernel_size=1)

  # forward pass
  def forward(self, x):
    # two 3*3 convolutions produce the features to be recalibrated
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.bn2(self.conv2(out))

    # Squeeze
    w = F.avg_pool2d(out,out.size(2))
    
    # Excitation
    w = F.relu(self.fc1(w))
    w = torch.sigmoid(self.fc2(w))

    # reweight the features channel-wise
    out = out*w
    # add the shortcut (shallow) feature map
    out += self.shortcut(x)
    out = F.relu(out)
    return out


class SENet(nn.Module):
  def __init__(self):
    super(SENet,self).__init__()
    self.num_classes = 10
    self.in_channels = 64

    # first convolution: 64 kernels of size 3*3
    self.conv1 = nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1,bias=False)
    self.bn1 = nn.BatchNorm2d(64)
    # BasicBlock
    # each stage uses 2 BasicBlocks
    self.layer1 = self._make_layer(BasicBlock,64,2,stride=1)
    self.layer2 = self._make_layer(BasicBlock,128,2,stride=2)
    self.layer3 = self._make_layer(BasicBlock,256,2,stride=2)
    self.layer4 = self._make_layer(BasicBlock,512,2,stride=2)
    
    self.linear = nn.Linear(512,self.num_classes)

  # build one stage of the network
  # blocks is the number of residual blocks in the stage
  # (ResNet-18 uses 2,2,2,2 blocks across its four stages)
  def _make_layer(self,block,out_channels,blocks,stride):
    strides = [stride]+[1]*(blocks-1)
    layers = []
    for stride in strides:
      layers.append(block(self.in_channels,out_channels,stride))
      self.in_channels = out_channels
    return nn.Sequential(*layers)

  def forward(self, x):
    out = F.relu(self.bn1(self.conv1(x)))
    out = self.layer1(out)
    out = self.layer2(out)
    out = self.layer3(out)
    out = self.layer4(out)
    out = F.avg_pool2d(out,4)
    out = out.view(out.size(0),-1)
    out = self.linear(out)
    return out
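A smoke test (my addition): this SENet is sized for CIFAR-10-style 32*32 inputs, so a dummy 32*32 batch should come out as 10 logits per image:

net = SENet()
x = torch.randn(2, 3, 32, 32)
print(net(x).shape)  # expected: torch.Size([2, 10])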

Part 2 Code homework

2.1 2D convolution and 3D convolution

Hyperspectral images are three-dimensional volumes with two spatial dimensions and one spectral dimension. Two-dimensional convolution can extract spatial features from them, but not spectral features; three-dimensional convolution can extract spatial and spectral features at the same time, which benefits classification accuracy, but at a higher computational cost. The paper "HybridSN: Exploring 3D-2D CNN Feature Hierarchy for Hyperspectral Image Classification" therefore combines the two: it applies three-dimensional convolutions first, then two-dimensional convolutions, and finally a classifier. This exploits the strength of three-dimensional convolution for extracting joint spatial-spectral features while avoiding the model complexity that comes from using it exclusively.
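To make the shape bookkeeping concrete, here is a shape-only sketch (my addition; the 30-band, 25*25 patch size is illustrative) of the 3D-then-2D idea:

import torch
from torch import nn

x = torch.randn(1, 1, 30, 25, 25)               # (batch, channel, bands, H, W)
conv3d = nn.Conv3d(1, 8, kernel_size=(7, 3, 3)) # convolves spectrum and space together
y = conv3d(x)                                   # -> (1, 8, 24, 23, 23)
y = y.view(1, 8 * 24, 23, 23)                   # fold the spectral dim into channels
conv2d = nn.Conv2d(8 * 24, 64, kernel_size=3)   # spatial-only convolution
print(conv2d(y).shape)                          # torch.Size([1, 64, 21, 21])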

2.2 HybridSN

The HybridSN structure is as follows:

Code link: (colab)HybridSN

Network part code supplement:

class_num = 16

class HybridSN(nn.Module):

  def __init__(self,num_classes=16):
    super(HybridSN,self).__init__()
    self.conv1 = nn.Conv3d(1,8,(7,3,3))
    self.bn1=nn.BatchNorm3d(8)

    self.conv2 = nn.Conv3d(8,16,(5,3,3))
    self.bn2=nn.BatchNorm3d(16)

    self.conv3 = nn.Conv3d(16,32,(3,3,3))
    self.bn3=nn.BatchNorm3d(32)

    self.conv4 = nn.Conv2d(576,64,(3,3))
    self.bn4=nn.BatchNorm2d(64)

    self.drop = nn.Dropout(p=0.4)

    self.fc1 = nn.Linear(18496,256)
    self.fc2 = nn.Linear(256,128)
    self.fc3 = nn.Linear(128,num_classes)

    self.relu = nn.ReLU()

    # the paper applies softmax at the end, but with it the loss decreased very slowly in this experiment, so it is not used
    self.softmax = nn.Softmax(dim=1)

  def forward(self,x):
    out = self.relu(self.bn1(self.conv1(x)))
    out = self.relu(self.bn2(self.conv2(out)))
    out = self.relu(self.bn3(self.conv3(out)))

    # merge the spectral dimension into the channel dimension for the 2D convolution
    out = out.view(-1,out.shape[1]*out.shape[2],out.shape[3],out.shape[4])
    out = self.relu(self.bn4(self.conv4(out)))
    
    out = out.view(out.size(0),-1)
    out = self.fc1(out)
    out = self.drop(out)
    out = self.relu(out)
    out = self.fc2(out)
    out = self.drop(out)
    out = self.relu(out)
    out = self.fc3(out)
    # out = self.softmax(out)
    return out


# random input to check that the network structure is wired correctly
x = torch.randn(1,1,30,25,25)
net = HybridSN()
y = net(x)
print(y.shape)  # expected: torch.Size([1, 16])

Download related resources:

Data processing:

Training:

Model testing:

Write classification results to a file:

First run:


97.4536466477302 Kappa accuracy (%)
97.7669376693767 Overall accuracy (%)
94.13603236733026 Average accuracy (%)

                              precision    recall  f1-score   support

                     Alfalfa       0.95      0.93      0.94        41
                 Corn-notill       1.00      0.93      0.96      1285
                Corn-mintill       0.94      0.99      0.97       747
                        Corn       0.97      0.99      0.98       213
               Grass-pasture       0.94      0.98      0.96       435
                 Grass-trees       0.98      1.00      0.99       657
         Grass-pasture-mowed       0.92      0.96      0.94        25
               Hay-windrowed       0.99      1.00      1.00       430
                        Oats       0.75      0.50      0.60        18
              Soybean-notill       0.99      0.98      0.98       875
             Soybean-mintill       0.97      0.99      0.98      2210
               Soybean-clean       0.97      0.96      0.97       534
                       Wheat       0.99      0.96      0.98       185
                       Woods       1.00      0.99      1.00      1139
Buildings-Grass-Trees-Drives       0.98      0.97      0.98       347
          Stone-Steel-Towers       0.88      0.93      0.90        84

                    accuracy                           0.98      9225
                   macro avg       0.95      0.94      0.95      9225
                weighted avg       0.98      0.98      0.98      9225

[[  38    0    0    0    0    2    1    0    0    0    0    0    0    0
     0    0]
 [   1 1200   23    1    5    0    0    0    1    7   43    1    0    1
     1    1]
 [   0    0  740    0    3    0    0    0    0    0    0    3    1    0
     0    0]
 [   0    0    3  210    0    0    0    0    0    0    0    0    0    0
     0    0]
 [   0    0    2    0  428    1    0    0    0    0    4    0    0    0
     0    0]
 [   0    0    0    0    0  655    0    0    1    0    1    0    0    0
     0    0]
 [   0    0    0    0    1    0   24    0    0    0    0    0    0    0
     0    0]
 [   0    0    0    0    0    0    0  429    0    1    0    0    0    0
     0    0]
 [   0    0    8    0    0    1    0    0    9    0    0    0    0    0
     0    0]
 [   1    0    0    0    9    3    0    0    0  856    3    2    1    0
     0    0]
 [   0    2    6    0    1    4    0    1    0    1 2191    0    0    2
     2    0]
 [   0    0    2    4    0    0    0    0    1    1    0  514    0    0
     2   10]
 [   0    0    0    1    5    0    1    0    0    0    0    0  178    0
     0    0]
 [   0    1    0    0    0    0    0    1    0    0    2    1    0 1133
     1    0]
 [   0    0    0    0    1    0    0    1    0    0    8    1    0    0
   336    0]
 [   0    0    0    0    0    0    0    0    0    0    0    6    0    0
     0   78]]

Second run:


97.53900729242903 Kappa accuracy (%)
97.84281842818429 Overall accuracy (%)
95.58275885289382 Average accuracy (%)

                              precision    recall  f1-score   support

                     Alfalfa       0.95      0.88      0.91        41
                 Corn-notill       0.98      0.95      0.97      1285
                Corn-mintill       0.97      0.99      0.98       747
                        Corn       0.96      0.99      0.97       213
               Grass-pasture       1.00      0.96      0.98       435
                 Grass-trees       0.99      0.99      0.99       657
         Grass-pasture-mowed       1.00      1.00      1.00        25
               Hay-windrowed       0.99      1.00      0.99       430
                        Oats       0.82      0.78      0.80        18
              Soybean-notill       0.98      0.98      0.98       875
             Soybean-mintill       0.97      0.99      0.98      2210
               Soybean-clean       0.97      0.97      0.97       534
                       Wheat       0.97      0.98      0.98       185
                       Woods       0.99      1.00      0.99      1139
Buildings-Grass-Trees-Drives       0.99      0.95      0.97       347
          Stone-Steel-Towers       0.90      0.89      0.90        84

                    accuracy                           0.98      9225
                   macro avg       0.96      0.96      0.96      9225
                weighted avg       0.98      0.98      0.98      9225

[[  36    0    0    1    0    0    0    3    0    0    1    0    0    0
     0    0]
 [   2 1219    1    0    0    2    0    0    0    6   51    3    0    1
     0    0]
 [   0    1  741    5    0    0    0    0    0    0    0    0    0    0
     0    0]
 [   0    0    1  211    0    1    0    0    0    0    0    0    0    0
     0    0]
 [   0    1   12    0  416    0    0    0    0    1    4    1    0    0
     0    0]
 [   0    2    0    0    0  649    0    0    0    0    2    0    2    2
     0    0]
 [   0    0    0    0    0    0   25    0    0    0    0    0    0    0
     0    0]
 [   0    0    0    0    0    0    0  429    0    0    0    0    0    0
     0    1]
 [   0    0    3    0    0    0    0    0   14    0    0    0    1    0
     0    0]
 [   0    2    1    0    0    0    0    0    2  859    3    4    0    1
     2    1]
 [   0    3    1    0    1    3    0    0    0   10 2186    0    0    6
     0    0]
 [   0    0    3    3    0    0    0    1    1    0    0  519    0    0
     1    6]
 [   0    2    0    0    1    0    0    1    0    0    0    0  181    0
     0    0]
 [   0    4    0    0    0    0    0    0    0    0    0    0    0 1135
     0    0]
 [   0    6    0    0    0    0    0    0    0    1    5    0    2    2
   331    0]
 [   0    0    0    0    0    0    0    0    0    0    0    9    0    0
     0   75]]

 Visualization results:

 2.3 Thinking

① The two test runs give different results because the network uses dropout and was not switched to eval mode during testing, so a different set of nodes is dropped on each forward pass (a minimal fix is sketched after these notes).

② To further improve the classification performance on hyperspectral images, one could add an SE attention module, or introduce a Bottleneck structure to adjust the feature dimensions.
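For point ①, a minimal fix (my addition, reusing the net and x from the test snippet above) is to switch the network to eval mode, which disables dropout and makes repeated test runs deterministic:

net.eval()                  # dropout layers now pass inputs through unchanged
with torch.no_grad():       # also skip gradient bookkeeping during testing
  y1 = net(x)
  y2 = net(x)
print(torch.equal(y1, y2))  # True once dropout is disabled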
