Part 1 Paper reading and video learning
1 MobileNet V1&V2
1.1 Network structure
Traditional convolutional neural networks need a lot of memory and computation, which makes them hard to run on mobile and embedded devices. MobileNet is a lightweight CNN designed for mobile and embedded devices. Compared with traditional convolutional neural networks, it greatly reduces the number of parameters and the amount of computation at the cost of a small drop in accuracy: compared with VGG16, MobileNet V1's accuracy is only 0.9% lower, while it has only 1/32 of VGG's parameters.
The MobileNetV1 network structure is as follows:
V1 network highlights:
- Depthwise Convolution (DW convolution for short), which greatly reduces the amount of computation and the number of parameters
In a traditional convolution, each kernel has as many channels as the input feature map, and the number of output channels equals the number of kernels. In a DW convolution, each kernel has only 1 channel, so the number of input channels = the number of kernels = the number of output channels.
Depthwise Separable Convolution = DW Convolution + PW Convolution (Pointwise Conv)
A PW convolution is an ordinary convolution with a 1×1 kernel. In theory, an ordinary 3×3 convolution needs 8 to 9 times as much computation as DW+PW (assuming the input and output feature maps have the same size).
- Two manually set hyperparameters, α and ρ: α is the Width Multiplier, which controls the number of convolution kernels (channels) in each layer, and ρ is the Resolution Multiplier, which controls the size of the input image
As the results show, appropriately reducing the input image size cuts the computation considerably with only a small change in accuracy. In practice, however, many DW kernels end up useless after training (most of their weights become zero). MobileNetV2 was proposed to address this problem; V2 is more accurate and has a smaller model.
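A rough check of the 8 to 9× figure, and of how α and ρ scale the cost, using the multiply-accumulate (MAC) counts from the MobileNet V1 paper. This is a sketch that assumes stride 1, "same" padding, a square dk×dk kernel, M input channels, N output channels and a df×df output map; the function names are illustrative.

```python
def standard_conv_macs(dk, m, n, df, alpha=1.0, rho=1.0):
    """MACs of a standard dk x dk convolution: M -> N channels, df x df output map."""
    m, n, df = int(alpha * m), int(alpha * n), int(rho * df)
    return dk * dk * m * n * df * df

def separable_conv_macs(dk, m, n, df, alpha=1.0, rho=1.0):
    """MACs of a DW convolution (one dk x dk kernel per channel) plus a 1x1 PW convolution."""
    m, n, df = int(alpha * m), int(alpha * n), int(rho * df)
    depthwise = dk * dk * m * df * df
    pointwise = m * n * df * df
    return depthwise + pointwise

# Example layer: 3x3 kernel, 512 -> 512 channels, 14x14 feature map
std = standard_conv_macs(3, 512, 512, 14)
sep = separable_conv_macs(3, 512, 512, 14)
print(f"{std / sep:.2f}")   # 8.84, i.e. 1 / (1/N + 1/dk^2)

# alpha=0.5 cuts the separable cost to about 1/4;
# rho=0.5 cuts it to exactly 1/4 here (14x14 -> 7x7)
print(separable_conv_macs(3, 512, 512, 14, alpha=0.5) / sep)
print(separable_conv_macs(3, 512, 512, 14, rho=0.5) / sep)
```

Because ρ only shrinks the feature map, it lowers computation by roughly ρ² without changing the number of parameters, while α shrinks both.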
V2 network highlights:
- Inverted Residuals (inverted residual structure)
A standard residual block first reduces the number of channels, applies a 3×3 convolution in the middle, and then increases the channels again, using ReLU as the activation. The inverted residual block does the opposite: it first expands the channels, applies a DW convolution in the middle, and then reduces the channels, using ReLU6 as the activation. ReLU6(x)=min(max(x,0),6) caps the activation at 6, which suits mobile devices because it keeps values in a range that low-precision arithmetic can represent without losing precision. The inverted residual structure is shown in the figure. A shortcut connection is used only when stride=1 and the input and output feature maps have the same shape.
- Linear Bottlenecks
The last 1×1 convolution in V2's inverted residual block uses a linear activation (i.e. no ReLU), because ReLU destroys much of the information carried by low-dimensional features.
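The ReLU6 definition above can be checked against PyTorch's built-in `torch.nn.functional.relu6`. A small sketch, assuming PyTorch is installed; the input values are arbitrary.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-3.0, 0.0, 2.5, 6.0, 9.0])

# ReLU6(x) = min(max(x, 0), 6): negatives become 0, values above 6 become 6
manual = torch.clamp(x, min=0.0, max=6.0)
builtin = F.relu6(x)

print(builtin.tolist())               # [0.0, 0.0, 2.5, 6.0, 6.0]
print(torch.equal(manual, builtin))   # True
```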
The V2 network structure and parameters are shown in the figure, where t is the expansion factor, c is the number of output channels (the depth of the output feature map), n is the number of times the bottleneck is repeated, and s is the stride of the first bottleneck in each group (the remaining repeats use stride 1):
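To make the n and s columns concrete, the bottleneck table (the same values as `inverted_residual_setting` in the code of 1.2) can be expanded block by block. A sketch, assuming a 224×224 input, width multiplier α = 1, and a first 3×3 convolution with 32 kernels and stride 2; `expand_blocks` is a hypothetical helper that records input channels, expanded channels (t × input), output channels, stride, and whether the shortcut condition holds.

```python
# (t, c, n, s) rows of the MobileNetV2 bottleneck table
SETTINGS = [
    (1, 16, 1, 1),
    (6, 24, 2, 2),
    (6, 32, 3, 2),
    (6, 64, 4, 2),
    (6, 96, 3, 1),
    (6, 160, 3, 2),
    (6, 320, 1, 1),
]

def expand_blocks(settings, in_channels=32):
    """Expand each (t, c, n, s) row into n blocks; only the first block uses stride s."""
    blocks = []
    for t, c, n, s in settings:
        for i in range(n):
            stride = s if i == 0 else 1
            shortcut = stride == 1 and in_channels == c
            blocks.append((in_channels, in_channels * t, c, stride, shortcut))
            in_channels = c
    return blocks

blocks = expand_blocks(SETTINGS)
size = 224 // 2                      # the first 3x3 conv has stride 2
for _, _, _, stride, _ in blocks:
    size //= stride
print(len(blocks), size)             # 17 7  (17 bottlenecks, 7x7 output map)
print(sum(b[4] for b in blocks))     # 10 blocks have a shortcut
```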
1.2 Build MobileNet V2 based on PyTorch
Code link: (colab)MobileNet
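import torch
from torch import nn
# V2
def _make_divisible(ch,divisor=8,min_ch=None):
    # Round ch to the nearest multiple of divisor (but at least min_ch)
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
    # Make sure rounding down does not remove more than 10% of the channels
    if new_ch<0.9*ch:
        new_ch+=divisor
    return new_ch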
class ConvBNReLU(nn.Sequential):
    # groups=1 gives an ordinary convolution; groups=in_channel (the depth of the input feature map) gives a DW convolution
    def __init__(self,in_channel,out_channel,kernel_size=3,stride=1,groups=1):
        padding = (kernel_size-1)//2
        super(ConvBNReLU,self).__init__(
            nn.Conv2d(in_channel,out_channel,kernel_size,stride,padding,groups=groups,bias=False),
            nn.BatchNorm2d(out_channel),
            nn.ReLU6(inplace=True)
        )
# Inverted residual block
class InvertedResidual(nn.Module):
    # expand_ratio is the expansion factor t
    def __init__(self,in_channel,out_channel,stride,expand_ratio):
        super(InvertedResidual,self).__init__()
        # Hidden (expanded) channels
        hidden_channel = in_channel*expand_ratio
        # Use the shortcut only when stride=1 and the input/output shapes match
        self.use_shortcut = stride==1 and in_channel==out_channel
        layers = []
        if expand_ratio!=1:
            # 1*1 PW (expansion)
            layers.append(ConvBNReLU(in_channel,hidden_channel,kernel_size=1))
        layers.extend([
            # 3*3 DW
            ConvBNReLU(hidden_channel,hidden_channel,stride=stride,groups=hidden_channel),
            # 1*1 PW (linear: no activation)
            nn.Conv2d(hidden_channel,out_channel,kernel_size=1,bias=False),
            nn.BatchNorm2d(out_channel)
        ])
        self.conv = nn.Sequential(*layers)
    def forward(self,x):
        if self.use_shortcut:
            return x+self.conv(x)
        else:
            return self.conv(x)
class MobileNetV2(nn.Module):
    def __init__(self,num_classes=1000,alpha=1.0,round_nearest=8):
        super(MobileNetV2,self).__init__()
        block = InvertedResidual
        input_channel = _make_divisible(32*alpha,round_nearest)
        last_channel = _make_divisible(1280*alpha,round_nearest)
        inverted_residual_setting=[
            # t,c,n,s
            [1,16,1,1],
            [6,24,2,2],
            [6,32,3,2],
            [6,64,4,2],
            [6,96,3,1],
            [6,160,3,2],
            [6,320,1,1],
        ]
        features = []
        # conv1 layer
        features.append(ConvBNReLU(3,input_channel,stride=2))
        for t,c,n,s in inverted_residual_setting:
            output_channel = _make_divisible(c*alpha,round_nearest)
            for i in range(n):
                # Only the first block of each group uses stride s
                stride = s if i==0 else 1
                features.append(block(input_channel,output_channel,stride,expand_ratio=t))
                input_channel = output_channel
        features.append(ConvBNReLU(input_channel,last_channel,1))
        self.features = nn.Sequential(*features)
        # Classifier
        self.avgpool = nn.AdaptiveAvgPool2d((1,1))
        self.classifier = nn.Sequential(
            nn.Dropout(0.2),
            nn.Linear(last_channel,num_classes)
        )
        # Weight initialization
        for m in self.modules():
            if isinstance(m,nn.Conv2d):
                nn.init.kaiming_normal_(m.weight,mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m,nn.BatchNorm2d):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m,nn.Linear):
                nn.init.normal_(m.weight,0,0.01)
                nn.init.zeros_(m.bias)
    def forward(self,x):
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x,1)
        x = self.classifier(x)
        return x
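The rounding helper is easy to exercise on its own. Below is a standalone copy (named `make_divisible` here for illustration) that assumes the round-to-nearest formula `int(ch + divisor / 2) // divisor * divisor`; it shows the first-layer and last-layer channel counts produced for a few width multipliers α.

```python
def make_divisible(ch, divisor=8, min_ch=None):
    """Round ch to the nearest multiple of divisor, never dropping more than 10%."""
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch, int(ch + divisor / 2) // divisor * divisor)
    if new_ch < 0.9 * ch:   # rounding down removed more than 10% of the channels
        new_ch += divisor
    return new_ch

for alpha in (1.0, 0.75, 0.5, 0.35):
    print(alpha, make_divisible(32 * alpha), make_divisible(1280 * alpha))
# 1.0 32 1280
# 0.75 24 960
# 0.5 16 640
# 0.35 16 448   (32 * 0.35 = 11.2 rounds to 8, which is >10% too small, so it becomes 16)
```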
2 MobileNet V3
2.1 Network structure
On the ImageNet classification task, V3 is more accurate than V2 and runs inference faster.
V3 network highlights:
- Updated block (bneck)
Built on the inverted residual structure, the block adds an SE module and uses updated activation functions. A shortcut connection is used when stride=1 and input_c=output_c.
SE is a channel attention mechanism. It applies global average pooling to each channel of the feature map produced by the convolution, then passes the pooled vector through two fully connected layers: the first has 1/4 as many nodes as there are channels, and the second has as many nodes as there are channels. The result is one weight per channel, which is multiplied onto the feature map.
- Parameters found with NAS (Neural Architecture Search)
NAS is a neural network optimization method: it first defines a set of "building blocks" suitable for the network, then tries combining these blocks in different ways and trains each combination. Through this trial-and-error process, NAS determines which building blocks and which network configuration give the best results.
- Redesigned time-consuming layers
The number of kernels in the first convolutional layer was reduced from 32 to 16, and the Last Stage was streamlined by removing some of its layers; this keeps the accuracy while improving speed.
- Redesigned activation function
A commonly used activation function is swish(x)=x*σ(x), where σ(x) is the sigmoid function, but it is costly to compute and differentiate and is unfriendly to quantization. V3 therefore uses h-swish(x)=x*ReLU6(x+3)/6, where ReLU6(x+3)/6 is called h-sigmoid.
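A short sketch comparing the formulas above with PyTorch's built-ins and with swish, assuming PyTorch 1.6 or later (where `F.hardswish` and `F.hardsigmoid` exist); `h_sigmoid` and `h_swish` are illustrative names.

```python
import torch
import torch.nn.functional as F

def h_sigmoid(x):
    # h-sigmoid(x) = ReLU6(x + 3) / 6, a piecewise-linear stand-in for sigmoid
    return F.relu6(x + 3.0) / 6.0

def h_swish(x):
    # h-swish(x) = x * ReLU6(x + 3) / 6
    return x * h_sigmoid(x)

x = torch.linspace(-6.0, 6.0, steps=121)
swish = x * torch.sigmoid(x)

# The hand-written versions match PyTorch's built-ins
print(torch.allclose(h_sigmoid(x), F.hardsigmoid(x)))  # True
print(torch.allclose(h_swish(x), F.hardswish(x)))      # True
# ...and stay close to swish over this range
print((h_swish(x) - swish).abs().max().item())
```

The largest gap to swish shows up near x = ±3, where h-sigmoid switches from its linear segment to the constant 0 or 1.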
The V3 network structure is as follows:
2.2 Build MobileNet V3 based on PyTorch
Code link: (colab)MobileNet
import torch
from torch import nn,Tensor
from torch.nn import functional as F
from typing import Callable,List,Optional
from functools import partial
# V3
def _make_divisible(ch,divisor=8,min_ch=None):
    # Round ch to the nearest multiple of divisor (but at least min_ch)
    if min_ch is None:
        min_ch = divisor
    new_ch = max(min_ch,int(ch+divisor/2)//divisor*divisor)
    # Make sure rounding down does not remove more than 10% of the channels
    if new_ch<0.9*ch:
        new_ch+=divisor
    return new_ch
class ConvBNActivation(nn.Sequential):
    # groups=1 gives an ordinary convolution; groups=in_planes (the depth of the input feature map) gives a DW convolution
    def __init__(self,
                 in_planes:int,
                 out_planes:int,
                 kernel_size:int=3,
                 stride:int=1,
                 groups:int=1,
                 norm_layer:Optional[Callable[...,nn.Module]]=None,
                 activation_layer:Optional[Callable[...,nn.Module]]=None):
        padding = (kernel_size-1)//2
        if norm_layer is None:
            norm_layer = nn.BatchNorm2d
        if activation_layer is None:
            activation_layer = nn.ReLU6
        super(ConvBNActivation,self).__init__(
            nn.Conv2d(in_channels=in_planes,
                      out_channels=out_planes,
                      kernel_size=kernel_size,
                      stride=stride,
                      padding=padding,
                      groups=groups,
                      bias=False),
            norm_layer(out_planes),
            activation_layer(inplace=True))
# Attention (SE) module
class SqueezeExcitation(nn.Module):
    def __init__(self,input_c:int,squeeze_factor:int=4):
        super(SqueezeExcitation,self).__init__()
        squeeze_c = _make_divisible(input_c//squeeze_factor,8)
        # 1x1 convolutions act as the two fully connected layers
        self.fc1 = nn.Conv2d(input_c,squeeze_c,1)
        self.fc2 = nn.Conv2d(squeeze_c,input_c,1)
    def forward(self,x:Tensor) -> Tensor:
        scale = F.adaptive_avg_pool2d(x,output_size=(1,1))
        scale = self.fc1(scale)
        scale = F.relu(scale,inplace=True)
        scale = self.fc2(scale)
        scale = F.hardsigmoid(scale,inplace=True)
        return scale*x
# Configuration of one inverted residual block (bneck)
class InvertedResidualConfig:
    def __init__(self,input_c:int,
                 kernel:int,
                 expanded_c:int,
                 out_c:int,
                 use_se:bool,
                 activation:str,
                 stride:int,
                 width_multi:float):
        self.input_c = self.adjust_channels(input_c,width_multi)
        self.kernel = kernel
        self.expanded_c = self.adjust_channels(expanded_c,width_multi)
        self.out_c = self.adjust_channels(out_c,width_multi)
        self.use_se = use_se
        self.use_hs = activation=="HS"  # "HS": h-swish, "RE": ReLU
        self.stride = stride
    @staticmethod
    def adjust_channels(channels:int,width_multi:float):
        return _make_divisible(channels*width_multi,8)
# Inverted residual block (bneck)
class InvertedResidual(nn.Module):
    def __init__(self,cnf:InvertedResidualConfig,
                 norm_layer:Callable[...,nn.Module]):
        super(InvertedResidual,self).__init__()
        if cnf.stride not in [1,2]:
            raise ValueError("illegal stride value.")
        self.use_res_connect = (cnf.stride==1 and cnf.input_c==cnf.out_c)
        layers:List[nn.Module] = []
        activation_layer = nn.Hardswish if cnf.use_hs else nn.ReLU
        # 1x1 expansion convolution (skipped when there is nothing to expand)
        if cnf.expanded_c != cnf.input_c:
            layers.append(ConvBNActivation(cnf.input_c,
                                           cnf.expanded_c,
                                           kernel_size=1,
                                           norm_layer=norm_layer,
                                           activation_layer=activation_layer))
        # DW convolution
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.expanded_c,
                                       kernel_size=cnf.kernel,
                                       stride=cnf.stride,
                                       groups=cnf.expanded_c,
                                       norm_layer=norm_layer,
                                       activation_layer=activation_layer))
        if cnf.use_se:
            layers.append(SqueezeExcitation(cnf.expanded_c))
        # 1x1 projection convolution with a linear (identity) activation
        layers.append(ConvBNActivation(cnf.expanded_c,
                                       cnf.out_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Identity))
        self.block = nn.Sequential(*layers)
        self.out_channels = cnf.out_c
    def forward(self,x:Tensor) -> Tensor:
        result = self.block(x)
        if self.use_res_connect:
            result += x
        return result
class MobileNetV3(nn.Module):
    def __init__(self,
                 inverted_residual_setting:List[InvertedResidualConfig],
                 last_channel:int,
                 num_classes:int=1000,
                 block:Optional[Callable[...,nn.Module]]=None,
                 norm_layer:Optional[Callable[...,nn.Module]]=None):
        super(MobileNetV3,self).__init__()
        if not inverted_residual_setting:
            raise ValueError("The inverted_residual_setting should not be empty.")
        elif not (isinstance(inverted_residual_setting,list) and
                  all([isinstance(s,InvertedResidualConfig) for s in inverted_residual_setting])):
            raise TypeError("The inverted_residual_setting should be List[InvertedResidualConfig].")
        if block is None:
            block = InvertedResidual
        if norm_layer is None:
            norm_layer = partial(nn.BatchNorm2d,eps=0.001,momentum=0.01)
        layers:List[nn.Module] = []
        # First layer
        firstconv_output_c = inverted_residual_setting[0].input_c
        layers.append(ConvBNActivation(3,firstconv_output_c,
                                       kernel_size=3,
                                       stride=2,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        # Inverted residual blocks
        for cnf in inverted_residual_setting:
            layers.append(block(cnf,norm_layer))
        # Last layers
        lastconv_input_c = inverted_residual_setting[-1].out_c
        lastconv_output_c = 6*lastconv_input_c
        layers.append(ConvBNActivation(lastconv_input_c,
                                       lastconv_output_c,
                                       kernel_size=1,
                                       norm_layer=norm_layer,
                                       activation_layer=nn.Hardswish))
        self.features = nn.Sequential(*layers)
        # Classifier
        self.avgpool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Sequential(
            nn.Linear(lastconv_output_c,last_channel),
            nn.Hardswish(inplace=True),
            nn.Dropout(p=0.2,inplace=True),
            nn.Linear(last_channel,num_classes)
        )
        # Weight initialization
        for m in self.modules():
            if isinstance(m,nn.Conv2d):
                nn.init.kaiming_normal_(m.weight,mode='fan_out')
                if m.bias is not None:
                    nn.init.zeros_(m.bias)
            elif isinstance(m,(nn.BatchNorm2d,nn.GroupNorm)):
                nn.init.ones_(m.weight)
                nn.init.zeros_(m.bias)
            elif isinstance(m,nn.Linear):
                nn.init.normal_(m.weight,0,0.01)
                nn.init.zeros_(m.bias)
    def _forward_impl(self,x:Tensor) -> Tensor:
        x = self.features(x)
        x = self.avgpool(x)
        x = torch.flatten(x,1)
        x = self.classifier(x)
        return x
    def forward(self,x:Tensor) -> Tensor:
        return self._forward_impl(x)
def mobilenet_v3_large(num_classes:int=1000,reduced_tail:bool=False) -> MobileNetV3:
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig,width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_multi=width_multi)
    reduce_divider = 2 if reduced_tail else 1
    inverted_residual_setting = [
        # input_c,kernel,expanded_c,out_c,use_se,activation,stride
        bneck_conf(16,3,16,16,False,"RE",1),
        bneck_conf(16,3,64,24,False,"RE",2), # C1
        bneck_conf(24,3,72,24,False,"RE",1),
        bneck_conf(24,5,72,40,True,"RE",2), # C2
        bneck_conf(40,5,120,40,True,"RE",1),
        bneck_conf(40,5,120,40,True,"RE",1),
        bneck_conf(40,3,240,80,False,"HS",2), # C3
        bneck_conf(80,3,200,80,False,"HS",1),
        bneck_conf(80,3,184,80,False,"HS",1),
        bneck_conf(80,3,184,80,False,"HS",1),
        bneck_conf(80,3,480,112,True,"HS",1),
        bneck_conf(112,3,672,112,True,"HS",1),
        bneck_conf(112,5,672,160//reduce_divider,True,"HS",2), # C4
        bneck_conf(160//reduce_divider,5,960//reduce_divider,160//reduce_divider,True,"HS",1),
        bneck_conf(160//reduce_divider,5,960//reduce_divider,160//reduce_divider,True,"HS",1)]
    last_channel = adjust_channels(1280//reduce_divider) # C5
    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)
def mobilenet_v3_small(num_classes:int=1000,reduced_tail:bool=False) -> MobileNetV3:
    width_multi = 1.0
    bneck_conf = partial(InvertedResidualConfig,width_multi=width_multi)
    adjust_channels = partial(InvertedResidualConfig.adjust_channels,width_multi=width_multi)
    reduce_divider = 2 if reduced_tail else 1
    inverted_residual_setting = [
        # input_c, kernel, expanded_c, out_c, use_se, activation, stride
        bneck_conf(16,3,16,16,True,"RE",2), # C1
        bneck_conf(16,3,72,24,False,"RE",2), # C2
        bneck_conf(24,3,88,24,False,"RE",1),
        bneck_conf(24,5,96,40,True,"HS",2), # C3
        bneck_conf(40,5,240,40,True,"HS",1),
        bneck_conf(40,5,240,40,True,"HS",1),
        bneck_conf(40,5,120,48,True,"HS",1),
        bneck_conf(48,5,144,48,True,"HS",1),
        bneck_conf(48,5,288,96//reduce_divider,True,"HS",2), # C4
        bneck_conf(96//reduce_divider,5,576//reduce_divider,96//reduce_divider,True,"HS",1),
        bneck_conf(96//reduce_divider,5,576//reduce_divider,96//reduce_divider,True,"HS",1)]
    last_channel = adjust_channels(1024//reduce_divider) # C5
    return MobileNetV3(inverted_residual_setting=inverted_residual_setting,
                       last_channel=last_channel,
                       num_classes=num_classes)
3 SENet
Related Links: SENet
3.1 Basic principles of network
To improve network performance, SENet (Squeeze-and-Excitation Networks) starts from the relationships between feature channels and explicitly models the dependencies among them. SENet uses a "feature recalibration" strategy: it learns the importance of each feature channel automatically, then uses these importances to suppress less useful features and emphasize useful ones.
SENet is named after its two key operations, Squeeze and Excitation. Its core component is the SE module, whose schematic diagram is shown in the figure:
The SE module has three operations:
- Squeeze: compresses features along the spatial dimensions, turning each two-dimensional feature channel into a single real number that has a global receptive field. The output dimension matches the number of input feature channels, so even layers close to the input can obtain a global receptive field.
- Excitation: similar to the gates in a recurrent neural network, it generates a weight for each feature channel. These weights are learned and explicitly model the correlations between feature channels.
- Reweight: multiplies the weights output by Excitation onto the previous features, channel by channel, recalibrating the features along the channel dimension.
The detailed structure of the SE module is shown below. It can be embedded in many network architectures, is easy to deploy, requires no new functions or layer types, and adds little model or computational complexity:
Here, global average pooling serves as the Squeeze operation, and two FC layers then form a bottleneck for Excitation: the first FC reduces the feature dimension to 1/16, and after a ReLU the second FC restores the original dimension. Compared with a single FC layer, this adds more nonlinearity and uses fewer parameters. A final Sigmoid produces weights between 0 and 1, and the Scale operation multiplies each weight onto its corresponding feature channel.
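A minimal PyTorch sketch of the SE module as described, assuming a reduction ratio r = 16 and `nn.Linear` layers for the two FCs (the code in 3.2 uses 1×1 convolutions instead, which is equivalent on a 1×1 pooled map); the class name `SEBlock` is hypothetical.

```python
import torch
from torch import nn

class SEBlock(nn.Module):
    """Squeeze-and-Excitation: avg pool -> FC(C/r) -> ReLU -> FC(C) -> Sigmoid -> scale."""

    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)              # Squeeze: H x W -> 1 x 1
        self.fc = nn.Sequential(                         # Excitation bottleneck
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                                # weights in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c))             # one weight per channel
        return x * w.view(b, c, 1, 1)                    # Scale: reweight each channel

se = SEBlock(64, reduction=16)
x = torch.randn(2, 64, 32, 32)
y = se(x)
print(y.shape)                                           # torch.Size([2, 64, 32, 32])
print(sum(p.numel() for p in se.parameters()))           # 580 = (64*4 + 4) + (4*64 + 64)
```

With 64 channels and r = 16 the two FCs have 580 parameters, compared with 4,160 for a single 64→64 FC layer, which is the parameter saving mentioned above.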
3.2 Code
import torch
import torch.nn as nn
import torch.nn.functional as F
import torchvision
import torchvision.transforms as transforms
import matplotlib.pyplot as plt
import numpy as np
import torch.optim as optim
class BasicBlock(nn.Module):
    def __init__(self,in_channels,out_channels,stride=1):
        super(BasicBlock,self).__init__()
        self.conv1 = nn.Conv2d(in_channels,out_channels,kernel_size=3,stride=stride,padding=1,bias=False)
        self.bn1 = nn.BatchNorm2d(out_channels)
        self.conv2 = nn.Conv2d(out_channels,out_channels,kernel_size=3,stride=1,padding=1,bias=False)
        self.bn2 = nn.BatchNorm2d(out_channels)
        # When the shortcut's shape does not match the output, use a 1*1 convolution to match the dimensions
        self.shortcut = nn.Sequential()
        if stride!=1 or in_channels!=out_channels:
            self.shortcut = nn.Sequential(nn.Conv2d(in_channels,out_channels,kernel_size=1,stride=stride,bias=False),nn.BatchNorm2d(out_channels))
        # Excitation: two 1*1 convolutions act as FC layers (reduction ratio 16)
        self.fc1 = nn.Conv2d(out_channels,out_channels//16,kernel_size=1)
        self.fc2 = nn.Conv2d(out_channels//16,out_channels,kernel_size=1)
    # Forward pass
    def forward(self, x):
        # Two 3*3 convolutions
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        # Squeeze: global average pooling over the whole feature map
        w = F.avg_pool2d(out,out.size(2))
        # Excitation
        w = F.relu(self.fc1(w))
        w = torch.sigmoid(self.fc2(w))
        # Reweight (Scale) each channel
        out = out*w
        # Add the shortcut (shallow feature map)
        out += self.shortcut(x)
        out = F.relu(out)
        return out
class SENet(nn.Module):
    def __init__(self):
        super(SENet,self).__init__()
        self.num_classes = 10
        self.in_channels = 64
        # 64 kernels of size 3*3
        self.conv1 = nn.Conv2d(3,64,kernel_size=3,stride=1,padding=1,bias=False)
        self.bn1 = nn.BatchNorm2d(64)
        # BasicBlock
        # Each stage consists of 2 blocks
        self.layer1 = self._make_layer(BasicBlock,64,2,stride=1)
        self.layer2 = self._make_layer(BasicBlock,128,2,stride=2)
        self.layer3 = self._make_layer(BasicBlock,256,2,stride=2)
        self.layer4 = self._make_layer(BasicBlock,512,2,stride=2)
        self.linear = nn.Linear(512,self.num_classes)
    # Build one stage
    # blocks is the number of residual blocks in the stage
    # (ResNet-18 uses 2,2,2,2)
    def _make_layer(self,block,out_channels,blocks,stride):
        # Only the first block of a stage changes the stride
        strides = [stride]+[1]*(blocks-1)
        layers = []
        for stride in strides:
            layers.append(block(self.in_channels,out_channels,stride))
            self.in_channels = out_channels
        return nn.Sequential(*layers)
    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.layer1(out)
        out = self.layer2(out)
        out = self.layer3(out)
        out = self.layer4(out)
        # 4*4 pooling assumes 32*32 inputs (e.g. CIFAR-10), where layer4 outputs a 4*4 map
        out = F.avg_pool2d(out,4)
        out = out.view(out.size(0),-1)
        out = self.linear(out)
        return out
Part 2 Code homework
2.1 2D convolution and 3D convolution
Hyperspectral images are three-dimensional volumetric data with two spatial dimensions and one spectral dimension. On hyperspectral images, 2D convolution can extract spatial features but not spectral features. 3D convolution extracts spatial and spectral features at the same time, which helps classification accuracy, but it is more computationally expensive than 2D convolution. The paper "HybridSN: Exploring 3D-2D CNN Feature Hierarchy for Hyperspectral Image Classification" therefore combines the two: it applies 3D convolutions first, then a 2D convolution, and finally a classifier. This takes full advantage of 3D convolution for feature extraction while avoiding the overly complex model that heavy use of 3D convolution would produce.
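A small sketch of the difference, assuming an illustrative patch with 30 bands and 25×25 pixels (the same sizes as the random test input used for HybridSN in 2.2); `conv_macs` is a hypothetical helper that counts multiply-accumulates from the output shape.

```python
import torch
from torch import nn

def conv_macs(conv, out):
    # Each output element costs one multiply-accumulate per weight in one kernel
    return out.numel() * conv.weight[0].numel()

# Illustrative hyperspectral patch: 30 spectral bands, 25 x 25 pixels
patch = torch.randn(1, 30, 25, 25)

# 2D convolution: the bands are input channels and are all summed in one step,
# so the output has no spectral axis left
conv2d = nn.Conv2d(in_channels=30, out_channels=8, kernel_size=3)
out2d = conv2d(patch)
print(out2d.shape, conv_macs(conv2d, out2d))   # torch.Size([1, 8, 23, 23]) 1142640

# 3D convolution: the bands become a depth axis that the kernel slides along,
# so the output keeps a (shorter) spectral axis
conv3d = nn.Conv3d(in_channels=1, out_channels=8, kernel_size=(7, 3, 3))
out3d = conv3d(patch.unsqueeze(1))             # add a channel axis: (1, 1, 30, 25, 25)
print(out3d.shape, conv_macs(conv3d, out3d))   # torch.Size([1, 8, 24, 23, 23]) 6398784
```

Here the 3D layer has fewer weights (8×63 versus 8×270) but needs several times more computation, because it produces an output at every spectral position.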
2.2 HybridSN
The HybridSN structure is as follows:
Code link: (colab)HybridSN
Code for the network part:
import torch
from torch import nn
class_num = 16
class HybridSN(nn.Module):
    def __init__(self,num_classes=16):
        super(HybridSN,self).__init__()
        self.conv1 = nn.Conv3d(1,8,(7,3,3))
        self.bn1=nn.BatchNorm3d(8)
        self.conv2 = nn.Conv3d(8,16,(5,3,3))
        self.bn2=nn.BatchNorm3d(16)
        self.conv3 = nn.Conv3d(16,32,(3,3,3))
        self.bn3=nn.BatchNorm3d(32)
        self.conv4 = nn.Conv2d(576,64,(3,3))   # 576 = 32 channels * 18 remaining bands
        self.bn4=nn.BatchNorm2d(64)
        self.drop = nn.Dropout(p=0.4)
        self.fc1 = nn.Linear(18496,256)        # 18496 = 64 * 17 * 17
        self.fc2 = nn.Linear(256,128)
        self.fc3 = nn.Linear(128,num_classes)
        self.relu = nn.ReLU()
        # The paper applies softmax, but in this experiment the loss decreased very slowly with it,
        # so it is not used (nn.CrossEntropyLoss already applies log-softmax internally)
        self.softmax = nn.Softmax(dim=1)
    def forward(self,x):
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.relu(self.bn2(self.conv2(out)))
        out = self.relu(self.bn3(self.conv3(out)))
        # Merge the channel and spectral axes so that a 2D convolution can follow
        out = out.view(out.size(0),out.shape[1]*out.shape[2],out.shape[3],out.shape[4])
        out = self.relu(self.bn4(self.conv4(out)))
        out = out.view(out.size(0),-1)
        out = self.fc1(out)
        out = self.drop(out)
        out = self.relu(out)
        out = self.fc2(out)
        out = self.drop(out)
        out = self.relu(out)
        out = self.fc3(out)
        # out = self.softmax(out)
        return out
# Random input to check that the network runs end to end
x = torch.randn(1,1,30,25,25)
net = HybridSN()
y = net(x)
print(y.shape)
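The constants 576 and 18496 in the network definition follow from the tensor shapes after each layer. A short pure-Python trace, assuming stride 1 and no padding (the PyTorch defaults used in the code above); `conv_out` is an illustrative helper.

```python
def conv_out(size, kernel, stride=1, padding=0):
    """Output length of a convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1

bands, h, w = 30, 25, 25                 # input patch: 30 bands x 25 x 25
channels = 1
for out_c, (kd, kh, kw) in [(8, (7, 3, 3)), (16, (5, 3, 3)), (32, (3, 3, 3))]:
    bands, h, w = conv_out(bands, kd), conv_out(h, kh), conv_out(w, kw)
    channels = out_c
    print("Conv3d ->", (channels, bands, h, w))
# Conv3d -> (8, 24, 23, 23)
# Conv3d -> (16, 20, 21, 21)
# Conv3d -> (32, 18, 19, 19)

channels_2d = channels * bands           # 32 * 18 = 576 -> in_channels of conv4
h, w = conv_out(h, 3), conv_out(w, 3)    # Conv2d(576, 64, 3x3): 19 -> 17
flat = 64 * h * w                        # 64 * 17 * 17 = 18496 -> in_features of fc1
print(channels_2d, flat)                 # 576 18496
```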
Download related resources:
Data processing:
Training:
Model testing:
Write classification results to a file:
First run:
97.4536466477302 Kappa accuracy (%)
97.7669376693767 Overall accuracy (%)
94.13603236733026 Average accuracy (%)
precision recall f1-score support
Alfalfa 0.95 0.93 0.94 41
Corn-notill 1.00 0.93 0.96 1285
Corn-mintill 0.94 0.99 0.97 747
Corn 0.97 0.99 0.98 213
Grass-pasture 0.94 0.98 0.96 435
Grass-trees 0.98 1.00 0.99 657
Grass-pasture-mowed 0.92 0.96 0.94 25
Hay-windrowed 0.99 1.00 1.00 430
Oats 0.75 0.50 0.60 18
Soybean-notill 0.99 0.98 0.98 875
Soybean-mintill 0.97 0.99 0.98 2210
Soybean-clean 0.97 0.96 0.97 534
Wheat 0.99 0.96 0.98 185
Woods 1.00 0.99 1.00 1139
Buildings-Grass-Trees-Drives 0.98 0.97 0.98 347
Stone-Steel-Towers 0.88 0.93 0.90 84
accuracy 0.98 9225
macro avg 0.95 0.94 0.95 9225
weighted avg 0.98 0.98 0.98 9225
[[ 38 0 0 0 0 2 1 0 0 0 0 0 0 0 0 0]
[ 1 1200 23 1 5 0 0 0 1 7 43 1 0 1 1 1]
[ 0 0 740 0 3 0 0 0 0 0 0 3 1 0 0 0]
[ 0 0 3 210 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 2 0 428 1 0 0 0 0 4 0 0 0 0 0]
[ 0 0 0 0 0 655 0 0 1 0 1 0 0 0 0 0]
[ 0 0 0 0 1 0 24 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 429 0 1 0 0 0 0 0 0]
[ 0 0 8 0 0 1 0 0 9 0 0 0 0 0 0 0]
[ 1 0 0 0 9 3 0 0 0 856 3 2 1 0 0 0]
[ 0 2 6 0 1 4 0 1 0 1 2191 0 0 2 2 0]
[ 0 0 2 4 0 0 0 0 1 1 0 514 0 0 2 10]
[ 0 0 0 1 5 0 1 0 0 0 0 0 178 0 0 0]
[ 0 1 0 0 0 0 0 1 0 0 2 1 0 1133 1 0]
[ 0 0 0 0 1 0 0 1 0 0 8 1 0 0 336 0]
[ 0 0 0 0 0 0 0 0 0 0 0 6 0 0 0 78]]
Second run:
97.53900729242903 Kappa accuracy (%)
97.84281842818429 Overall accuracy (%)
95.58275885289382 Average accuracy (%)
precision recall f1-score support
Alfalfa 0.95 0.88 0.91 41
Corn-notill 0.98 0.95 0.97 1285
Corn-mintill 0.97 0.99 0.98 747
Corn 0.96 0.99 0.97 213
Grass-pasture 1.00 0.96 0.98 435
Grass-trees 0.99 0.99 0.99 657
Grass-pasture-mowed 1.00 1.00 1.00 25
Hay-windrowed 0.99 1.00 0.99 430
Oats 0.82 0.78 0.80 18
Soybean-notill 0.98 0.98 0.98 875
Soybean-mintill 0.97 0.99 0.98 2210
Soybean-clean 0.97 0.97 0.97 534
Wheat 0.97 0.98 0.98 185
Woods 0.99 1.00 0.99 1139
Buildings-Grass-Trees-Drives 0.99 0.95 0.97 347
Stone-Steel-Towers 0.90 0.89 0.90 84
accuracy 0.98 9225
macro avg 0.96 0.96 0.96 9225
weighted avg 0.98 0.98 0.98 9225
[[ 36 0 0 1 0 0 0 3 0 0 1 0 0 0 0 0]
[ 2 1219 1 0 0 2 0 0 0 6 51 3 0 1 0 0]
[ 0 1 741 5 0 0 0 0 0 0 0 0 0 0 0 0]
[ 0 0 1 211 0 1 0 0 0 0 0 0 0 0 0 0]
[ 0 1 12 0 416 0 0 0 0 1 4 1 0 0 0 0]
[ 0 2 0 0 0 649 0 0 0 0 2 0 2 2 0 0]
[ 0 0 0 0 0 0 25 0 0 0 0 0 0 0 0 0]
[ 0 0 0 0 0 0 0 429 0 0 0 0 0 0 0 1]
[ 0 0 3 0 0 0 0 0 14 0 0 0 1 0 0 0]
[ 0 2 1 0 0 0 0 0 2 859 3 4 0 1 2 1]
[ 0 3 1 0 1 3 0 0 0 10 2186 0 0 6 0 0]
[ 0 0 3 3 0 0 0 1 1 0 0 519 0 0 1 6]
[ 0 2 0 0 1 0 0 1 0 0 0 0 181 0 0 0]
[ 0 4 0 0 0 0 0 0 0 0 0 0 0 1135 0 0]
[ 0 6 0 0 0 0 0 0 0 1 5 0 2 2 331 0]
[ 0 0 0 0 0 0 0 0 0 0 0 9 0 0 0 75]]
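For reference, the three headline numbers (Kappa, OA, AA) can be recomputed from a confusion matrix like the ones above. This sketch assumes the usual definitions: OA is the trace divided by the total, AA is the mean per-class accuracy (the recall column of the report), and kappa is (OA − p_e)/(1 − p_e), where p_e is the chance agreement computed from the row and column sums. The function name is hypothetical, and the toy matrix is made up for illustration rather than taken from these runs.

```python
def oa_aa_kappa(cm):
    """Overall accuracy, average accuracy and Cohen's kappa (all in %) from a
    confusion matrix whose rows are true classes and columns are predictions."""
    k = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]
    col_sums = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    oa = sum(cm[i][i] for i in range(k)) / total
    # AA: mean of the per-class accuracies
    aa = sum(cm[i][i] / row_sums[i] for i in range(k)) / k
    # Expected agreement if predictions were independent of the true labels
    pe = sum(r * c for r, c in zip(row_sums, col_sums)) / total ** 2
    kappa = (oa - pe) / (1 - pe)
    return 100 * oa, 100 * aa, 100 * kappa

# Made-up 2-class example
toy = [[45, 5],
       [10, 40]]
print([round(v, 2) for v in oa_aa_kappa(toy)])   # [85.0, 85.0, 70.0]
```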
Visualization results:
2.3 Thinking
① Testing twice gave different results. The network uses dropout and was not switched to evaluation mode (net.eval()) before testing, so a different random subset of nodes is dropped in each forward pass (BatchNorm layers also keep using batch statistics in training mode).
② To further improve hyperspectral image classification, one could add the SE attention module, or introduce a bottleneck structure to adjust the feature dimensions.
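A minimal sketch of the fix suggested in ①, using a toy model with dropout as a stand-in (not HybridSN itself): after `model.eval()`, dropout is disabled, so repeated forward passes on the same input give identical outputs.

```python
import torch
from torch import nn

torch.manual_seed(0)
# Toy model with dropout (illustrative stand-in for the real network)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Dropout(p=0.4), nn.Linear(16, 3))
x = torch.randn(4, 8)

with torch.no_grad():
    model.train()                        # dropout active: a new random mask on every call
    train_same = torch.equal(model(x), model(x))
    model.eval()                         # dropout off (BatchNorm would use running stats)
    eval_same = torch.equal(model(x), model(x))

# eval_same is True; train_same is almost always False
print(train_same, eval_same)
```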