Openship long-tail data classification based on transfer learning

The code for this article is published in the following repository:

https://github.com/VitaminyW/classfier_for_LT_Openship

PS: If you have questions about the code, feel free to leave a comment or send a private message.

1. Introduction

        With the rapid growth of Sentinel-1 Synthetic Aperture Radar (SAR) data, how to use Sentinel-1 imagery to achieve effective and robust ocean surveillance has become a crucial issue [1]. Recently, deep learning has been widely applied in computer vision, including image classification [2,3] and object detection [4,5]. Combining SAR data with deep learning techniques from computer vision can make ocean monitoring considerably simpler. This paper supports ocean monitoring by converting SAR data into image files and using deep learning to classify the different ship types in OpenSARShip. Notably, the number of ships per category in the dataset used here is extremely imbalanced: the largest class, cargo ships, contains 8240 samples, while the smallest contains only 2. Overall, the per-category sample counts follow a long-tailed distribution (LTD) [1], as shown in the figure below.

Figure 1. Number of samples per category, showing the long-tailed distribution

        A common challenge when learning from LTD data is that the large (head) classes dominate the training process: the learned classifiers tend to perform well on these classes and significantly worse on classes with few samples (tail classes). To better address ship classification on LTD data, this paper proposes a solution that combines ResNet's pretrained parameters with class-balanced sampling.

2. Method

        Because the Mini-OpenSARShip dataset used in this paper is small, learning an image feature representation from it alone may not yield a good representation. Inspired by transfer learning [6], this paper instead uses a ResNet model trained on a large-scale dataset (ImageNet) as the representation extractor and trains only the classifier on Mini-OpenSARShip. Notably, to keep the classifier from overfitting the head categories, training resamples the data with class-balanced sampling: each category is sampled with probability 1/C, where C is the number of categories, and a sample is then drawn uniformly from the chosen category.

2.1 Resnet

        ResNet [3] is a network proposed by He et al. in 2016 and evaluated on ILSVRC and COCO. It effectively solves the problem that network performance stops improving, or even degrades, as depth increases: by cleverly adding residual connections to the model, ResNet enables deep networks to outperform shallow ones.

        The core of residual learning is the residual network block, shown in Figure 2. Suppose a block with input x is originally expected to output H(x). If x is connected directly from the input to the output, the learning goal changes to the residual between the two, F(x) = H(x) - x. The two weight layers in the figure thus learn the difference between the original block's output and the input features.

        The advantage of residual connections is that the residual mapping carries less content and is simpler to learn than the full mapping. At the same time, residual learning ensures that adding layers can only improve performance, never hurt it: if the learned residual is zero, the block is equivalent to an identity mapping between convolutional layers and has no effect on network performance, while a nonzero residual means the network has learned new content that helps improve performance.
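The identity property can be illustrated numerically (a toy NumPy sketch; `residual_fn` stands in for the stacked weight layers that compute F(x)):

```python
import numpy as np

def residual_block(x, residual_fn):
    """Output H(x) = F(x) + x: the block learns only the residual F(x)."""
    return residual_fn(x) + x

x = np.array([1.0, 2.0, 3.0])

# If the learned residual is zero, the block reduces to the identity mapping,
# so adding it can never degrade the network's representational power.
identity_out = residual_block(x, lambda v: np.zeros_like(v))

# A nonzero residual adds new content on top of the input features.
shifted_out = residual_block(x, lambda v: 0.1 * v)
```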

       Figure 2. Residual module

         This article uses the ResNet50 structure. Its feature extraction part is built from two basic blocks: the convolution block and the identity block. The identity block handles the case where the input and output dimensions are the same; its main role is to deepen the network, as shown in Figure 3. The convolution block handles the case where the input and output dimensions differ: the convolution layer on its shortcut branch makes this possible, and its role is to change the network's dimensions, as shown in Figure 4.
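The difference between the two blocks can be sketched with channel dimensions alone (a NumPy simplification, not the full convolutional blocks: the 1×1 convolution on the shortcut is modeled as a projection matrix):

```python
import numpy as np

rng = np.random.default_rng(0)

def identity_block(x, residual_fn):
    # Input and output dimensions match, so the shortcut is a plain addition.
    return residual_fn(x) + x

def conv_block(x, residual_fn, w_proj):
    # Dimensions differ: a 1x1 convolution (here a projection matrix) on the
    # shortcut maps x into the new dimension before the addition.
    return residual_fn(x) + w_proj @ x

x = rng.standard_normal(64)              # 64 input channels
w_main = rng.standard_normal((256, 64))  # main branch changes the dimension
w_proj = rng.standard_normal((256, 64))  # shortcut projection to match it

same_dim = identity_block(x, lambda v: 0.1 * v)
new_dim = conv_block(x, lambda v: w_main @ v, w_proj)
```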

   

Figure 3. Identity block
Figure 4. Convolution block

        The overall structure of ResNet50 is shown in Figure 5. Taking a 600×600 input image as an example, it first goes through zero padding to avoid losing edge information, then a convolution with stride 2, batch normalization, a ReLU activation, and max pooling for downsampling. It then passes through four stages, each consisting of one convolution block followed by 2, 3, 5, and 2 identity blocks respectively.
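The downsampling path can be checked with the standard convolution output-size formula (pure Python; the stage strides follow the standard ResNet50 layout, and only the stride-carrying layer of each stage is modeled):

```python
def conv_out(size, kernel, stride, padding):
    """Spatial output size of a convolution or pooling layer."""
    return (size + 2 * padding - kernel) // stride + 1

sizes = [600]
sizes.append(conv_out(sizes[-1], kernel=7, stride=2, padding=3))  # 7x7 conv, stride 2
sizes.append(conv_out(sizes[-1], kernel=3, stride=2, padding=1))  # max pooling
for stride in (1, 2, 2, 2):  # the four residual stages
    sizes.append(conv_out(sizes[-1], kernel=1, stride=stride, padding=0))

print(sizes)  # [600, 300, 150, 150, 75, 38, 19]
```

So a 600×600 input is reduced to a 19×19 feature map before global pooling and the classifier.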

        

Figure 5. ResNet50 feature map sizes

  2.2 Category Balanced Sampling Strategy

         To avoid overfitting the head categories and underfitting the tail categories when learning the classification decision boundary, this paper loads data with a category-balanced sampling strategy: first a category is sampled uniformly, then one sample is drawn uniformly from that category, with repetition allowed during training.
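A minimal sketch of this sampler in pure Python (the dataset layout is illustrative: a dict mapping each category name to its list of samples):

```python
import random

def balanced_sample(samples_by_class, rng=random):
    """Pick a category uniformly (probability 1/C each),
    then pick one of its samples uniformly, with replacement."""
    category = rng.choice(list(samples_by_class))
    return rng.choice(samples_by_class[category])

# Toy long-tailed dataset: the head class has many samples, the tail very few.
dataset = {
    "Cargo":  [f"cargo_{i}" for i in range(8240)],
    "Tanker": [f"tanker_{i}" for i in range(300)],
    "Tug":    ["tug_0", "tug_1"],
}

# Over many draws, each category is seen roughly 1/3 of the time,
# regardless of how many samples it actually contains.
draws = [balanced_sample(dataset) for _ in range(9000)]
tug_share = sum(d.startswith("tug") for d in draws) / len(draws)
```

Under instance-uniform sampling, the two tug samples would appear with probability 2/8542 per draw; category-balanced sampling raises that to 1/3, which is what keeps the tail classes from being drowned out.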

3. Results and subsequent optimization

        By decoupling the LTD recognition task into representation learning and decision-boundary learning, using a ResNet trained on the ImageNet dataset as the image representation extractor, and combining it with the category-balanced sampling strategy to learn the decision boundary, the model reaches 56.28% accuracy on the test set. However, since OpenSARShip images differ considerably from the natural images in ImageNet, freezing only the ResNet layers that extract low-level features and including the remaining layers in representation-learning training may achieve better performance.

 

References:

 

[1] Huang L, Liu B, Li B, et al. OpenSARShip: A dataset dedicated to Sentinel-1 ship interpretation[J]. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2017, 11(1): 195-208.

[2] Sellami A, Tabbone S. Deep neural networks-based relevant latent representation learning for hyperspectral image classification[J]. Pattern Recognition, 2022, 121: 108224.

[3] He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 770-778.

[4] Wang C Y, Bochkovskiy A, Liao H Y M. Scaled-YOLOv4: Scaling cross stage partial network[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 13029-13038.

[5] Sun P, Zhang R, Jiang Y, et al. Sparse R-CNN: End-to-end object detection with learnable proposals[C]//Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. 2021: 14454-14463.

[6] Pan S J, Yang Q. A survey on transfer learning[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 22(10): 1345-1359.


Origin: blog.csdn.net/YmgmY/article/details/128720977