
DARN: Distance Attention Residual Network for Lightweight Remote-Sensing Image Super-Resolution


Single image super-resolution (SISR) technology is of great significance in the field of remote sensing. Although SISR methods based on convolutional neural networks (CNNs) have achieved good results, their large model size and slow speed make them difficult to deploy in real remote sensing tasks. In this article, we propose a compact and efficient Distance Attention Residual Network (DARN) to achieve a better trade-off between model accuracy and complexity. The Distance Attention Residual Connected Block (DARCB), the core component of DARN, uses multi-level feature aggregation to learn more accurate feature representations. The main branch of DARCB adopts shallow residual blocks (SRBs) to flexibly learn residual information and ensure the robustness of the model. We also propose a distance attention block (DAB) as a bridge between the main branch and the side branch of DARCB; DAB can effectively mitigate the loss of detailed features during deep CNN extraction. Experimental results on two remote sensing and five SR benchmark datasets show that DARN achieves a better compromise between performance and model complexity than existing methods. Furthermore, compared with state-of-the-art lightweight remote sensing SISR methods, DARN achieves the best balance of parameter count, computational cost, and inference speed.

INTRODUCTION

Super-resolution (SR) reconstruction refers to constructing a nonlinear mapping between pairs of high-resolution and low-resolution images. Single image super-resolution (SISR), the most representative low-level vision task, has been widely studied. SISR algorithms can recover high-definition images, which makes them valuable in fields such as the military, industry, aerospace, and remote sensing. However, data collection in remote sensing is hampered by long acquisition distances, wide viewing angles, and the limits of optical hardware. Even with advanced acquisition equipment, it is difficult to obtain high-definition images that meet mission needs. Therefore, it is of great significance to study SISR algorithms that can restore the high-frequency information of remote sensing images.
Since SRCNN, the first pioneering SISR method, was proposed, end-to-end mapping from low-resolution to high-resolution images has been incorporated into SR reconstruction, but this new solution inevitably left defects to be solved. For example, the unreasonable design of its convolution kernels and nonlinear mapping makes network inference very slow. FSRCNN derives several optimizations of the SRCNN framework based on different algorithms. ESPCN improves reconstruction accuracy by replacing traditional interpolation upsampling with sub-pixel convolution. Deeper convolutional neural network (CNN) architectures have been shown to improve model performance. The dense residual block proposed by [11] can alleviate the local information loss caused by long-distance residuals.
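To make the sub-pixel idea concrete, here is a minimal NumPy sketch (our own toy code, not ESPCN's implementation) of the channel-to-space rearrangement that sub-pixel upsampling performs: a convolution first produces C·r² channels at low resolution, and the shuffle rearranges them into C channels at r× resolution.

```python
import numpy as np

def pixel_shuffle(x: np.ndarray, r: int) -> np.ndarray:
    """Rearrange a (C*r^2, H, W) tensor into (C, H*r, W*r),
    the rearrangement step of sub-pixel upsampling."""
    c_r2, h, w = x.shape
    c = c_r2 // (r * r)
    x = x.reshape(c, r, r, h, w)      # split channels into (C, r, r)
    x = x.transpose(0, 3, 1, 4, 2)    # reorder to (C, H, r, W, r)
    return x.reshape(c, h * r, w * r)

lr_features = np.arange(16, dtype=np.float32).reshape(4, 2, 2)  # 4 = 1 * 2^2 channels
sr = pixel_shuffle(lr_features, 2)
print(sr.shape)  # (1, 4, 4)
```

The mapping matches the standard convention: output position (c, h·r+i, w·r+j) is read from input channel c·r²+i·r+j at (h, w).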
Based on these early explorations of CNNs, SISR theory has gradually matured. However, such methods, which try to improve accuracy by increasing model capacity, are difficult to apply in practical tasks. In particular, the large resolution of remote sensing images often makes the inference of common SISR algorithms extremely slow. Therefore, designing a lightweight, efficient, and accurate SISR model has become a pressing problem in the SR field.
From the perspective of reducing model parameters, DRCN and DRRN adopt recursion to increase parameter sharing, but compensating for the recursive loss requires a deeper CNN, which in turn slows model inference. CARN-M attempts to remove feature redundancy and speed up inference through parameter pruning, but it suffers a large loss in PSNR accuracy. Clearly, to achieve an effective compromise among the accuracy, speed, and parameter count of a model, the expressive ability of features must be enhanced within a limited parameter budget. The information distillation method proposed by the Information Distillation Network (IDN) achieves a moderate trade-off by splitting feature channels to reduce feature redundancy. However, this simple channel splitting loses part of the deep feature information and thus limits the model's ability to extract effective features. The heterogeneous structure adopted by LESRCNN improves reconstruction by flexibly combining low-frequency and high-frequency features. MADNet accelerates inference through an Inception-like multilateral residual module, but this multi-branch design suffers from a large parameter footprint. In the SR field, FeNet is inspired by IDN's channel splitting and builds a lightweight LLB module, which uses a channel attention mechanism to construct communication between upper and lower branches while channel splitting reduces parameters. However, another problem arises during information distillation: the side branch produced by channel splitting cannot effectively extract deep features.
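The channel-splitting step that this style of information distillation relies on can be sketched as follows (a toy illustration with made-up shapes and split ratio; real distillation networks wrap the split in learned convolutions):

```python
import numpy as np

def distill_split(x: np.ndarray, ratio: float = 0.25):
    """Split features along the channel axis: a small "retained" part is
    kept as-is (the distilled side branch), and the rest is passed on for
    further refinement. Sketch only, not IDN's actual implementation."""
    k = int(x.shape[0] * ratio)
    return x[:k], x[k:]

feat = np.zeros((64, 8, 8), dtype=np.float32)
kept, passed_on = distill_split(feat)
print(kept.shape, passed_on.shape)  # (16, 8, 8) (48, 8, 8)
```

The retained slice never passes through the deeper layers, which is exactly why the side branch by itself cannot extract deep features.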

This paper constructs a novel lightweight SR network, the Distance Attention Residual Network (DARN), to solve the above problems. DARN improves reconstruction by strengthening the representation of feature channel information and introducing effective attention modules. The Distance Attention Residual Connected Block (DARCB) we construct is the core component of DARN. DARCB explicitly splits the input features into two branches through feature refinement convolution: the refined side-branch features are retained, while the main branch is further refined by a CNN module built from convolutional layers, shallow residual blocks (SRBs) with skip connections, and a final activation unit. The most effective way to enhance feature representation is to reduce channel feature redundancy and mitigate the loss of shallow features caused by deep CNNs. This is precisely why we built the Distance Attention Block (DAB). Distance attention means that shallow features can remotely guide the feature extraction of each SRB in the main branch through attention. DAB uses the shallow features of the side branch as prior information, which effectively alleviates the loss of shallow information during deep CNN feature extraction in the backbone, thereby enhancing the feature expression of the module. Finally, a multi-level fusion mechanism fuses the refined features at each level to reduce the impact of redundant main-branch features on the module output. Properly introduced attention modules effectively improve the performance of SR networks, so we also introduce an enhanced spatial attention (ESA) module to strengthen the model built in this paper. As shown in Figure 1, the proposed DARN achieves state-of-the-art reconstruction performance compared with existing lightweight SR networks.
In particular, our smaller model DARN-S remains highly competitive despite being extremely lightweight, thanks to the powerful feature expression ability of the proposed DARCB module.
The contributions of this paper are as follows:
1) The DARCB component uses multi-level feature aggregation to enhance feature representation, achieving clear gains over simple cascaded CNN modules.
2) The constructed DAB module can effectively use shallow features to suppress the loss of detail during deep CNN feature extraction.
3) This paper proposes a lightweight image super-resolution reconstruction model, DARN, which achieves a good compromise between reconstruction accuracy and efficiency.

RELATED WORK

Deep Network for SR

SR tasks have developed rapidly since Dong et al. [5] proposed the seminal SRCNN, which significantly outperforms traditional methods. As researchers studied SR tasks in depth, the effectiveness of strategies such as larger models, deeper convolution, and globalized feature information has gradually been demonstrated. [8] achieved significant improvements in SR tasks, showing that deep convolution can improve model performance. Lim et al. adopted a wider model structure, increasing parameters to achieve better performance. EDPN copies the input image into a sequence and uses deformable convolution to learn the image's internal self-similarity. Liu et al. introduced window Transformers into the SR domain to enhance the relevance of global image information. Chen et al. combined Swin Transformer with channel attention mechanisms, and their HAT model refreshed state-of-the-art SR performance. These methods have made great progress in performance, but their large parameter counts and high computational cost make them difficult to deploy in practical applications.

Lightweight SR

The stringent requirements for lightweight models in practical tasks have prompted researchers to develop more efficient SR models. The IDN proposed in [19] processes the two groups of split features through separate paths. In [30], a pyramid structure is used to gradually reconstruct the high-frequency residual features of the input image, and replacing bilinear interpolation with deconvolution also greatly reduces computational complexity. The authors in [31] abandoned directly learning the mapping between high- and low-resolution image pairs and instead accelerated inference by converting the SR task into linear regression over multiple basis filters. The pixel attention network (PAN) proposed in [32] adopts a dual-branch architecture, which improves reconstruction quality at a small parameter cost. The multi-level information distillation and refinement structure designed in [23] achieves multi-level feature reuse. Li et al. introduced separable convolutions to achieve more competitive performance with fewer parameters. Unlike most deep-learning-based SR models, LAPAR simplifies the SR task into a linear regression over multiple basis filters.
Remote sensing images collected from long distances are of poor quality, so super-resolution reconstruction of remote sensing images is a very meaningful task. LGCNet is the first CNN-based SR model for remote sensing images; it utilizes local and global representations to learn the residual between HR images and upscaled LR images. SCViT proposes a spatial-channel feature-preserving model that takes into account the detailed geometric information of high-spatial-resolution images. TransENet adopts a multi-scale Transformer to aggregate multi-dimensional spatial features while focusing on the spatial self-similarity of images. The LLB module proposed in FeNet uses a channel attention mechanism to construct communication between upper and lower branches, while channel splitting keeps the model lightweight.

NETWORK ARCHITECTURE

Framework View

For an input low-quality satellite remote sensing image I_LR, our method reconstructs a high-quality image I_HR that should be close to the ground truth I_GT. As shown in Figure 2, DARN mainly consists of four parts: the shallow feature extraction module H_map, a deep feature extraction module composed of N cascaded DARCBs, the multi-level feature fusion block H_fusion, and the reconstruction module H_up.
The input image first passes through the shallow feature extraction block H_map, which maps the low-dimensional image into a high-dimensional space and enriches the representation of image details. Then the deep feature extraction module, composed of multiple DARCBs, progressively refines the extracted features. This can be expressed as

F_0 = H_map(I_LR),
F_k = H_DARCB_k(F_(k-1)), k = 1, 2, ..., N,

where F_0 denotes the shallow features and F_k denotes the output of the k-th DARCB.
The optimization of lightweight models means improving performance and speed under a smaller parameter budget, and fusing features from different depths of the model is an effective way to improve the performance of lightweight models. As shown in Figure 2, the fusion module improves feature reuse by fusing multi-stage features. The fused features F_final are then restored to a high-quality remote sensing image through the reconstruction module H_up. In addition, the reasonable application of residual learning is an effective way to improve model performance. The above process can be expressed as

F_final = H_fusion([F_1, F_2, ..., F_N]),
I_HR = H_up(F_final + F_0),

where [·] denotes concatenation of the stage outputs.
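The overall pipeline described above can be sketched structurally as follows. Every operator here is a stand-in NumPy placeholder (identity-like arithmetic), used only to show how H_map, the N cascaded DARCBs, H_fusion, the global residual, and H_up compose; none of these are the paper's learned layers:

```python
import numpy as np

def H_map(i_lr):      # shallow feature extraction (stand-in: broadcast to 4 channels)
    return np.repeat(i_lr[None, ...], 4, axis=0)

def darcb(f):         # one DARCB (stand-in: identity)
    return f + 0.0

def H_fusion(feats):  # multi-level fusion (stand-in: mean over stage outputs)
    return np.mean(np.stack(feats), axis=0)

def H_up(f):          # reconstruction (stand-in: collapse channels)
    return f.mean(axis=0)

def darn_forward(i_lr, n_blocks=4):
    f0 = H_map(i_lr)
    feats, f = [], f0
    for _ in range(n_blocks):        # N cascaded DARCBs
        f = darcb(f)
        feats.append(f)
    f_final = H_fusion(feats) + f0   # fusion plus global residual learning
    return H_up(f_final)

out = darn_forward(np.ones((8, 8), dtype=np.float32))
print(out.shape)  # (8, 8)
```

The point of the sketch is the dataflow: each stage output is kept for fusion, and the shallow features F_0 re-enter via the residual before reconstruction.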

Lightweight Structure Design

Determining the network architecture is the first challenge in the model design phase. Model lightweighting means improving accuracy as much as possible while ensuring a small number of parameters, low computational complexity, and fast inference. Common deep learning architectures mainly include convolution, Transformer, and MLP. Transformer and MLP have high computational complexity, so they are not suitable for lightweight SR tasks based on pixel-level computation. Convolution is fast and computationally cheap, and reasonable structural design can make convolution perform excellently under limited parameters.
The idea behind the lightweight structure is shown in Figure 3. To limit the number of parameters, we use only three 3 × 3 ordinary convolutions to form the deep feature extraction component. Three Conv groups in series with an ESA module form the baseline of our model design. However, simple convolutional cascades inevitably bring a large amount of feature redundancy to lightweight models, which limits the efficiency of feature extraction. Therefore, we combine feature distillation with multi-level feature fusion to design the feature distillation connection (FDC) module. FDC has two significant advantages:
1) FDC fuses multi-level features within a limited number of parameters, improving feature utilization;
2) the distillation operation used by FDC refines the channel feature information.
In the fusion stage, the influence of the deep features extracted by the main-branch CNN on the FDC output is reduced. By allocating output weights at each level, the impact of feature redundancy and random errors generated by the main branch on module performance can be effectively reduced.
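Weighted multi-level fusion of the kind described can be sketched as follows (the weight values are made-up illustrative numbers, not learned weights from the paper):

```python
import numpy as np

def weighted_fusion(levels, weights):
    """Fuse multi-level features with per-level output weights, so the
    deepest (most redundancy-prone) level contributes least. Weights here
    are illustrative only."""
    assert len(levels) == len(weights)
    return sum(w * f for w, f in zip(weights, levels))

shallow = np.full((4, 4), 1.0)
mid = np.full((4, 4), 2.0)
deep = np.full((4, 4), 3.0)
fused = weighted_fusion([shallow, mid, deep], [0.5, 0.3, 0.2])
print(fused[0, 0])  # 0.5*1 + 0.3*2 + 0.2*3 = 1.7
```

Down-weighting the deepest level is one simple way to realize the stated goal of limiting how much main-branch redundancy and accumulated error reach the module output.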
Feature redundancy and the accumulation of random errors caused by simple concatenation are common problems in current CNN models. To improve the feature extraction efficiency of the main branch, this paper designs the DAB module, which uses the refined input features to suppress the loss of detailed features throughout the main-branch CNN. Shallow features contain all the information of the original image, while CNNs suffer from feature loss, feature redundancy, and error accumulation when extracting deep features; supervising the CNN with shallow features therefore effectively reduces error accumulation during extraction. In addition, the attention mechanism in DAB can compensate for information loss, increase the proportion of effective features, and reduce redundant information. An SRB is introduced as the main building block of the main branch to keep the network sufficiently light. Moreover, replacing the Conv Groups module with SRBs enables the model to learn residual information flexibly, making it more robust. As formula (2) shows, the input feature of the K-th DARCB is F_(K-1) and the output feature is F_K. DARCB first divides the input feature F_(K-1) into two paths, a main branch and a side branch. The side branch uses feature refinement convolution to retain the original information F_LB of the input features:

F_LB = H_R(F_(K-1)).
where H_R is the feature refinement convolution. Next, two SRB modules extract deep features, and two DAB modules enhance the efficiency of deep feature extraction. Finally, a convolutional layer refines the deep feature F_DAB2. The specific process can be expressed as

F_S1 = H_S(F_(K-1)),
F_DAB1 = H_Att(F_S1, F_LB),
F_S2 = H_S(F_DAB1),
F_DAB2 = H_Att(F_S2, F_LB),
where H_S represents the SRB feature extraction module, H_Att represents the attention mechanism in DAB, and g represents several convolutional layers. Finally, DARCB fuses the multi-level features:

F_fused = H_fusion([F_LB, F_DAB1, g(F_DAB2)]).
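The main-branch flow described above, with SRB stages gated by the shallow prior F_LB, can be sketched end to end. The specific stand-ins (a uniform-scaling "convolution", a sigmoid gate for H_Att) are our own simplifications for illustration, not the paper's layers:

```python
import numpy as np

def conv_like(x, w=0.5):
    """Stand-in for a 3x3 convolution: uniform scaling (illustrative only)."""
    return w * x

def srb(x):
    """Shallow residual block sketch: conv + identity skip, then ReLU."""
    return np.maximum(conv_like(x) + x, 0.0)

def dab(deep, shallow):
    """Toy distance attention: the retained shallow prior gates the deep
    features, so early detail modulates every SRB stage."""
    return deep * (1.0 / (1.0 + np.exp(-shallow)))

# One DARCB main-branch pass, following the flow described above.
f_in = np.ones((4, 4))
f_lb = conv_like(f_in)       # side branch: refined shallow prior F_LB
f_s1 = srb(f_in)             # first SRB stage
f_dab1 = dab(f_s1, f_lb)     # gated by the shallow prior
f_s2 = srb(f_dab1)           # second SRB stage
f_dab2 = dab(f_s2, f_lb)
f_deep = conv_like(f_dab2)   # final refining convolution g
print(f_deep.shape)  # (4, 4)
```

The key structural point is that the same F_LB feeds both attention steps, which is what lets the shallow features "remotely" influence every stage of the deep extraction.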
To further improve the representational ability of the model while maintaining efficiency, we introduce a lightweight ESA block. The output features of the K-th DARCB are therefore

F_K = H_ESA(F_fused),

where F_fused denotes the fused multi-level features of the previous step.
Overall, the DARCB proposed in this article uses only three 3 × 3 convolutional layers to extract deep features, which keeps the model lightweight. The introduction of DAB strengthens feature extraction, so the model achieves high accuracy while remaining lightweight.


Origin blog.csdn.net/weixin_43690932/article/details/132721298