[Super-resolution: spectral response function]

Spectral Response Function-Guided Deep Optimization-Driven Network for Spectral Super-Resolution


Hyperspectral images (HSI) are key to many research efforts. Spectral super-resolution (SSR) is a method for obtaining HSI with high spatial resolution (HR) from HR multispectral images. Traditional SSR methods include model-driven algorithms and deep learning. By unfolding a variational method, this paper proposes an optimization-driven convolutional neural network (CNN) with a deep spatial-spectral prior, resulting in a physically interpretable network. Unlike fully data-driven CNNs, an auxiliary spectral response function (SRF) is used to guide the CNN to group spectrally correlated bands. Furthermore, a channel attention module (CAM) and a reformulated spectral angle mapper loss function are implemented to achieve an efficient reconstruction model. Finally, experiments on two types of datasets, natural images and remote sensing images, verify the spectral enhancement achieved by this method, and classification results on the remote sensing datasets confirm its effectiveness in enhancing information.

INTRODUCTION

High-resolution spectral imaging uses high-resolution scene radiance to comprehensively explore the spectral characteristics of objects. The processing of hyperspectral images (HSI), such as segmentation, classification, detection, and tracking, has received increasing attention because of their rich spectral information. HS imaging is also being developed for numerous applications ranging from remote sensing to medical imaging.
The HS sensor acquires scene radiance in many spectral bands over a fine wavelength range. However, when the spectral resolution is high, each detector element senses less radiant energy, so the sensor requires long exposure times to obtain an acceptable signal-to-noise ratio for each band. Compared with red-green-blue (RGB) and multispectral images (MSI), HSI therefore usually lack good spatial resolution. This limitation affects the usability of HSI for applications requiring high spatial resolution (HR). Many researchers have proposed to reconstruct HR HSI directly through image super-resolution (SR) of low spatial resolution (LR) HSI to enhance the spatial details of HSI. Akgun et al. proposed a model that represents HS observations as weighted linear combinations and uses set-theoretic methods as the solution. Gu et al. proposed an SR algorithm that uses an indirect method based on spectral decomposition and designed a learning-based SR mapping as a backpropagation neural network. These methods utilize only the LR HSI to reconstruct the HR HSI; however, when the ratio between the LR and HR scales is large, poor spatial enhancement is observed.
With the development of detector elements, a large number of sensors are currently designed to achieve good representation of spatial details and temporal changes. However, these sensors only capture three or four spectral bands for very high HR (≤10 m), especially for remote sensing satellites such as Sentinel-2, GaoFen-2, QuickBird and WorldView. Although MSI usually has HR, they cannot fully represent the spectral properties of an object by using only a few spectral channels.
Combining the respective advantages of HSI and MSI, some researchers use HR MSI as auxiliary data to improve the spatial resolution of HSI. Hardie et al. proposed a new maximum a posteriori (MAP) estimator for improving spatial resolution; the MAP estimator uses a spatially varying statistical model based on vector quantization to exploit local correlations. Kawakami et al. fused HSI with images from RGB cameras by first applying an unmixing algorithm to the HS input and then treating the unmixing problem as a search for the input factorization. Akhtar et al. proposed a fusion algorithm for MSI and HSI using non-parametric Bayesian sparse representation. Meng et al. proposed a comprehensive relationship model relating HSI and multi-source HR observations based on the MAP framework. Palsson et al. proposed a new method for MSI and HSI fusion in a low-dimensional principal component (PC) subspace, so that only the first few PCs have to be estimated rather than all spectral bands. Fusion-based methods can substantially improve the spatial resolution of images through the injection of HR spatial detail. However, in many cases, HR MSI corresponding to the LR HSI, covering the same area and acquired at a similar time, is not easy to obtain. Even when HR MSI data are available, the registration and preprocessing of multi-sensor data are difficult, and this difficulty affects the accuracy and performance of the algorithm.
The spectral super-resolution (SSR) method overcomes the unavailability of HR HS images by improving the spectral resolution of MS images without auxiliary HS images; its focus is on spectral transformation rather than spatial resolution enhancement. In 2008, Parmar et al. first reconstructed HS images from RGB images through sparse recovery. Inspired by this research, Arad and Ben-Shahar proposed to compute a dictionary representation of each RGB pixel by using an orthogonal matching pursuit algorithm. Wu et al. greatly improved Arad's method, borrowing from spatial SR, by pre-training a complete dictionary as anchor points for nearest-neighbor search based on the A+ algorithm proposed by Timofte et al. In 2018, Akhtar and Mian modeled natural spectra with Gaussian processes and combined them with RGB images to recover HS images. Without dictionary learning, Nguyen et al. explored a strategy of training a radial basis function network that represents the spectral transformation to recover scene reflectance from training images. Deep learning, especially convolutional neural networks (CNN), has attracted increasing attention recently and has been proven to outperform most traditional methods in areas such as segmentation, classification, denoising, and spatial SR. Inspired by the semantic segmentation architecture Tiramisu, Galliani et al. proposed DenseUnet with 56 convolutional layers, which shows good performance. To demonstrate that comparable performance can be achieved through shallower learning, Can et al. proposed a moderately deep residual CNN to recover the spectral information of RGB images. Shi et al. designed a deep CNN with dense blocks and a new fusion scheme to handle the case where the spectral response function (SRF) is unknown. To optimize the bands pixel by pixel, Gewali et al. proposed a deep residual CNN to jointly learn optimized MS bands and the transform that reconstructs HS spectra from MS signals. Arun et al. explored a CNN-based encoder-decoder architecture to model the spatial-spectral prior and improve recovery. However, deep learning-based models resemble data-driven black boxes with ideal feature learning and nonlinear mapping capabilities. Recently, problem-specific interpretability has been identified as an important component in the development of CNNs, and several research efforts have attempted to achieve this goal. Most of them attempt to combine deep learning with physical model-driven methods. By learning the regularization terms of variational models or MAP frameworks, CNNs are used to implement certain physical mappings as approximation operators and denoisers in many image processing tasks, such as denoising, compressed sensing, data fusion, and deblurring. However, these methods only utilize a pre-trained CNN prior without updating it during model-driven optimization. In addition, the training of these algorithms is divided into two stages, learning optimization and variational optimization, which makes it difficult to inherit the data-driven advantages of deep learning.
In this article, an end-to-end optimization-driven CNN with a spectral degradation model is constructed, and different spectral ranges are grouped according to the SRF for reconstruction. The SRF is used to guide the CNN to group spectrally similar bands and further enhance the spectral information. Instead of running a variational model and a CNN alternately, an optimization-driven CNN with a deep spatial-spectral prior and parameter self-learning is proposed. The proposed CNN repeatedly updates the intermediate HS image in an end-to-end manner.
The contributions are as follows:
1) Combining data-driven methods with optimization algorithms, an end-to-end optimization-driven CNN is proposed to improve the interpretability of the model. A channel attention module (CAM) is introduced into this model to embed parameter self-learning that accounts for band-wise spectral differences into the CNN.
2) The SRF is used as a guide to help the CNN group suitable spectral bands for reconstructing HS information, so that the proposed CNN learns good spectral details from the true spectral channel ranges.
3) A spatial-spectral convolutional layer is used to model the deep spatial-spectral prior. Furthermore, the proposed network adopts a fast spatial-spectral loss function reformulated from the L1 and spectral angle mapper (SAM) losses to achieve fast convergence and good spatial-spectral constraints.

PROPOSED METHOD

First, the spectral degradation between MS and HS imaging is modeled in this section. On the basis of this model, the SSR problem is formulated and divided into two subproblems. Finally, the proposed SSR network with a joint spatial-spectral HSI prior (HSRnet) is demonstrated comprehensively, using a CNN to learn the physical mappings. The framework of the proposed method is shown in Figure 1. It can be divided into two parts: an initial recovery network, and an optimization phase with attention-based parameter self-learning and spatial-spectral networks (SSNs), which follows the data flow of a model-based approach.

Model Formulation

Let X ∈ R^{W×H×C} denote the observed HSI, where C is the number of spectral channels and W and H are the width and height, respectively, and let Y ∈ R^{W×H×c} denote the observed MSI, where c < C is the number of multispectral bands; in particular, c = 3 for RGB images. Depending on its SRF, a sensor obtains MS or HS data with different spectral bands. A transformation matrix Φ ∈ R^{C×c} can be used to describe the spectral degradation between MS and HS imaging as follows:

Y = XΦ,    (1)

where X and Y are reshaped into matrices of size WH × C and WH × c, respectively.
The spectral transformation matrix is closely related to the SRF and can be approximately estimated by methods such as HySure and RWL1-SF. According to (1), the relationship between MSI and HSI is clarified. However, in SSR, obtaining a high-dimensional cube from low-dimensional data is an underdetermined problem. The high-dimensional HSI can be approximately predicted by imposing priors on the minimization problem to constrain the solution space as follows:
X̂ = argmin_X (1/2)||Y − XΦ||²_F + γR(X),    (2)
where γ is the trade-off parameter and R(·) is the regularization function. As shown in (2), the minimization problem is constrained by two parts: the first is a data fidelity term that limits the solution according to the degradation model, and the second, the regularization term, constrains the predicted X̂ with the HSI prior.
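As a quick illustration of the degradation model in (1) and the fidelity term in (2), the following NumPy sketch projects a toy HSI cube onto MS bands; the dimensions and the random, column-normalized Φ are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Toy dimensions: a 4x4 scene, C = 31 hyperspectral bands, c = 3 MS (RGB) bands.
W, H, C, c = 4, 4, 31, 3
rng = np.random.default_rng(0)

X = rng.random((W, H, C))                 # "true" HSI (unknown at test time)

# Hypothetical SRF-derived transformation matrix Phi (C x c): each column holds
# the relative response of one MS channel to every HS band.
Phi = rng.random((C, c))
Phi /= Phi.sum(axis=0, keepdims=True)     # normalize each MS channel's response

# Spectral degradation of (1): every pixel's HS spectrum is projected to c bands.
Y = (X.reshape(-1, C) @ Phi).reshape(W, H, c)
print(Y.shape)                            # (4, 4, 3)
```

SSR then amounts to recovering X from Y alone, which is why the prior R(·) in (2) is needed to shrink the solution space.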
Variable splitting techniques can be used to further solve this minimization problem and separate the two terms in (2). Introducing an auxiliary variable H, (2) is reformulated as the following constrained optimization problem:
min_{X,H} (1/2)||Y − XΦ||²_F + γR(H),   s.t.   H = X.    (3)
According to the half-quadratic splitting method, the cost function is then transformed into

L_μ(X, H) = (1/2)||Y − XΦ||²_F + γR(H) + (μ/2)||H − X||²_F,    (4)
where μ is the penalty parameter, which takes different values in different iterations. Using the variable splitting technique, (4) can be solved by iteratively solving two subproblems:

X^k = argmin_X (1/2)||Y − XΦ||²_F + (μ/2)||X − H^k||²_F,    (5)

H^k = argmin_H γR(H) + (μ/2)||H − X^{k−1}||²_F.    (6)
Considering (5), the approximate solution of X obtained by a gradient-descent step is

X^k = X^{k−1} − ε[(X^{k−1}Φ − Y)Φ^T + μ(X^{k−1} − H^k)],    (7)

where ε is the step size. The H-subproblem (6) is a denoising-type problem under the HSI prior, whose solution can be written with the proximal operator of R(·):

H^k = prox_{(γ/μ)R}(X^{k−1}).    (8)
In addition, the prior carries two kinds of meaning: one is constraints on spatial information, such as edge sharpness, texture features, local smoothness, non-local self-similarity, and non-Gaussianity; the other is constraints on spectral information, such as sparsity and the high correlation between spectra. Unlike total variation or sparse priors, the HSI prior contains more than one attribute and should be modeled with nonlinearity to improve accuracy.
Deep learning-based methods have good nonlinear learning capabilities and have proven able to perform many image restoration tasks. In this paper, owing to the nonlinearity of the HSI prior, the SSN is proposed to carry out the optimization described in (8). By extracting spatial and spectral information, the intermediate results are updated according to the constraints of (6). Therefore, the optimization of H is rewritten as

H^k = Spa_Spec(X^{k−1}),    (9)
where Spa_Spec(·) represents the SSN; details are described in Section II-B. Through this new way of updating H, the original optimization, which alternately updates H and X until convergence, is rewritten as a unified update of X. Considering (7) and (9), the reconstruction is optimized as follows:

X^k = X^{k−1} − ε[(X^{k−1}Φ − Y)Φ^T + μ(X^{k−1} − Spa_Spec(X^{k−1}))].    (10)
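For intuition, one model-based update of the form in (10) can be sketched in plain NumPy before any learning is introduced. The Gaussian smoothing standing in for the prior operator, and the values of ε and μ, are illustrative assumptions only; in HSRnet this role is played by the learned SSN.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def prior_op(X):
    # Stand-in for the HSI prior operator (the learned Spa_Spec in the paper):
    # a mild spatial-spectral smoothing, used here purely for illustration.
    return gaussian_filter(X, sigma=(1.0, 1.0, 0.5))

def gradient_update(X_prev, Y, Phi, eps=0.1, mu=0.5):
    """One update of the form in (10) on images stored as (W, H, bands) arrays."""
    W, H, C = X_prev.shape
    Xm = X_prev.reshape(-1, C)                        # (W*H, C)
    Ym = Y.reshape(-1, Phi.shape[1])                  # (W*H, c)
    Hm = prior_op(X_prev).reshape(-1, C)              # prior-regularized estimate H^k
    grad = (Xm @ Phi - Ym) @ Phi.T + mu * (Xm - Hm)   # data fidelity + prior pull
    return (Xm - eps * grad).reshape(W, H, C)
```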
With the help of the gradient descent algorithm and the HSI prior, the proposed method updates the intermediate X^k from three parts: the initial recovery fidelity, the spectral transformation of X^{k−1}, and the spatial-spectral prior on X^{k−1}. The initial recovery YΦ^T and the transformation Φ, as well as the parameters ε and μ, are replaced by convolutional layers, and a CNN is used to model the HSI prior, as follows:

X^k = T(X^{k−1}) + ε·IRN(Y) + εμ·Spa_Spec(X^{k−1}),    (11)

where T(·) is the learned spectral transformation that absorbs the terms acting on X^{k−1} in (10), and IRN(·) is the initial recovery network described in the next subsection.

SRF-Guided Initial Restoration

As described in Section I, the SRF provides the spectral correlation between MS and HS bands from an imaging perspective. Therefore, unlike traditional deep learning-based methods, SRF guidance is introduced as an auxiliary operation, which enables effective SSR performance. Auxiliary physical operations are of great help for image recovery in many types of research. In the proposed CNN, a new SRF-guided IRN block is proposed to group the bands by their spectral radiation characteristics and use different operators to reconstruct the initial SSR result X^0. The SRF-guided initial recovery network is shown in Figure 2.
The whole block is a two-layer CNN. Furthermore, the SRF is used as a guide to assign separate reconstruction convolutional layers to different spectral ranges. The details are as follows. First, the spectral gradient of the RGB/MS image is calculated to construct a data cube with dimensions W × H × (2c − 1), as shown in Figure 3.
Afterwards, the data cube is fed to a 3 × 3 convolutional layer to extract spectral features. These features are then fed into SRF-guided convolutional layers after being grouped according to the spectral correlation indicated by the SRF. Spectral grouping is used to avoid the reconstruction distortion caused by excessive spectral differences between channels; some differences between bands within the same group remain, which is inevitable. The proposed strategy ensures that the bands within a group are reconstructed from the same combination of multispectral channels. Because spectral correlations are roughly represented by imaging similarity in terms of the SRF, the SRF-guided convolutional layers do not have to be re-tuned for the same sensor, which improves the generalization of this module.
For example, in the CAVE dataset, which consists of RGB images and HSI with 31 bands, the spectral range can be divided into three groups based on each band's contribution to RGB imaging: bands contributing only to the blue channel, bands contributing to both the blue and green channels, and bands contributing to both the green and red channels; extensive experiments showed this grouping to perform best. The grouped spectral features are then fed into the corresponding convolutional layers, so the SRF-guided convolutional layer plays the role of group-wise spectral recovery. In other words, HS channels with high spectral correlation are constructed from the same set of convolution operators.
Guided by the SRF, the IRN block can group spectral bands with high spectral correlation. This grouping avoids introducing irrelevant spectral information that would interfere with spectral recovery.
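A condensed PyTorch sketch of the SRF-guided IRN idea follows. The feature width, the three-group split (echoing the CAVE example), and the per-group band counts are assumptions for illustration; the actual block routes grouped features through SRF-guided convolutional layers rather than fully independent branches.

```python
import torch
import torch.nn as nn

class SRFGuidedIRN(nn.Module):
    """Sketch of SRF-guided initial recovery: a spectral-gradient input cube,
    shared feature extraction, and one reconstruction branch per SRF group."""
    def __init__(self, c=3, feats=64, group_bands=(10, 10, 11)):
        super().__init__()
        self.feat = nn.Conv2d(2 * c - 1, feats, 3, padding=1)   # shared 3x3 layer
        # One branch per SRF-derived group (e.g. blue / blue+green / green+red).
        self.branches = nn.ModuleList(
            nn.Conv2d(feats, n, 3, padding=1) for n in group_bands
        )

    def forward(self, y):                      # y: (B, c, H, W) RGB/MS image
        grad = y[:, 1:] - y[:, :-1]            # spectral gradients of adjacent bands
        cube = torch.cat([y, grad], dim=1)     # (B, 2c-1, H, W) input data cube
        f = torch.relu(self.feat(cube))
        # Each branch reconstructs the HS bands most correlated with its MS channels.
        return torch.cat([b(f) for b in self.branches], dim=1)  # (B, 31, H, W)

x0 = SRFGuidedIRN()(torch.rand(1, 3, 64, 64))  # initial SSR estimate X^0
```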

Deep Spatial–Spectral Prior

As discussed in Section II-A, the HSI prior can be modeled by SSN, as shown in Figure 4. SSN consists of two subnetworks connected in series: one for spatial information extraction and the other for spectral feature extraction.
The intermediate reconstructed HSI is fed to the first 3 × 3 convolutional layer, which computes additional feature maps that account for the spatial neighborhood and transforms the HSI data into a high-dimensional space. This transformation provides extra extracted features for the subsequent learning of spectral information. The second 3 × 3 convolutional layer selects, from these redundant features, the ones used for the following spectral optimization; reducing the number of feature maps also speeds up network computation. The last 1 × 1 convolutional layer performs pixel-by-pixel fine-tuning of each spectral vector. With data-driven training, this fine-tuning can be learned as a spectral optimization process. In addition, the 1 × 1 convolutional layer can significantly improve low-level image processing, which further helps the SSN learn the HSI prior. A skip connection that adds the input to the output of the spatial network is also applied; this connection speeds up network computation while forcing the network to pay closer attention to the details that change.
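The description above maps naturally onto a small convolutional module. In the sketch below, only the 3 × 3 / 3 × 3 / 1 × 1 structure and the skip connection onto the spatial output follow the text; the layer widths are assumptions.

```python
import torch
import torch.nn as nn

class SSN(nn.Module):
    """Sketch of the spatial-spectral network that models the HSI prior."""
    def __init__(self, bands=31, feats=64):
        super().__init__()
        self.spatial_lift = nn.Conv2d(bands, feats, 3, padding=1)    # spatial features
        self.spatial_reduce = nn.Conv2d(feats, bands, 3, padding=1)  # feature selection
        self.spectral = nn.Conv2d(bands, bands, 1)                   # per-pixel spectral tuning

    def forward(self, x):                      # x: (B, bands, H, W) intermediate HSI
        f = torch.relu(self.spatial_lift(x))
        f = self.spatial_reduce(f) + x         # skip connection onto the spatial output
        return self.spectral(f)                # spectrally fine-tuned estimate
```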
Equipped with the SSN, the proposed method can implicitly introduce the HSI prior, further constraining the solution space and achieving improved SSR results.

Optimization Stages in HSRnet

By applying the gradient descent algorithm and the deep spatial-spectral prior, the SSR problem can be solved by updating X according to (11), which is regarded as an optimization process. When the optimization is unfolded, a network comprising multiple stages can be used to implement the optimization updates in a deep learning manner, as shown in the optimization stages in Figure 5.

The original RGB/MS image Y is first fed into the IRN block for the initial estimate X^0 = IRN(Y). Given the initial HSI recovery X^0, the iterative optimization can be modeled in a feed-forward manner and trained to learn the HSI prior while simultaneously matching the spectral degradation model. As shown in (11), the k-th update requires three parts. The first term is T(X^{k−1}), the spectral transformation of X^{k−1}, computed by a convolutional layer of size C × 3 × 3 × C. The second term is ε·IRN(Y), the ε-weighted initial estimate X^0. The last term is εμ·Spa_Spec(X^{k−1}), the εμ-weighted result H^k obtained by feeding X^{k−1} into the SSN that models the HSI prior. The parameters ε and μ are learned by blocks with an attention mechanism.
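One unfolded stage can therefore be sketched as below. The C-to-C 3 × 3 convolution for T(·) follows the description above, while treating IRN(Y), Spa_Spec(X^{k−1}), and the CAM-learned ε and μ as inputs to the stage is an assumption about the interface rather than the paper's exact implementation.

```python
import torch
import torch.nn as nn

class OptimStage(nn.Module):
    """One unfolded update of (11):
    X^k = T(X^{k-1}) + eps * IRN(Y) + eps * mu * Spa_Spec(X^{k-1})."""
    def __init__(self, bands=31):
        super().__init__()
        self.T = nn.Conv2d(bands, bands, 3, padding=1)  # learned spectral transformation

    def forward(self, x_prev, irn_out, ssn_out, eps, mu):
        # irn_out = IRN(Y); ssn_out = Spa_Spec(X^{k-1});
        # eps, mu: per-band weights of shape (B, bands, 1, 1) produced by CAM blocks.
        return self.T(x_prev) + eps * irn_out + eps * mu * ssn_out
```

Stacking several such stages, each with its own CAM-learned ε and μ, yields the end-to-end HSRnet.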

Attention-Based Parametric Self-Learning

The step size ε and the balancing parameter μ change in each iteration to iteratively optimize the intermediate variable X^k. Thanks to backpropagation during training, all parameters in this article can be learned in a data-driven manner without human intervention. However, in traditional methods the parameters are the same for different spectral channels. This may be inappropriate for spectral bands with different radiative properties, owing to differing optimal signal-to-noise ratios and the different spectral information introduced by the input data. Considering the radiation differences among bands and the good performance of CAM in channel weighting, CAM blocks are applied in the proposed HSRnet, as shown in Figure 6. CAM helps HSRnet focus on the bands that most urgently need optimization by exploiting the relationships between channels with high weights.
The CAM block consists of two pooling layers, one max pooling and one mean pooling, two 3 × 3 convolutional layers, and a sigmoid function. First, the reconstructed HSI is fed to the pooling layers to extract global weights. After pooling, the global weights are forwarded to the two convolutional layers and summed. Finally, the channel weights are activated by the sigmoid before element-wise multiplication.
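A CBAM-style sketch of this block is given below. Whether the convolutional layers are shared between the max- and mean-pooled branches, the 1 × 1 kernels used here (the text mentions 3 × 3 layers), and the reduction ratio are all assumptions made for a compact illustration.

```python
import torch
import torch.nn as nn

class CAMBlock(nn.Module):
    """Sketch of the channel attention used for parameter self-learning: global
    max/mean pooling, a small shared conv stack, summation, and a sigmoid that
    yields per-band weights (standing in for the learned eps / mu vectors)."""
    def __init__(self, bands=31, reduction=4):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(bands, bands // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(bands // reduction, bands, 1),
        )

    def forward(self, x):                                 # x: (B, bands, H, W)
        max_w = self.convs(torch.amax(x, dim=(2, 3), keepdim=True))
        avg_w = self.convs(torch.mean(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(max_w + avg_w)               # (B, bands, 1, 1) weights
```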
By introducing channel attention, HSRnet can learn different parameters as vectors for each iteration instead of fixed scalar values. This ensures that the network adaptively adjusts the weights in the spectral optimization to achieve better reconstruction results.

Fast Joint Spatial–Spectral Loss

The L1 loss and the SAM loss are applied in this paper to improve the spectral resolution while preserving spatial details, where the SAM between a reconstructed spectrum x̂ and its reference x is

SAM(x, x̂) = arccos( ⟨x, x̂⟩ / (||x||₂ ||x̂||₂) ).

However, the SAM loss is difficult to apply in practice because of its computational complexity and its poor suitability for GPU-accelerated computation in vector form. Inspired by [46], a transformed RMSE loss is utilized in its place.
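A sketch of the joint loss in its direct, pre-reformulation form is given below. The weighting factor lam is an illustrative choice, and the arccos-based term is the plain SAM; the transformed RMSE surrogate from [46] used in the paper is not reproduced here.

```python
import torch
import torch.nn.functional as F

def joint_spatial_spectral_loss(pred, target, lam=0.1, eps=1e-8):
    """L1 term plus a per-pixel spectral-angle term along the band dimension."""
    l1 = F.l1_loss(pred, target)
    cos = F.cosine_similarity(pred, target, dim=1, eps=eps)   # (B, H, W)
    sam = torch.acos(cos.clamp(-1 + 1e-7, 1 - 1e-7)).mean()   # mean angle in radians
    return l1 + lam * sam
```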


Origin blog.csdn.net/weixin_43690932/article/details/133203487