Paper reading: SAR image change detection with a dual-domain network (SAR_CD_DDNet, unsupervised change detection)

1. Paper information

1. Title of the paper: Change Detection in Synthetic Aperture Radar Images Using a Dual-Domain Network

2. Code link: https://github.com/summitgao/SAR_CD_DDNet

2. Summary

Change detection in synthetic aperture radar (SAR) images is a critical and challenging task. Existing methods mainly focus on feature extraction in the spatial domain and pay less attention to the frequency domain. In addition, when analyzing patch features, the edge regions of a patch may introduce noisy features. To address these two challenges, we propose a dual-domain network. Specifically, we consider features in the discrete cosine transform (DCT) domain and integrate the reshaped DCT coefficients into the proposed model as a frequency-domain branch. Speckle noise is mitigated by exploiting feature representations in both the frequency and spatial domains. Furthermore, we propose a multi-region convolution module that emphasizes the central region of each patch and adaptively models contextual information together with central-region features. Experimental results on three SAR datasets verify the effectiveness of the model.

3. Introduction

In recent years, the rapid development of synthetic aperture radar (SAR) sensors has prompted many researchers to work on SAR image segmentation, texture analysis, and change detection [1]-[3]. Change detection is an important task in SAR image interpretation; its main purpose is to identify changed regions between multi-temporal SAR images, and it has attracted increasing attention in the remote sensing community. Change detection using SAR images is more challenging than with optical images due to the presence of speckle noise [4]. Some pioneering efforts have been made to address the noise problem in multitemporal image analysis [5]. Traditionally, mainstream methods compare multi-temporal images to generate a difference image (DI) and analyze the DI to obtain a change map [6]. Although these methods can capture some pixel-level change information, they struggle to adaptively exploit the rich feature representations in the raw data.
Deep learning based methods have achieved great success by exploiting deep feature representations. However, building a robust SAR change detection model is not easy due to the following two challenges: 1) Mutual enhancement of spatial and frequency features. Existing models are mainly based on feature extraction in the spatial domain and pay little attention to the frequency domain. Recent studies have shown that compressed representations in the frequency domain are able to suppress noise in the spatial domain and enrich patterns for image understanding [11] [12]. Therefore, strengthening spatial and frequency features within a unified framework should be considered. 2) Enhancement of central-region features. Contextual information is crucial to the performance of change detection in SAR images, so existing methods usually use patch features for classification. However, the edge region of each patch may introduce noisy features. Therefore, how to highlight the central region of each patch while preserving the background information is a serious challenge.
To address the above two issues, we propose a dual-domain network, DDNet for short, which jointly exploits spatial and frequency features for the SAR change detection task. Specifically, we first consider the characteristics of the discrete cosine transform (DCT) domain. The reshaped DCT coefficients are integrated into a CNN model as an additional branch, mitigating speckle noise by using feature representations in both the frequency and spatial domains. Furthermore, we propose a multi-region convolution (MRC) module that emphasizes the central region of each patch. Intuitively, adaptively modeling contextual information together with central-region features is effective.

4. Method

Given two SAR images I1 and I2 captured at different times in the same geographic area, the goal is to generate a binary change map where changed pixels are marked as "1" and unchanged pixels are marked as "0".
The proposed model works in an unsupervised manner. Pre-classification is the first step; its main purpose is to find samples that are highly likely to be changed or unchanged. The log-ratio operator [13] is first used to generate the difference image (DI). Then, hierarchical FCM clustering [14] is applied to classify the DI into three clusters: ωc, ωu, and ωI. Pixels belonging to ωc and ωu are reliable pixels with a high probability of being changed or unchanged, respectively. Pixels in ωI are uncertain and need further classification. We randomly select 10% of the image patches centered on pixels in ωc and ωu as training samples for DDNet; the numbers of positive and negative samples are equal. For a given pixel, image patches centered at that pixel are extracted from I1 and I2, respectively. Each patch is of size r×r (r = 7 in this work). The two image patches are combined to form a new image patch of size 2×r×r. The generated image patches are fed into DDNet for training. After training, the network classifies the image patches centered on ωI pixels. The whole process is unsupervised.
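The pre-classification step can be sketched as follows. This is a minimal NumPy illustration: the paper uses hierarchical FCM clustering [14], which is replaced here by simple quantile thresholds for brevity, and the function names and threshold values are assumptions, not the authors' implementation.

```python
import numpy as np

def log_ratio_di(i1: np.ndarray, i2: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    """Log-ratio operator: DI = |log((I2 + eps)/(I1 + eps))|.
    The log transform turns the multiplicative speckle noise of SAR
    intensities into an additive term."""
    return np.abs(np.log((i2 + eps) / (i1 + eps)))

def preclassify(di: np.ndarray, low_q: float = 0.4, high_q: float = 0.9) -> np.ndarray:
    """Split DI pixels into three clusters (stand-in for hierarchical FCM):
    0 = omega_u (likely unchanged), 1 = omega_I (uncertain),
    2 = omega_c (likely changed). Quantile thresholds are illustrative."""
    lo, hi = np.quantile(di, [low_q, high_q])
    labels = np.full(di.shape, 1, dtype=np.int64)  # default: uncertain
    labels[di <= lo] = 0                           # reliable unchanged
    labels[di >= hi] = 2                           # reliable changed
    return labels
```

Training samples for DDNet would then be 7×7 patches centered on pixels labeled 0 or 2, stacked from I1 and I2 into a 2×7×7 input.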
Fig. 1. Overview of the dual-domain network (DDNet). The network consists of two branches: a spatial-domain branch that captures multi-region features, and a frequency-domain branch that encodes DCT coefficients. In the spatial-domain branch, the network contains four MRC modules, which emphasize central-region features while preserving contextual information. In the frequency-domain branch, the input image patch is transformed to the frequency domain by the DCT, and an "on-off switch" selects the key components of the DCT coefficients.

A. Spatial Domain Feature Extraction

In the spatial domain, the network contains 4 multi-region convolution (MRC) modules, as shown in Figure 1. The details of the MRC module are shown in Figure 2. Since contextual information is crucial for change detection in SAR images, existing methods usually adopt windows of fixed size (3×3, 5×5, 7×7, etc.) to determine whether the position has changed. We argue that if we discard some edge regions in feature extraction, the central region can be emphasized and the noise in the edge regions can be potentially eliminated. To this end, we propose to extract multi-region features to enhance feature representation in SAR change detection.
Take the patch size r = 7 as an example. The input feature maps are convolved into 15 channels and then divided into three groups Fg, Fh, and Fv. After a 3×3 convolutional layer, three sets of features Fg′, Fh′, and Fv′ are obtained, which are fused by element-wise summation to form the output features.

Fig. 2. Illustration of the MRC module. Take the patch size r = 7 as an example. The input feature maps are convolved into 15 channels and then divided into three groups Fg, Fh, and Fv. After the 3×3 convolutional layers, three sets of features Fg′, Fh′, and Fv′ are obtained. These features are fused by element-wise summation to form the output features.

Given an image patch A ∈ R^(2×r×r), it is fed into a 1×1 convolutional layer to generate a new feature map F ∈ R^(C×r×r). Then, F is divided into three groups Fg, Fh, and Fv along the channel dimension, so each of Fg, Fh, and Fv has shape C/3×r×r. Fg denotes the global-region features. Fh represents the horizontal middle-region features, where the top and bottom rows are set to 0. Fv represents the vertical middle-region features, where the left and right columns are set to 0.
The three groups are passed through 3×3 convolutional layers to obtain Fg′, Fh′, and Fv′, which are fused by element-wise summation:

F′ = Fg′ + Fh′ + Fv′

where F′ represents the spatially fused feature. C is set to 15 in our implementation, so the final spatially fused feature map has size 5×7×7. The features are then reshaped into a vector Vs of length 5×7×7 = 245. Therefore, Vs has a global contextual view, and the central-region information is enhanced.
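A single MRC module can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the trained network: the naive convolution, random weights, and function names are hypothetical, and bias terms, activations, and batch handling are omitted.

```python
import numpy as np

def conv2d(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Naive 'same'-padded 2-D cross-correlation.
    x: (Cin, H, W); w: (Cout, Cin, k, k) with k odd."""
    cout, cin, k, _ = w.shape
    p = k // 2
    xp = np.pad(x, ((0, 0), (p, p), (p, p)))
    h, wd = x.shape[1:]
    out = np.zeros((cout, h, wd))
    for o in range(cout):
        for i in range(h):
            for j in range(wd):
                out[o, i, j] = np.sum(w[o] * xp[:, i:i + k, j:j + k])
    return out

def mrc_module(a, w1, w3g, w3h, w3v):
    """Multi-region convolution sketch for r = 7, C = 15.
    a: input patch (2, r, r); w1: 1x1 conv weights (15, 2, 1, 1);
    w3g/w3h/w3v: per-group 3x3 conv weights (5, 5, 3, 3)."""
    f = conv2d(a, w1)                       # (15, r, r), then split by channel
    fg, fh, fv = f[:5], f[5:10].copy(), f[10:].copy()
    fh[:, 0, :] = 0; fh[:, -1, :] = 0       # horizontal middle: zero top/bottom rows
    fv[:, :, 0] = 0; fv[:, :, -1] = 0       # vertical middle: zero left/right columns
    # per-group 3x3 conv, then element-wise summation fusion
    return conv2d(fg, w3g) + conv2d(fh, w3h) + conv2d(fv, w3v)
```

Stacking four such modules and flattening the final 5×7×7 map yields the spatial feature vector Vs of length 245.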

B. Frequency Domain Feature Extraction

An input image patch of size 2×r×r is resized to 2×8×8 by bilinear interpolation. Then, the image patch is transformed into the frequency domain by the DCT. The resulting DCT coefficient vector v has length 2 × 64 = 128. The DCT has proven very effective for suppressing noise present in the spatial domain. To further select the key components of the DCT coefficients, we employ an "on-off switch", which generates an information vector I and an attention gate g through two linear transformations, as shown in Fig. 1. The information vector I and the attention gate g are generated as:
I = Wi·v + bi,  g = σ(Wg·v + bg)
where Wi and Wg are weight matrices, bi and bg are the biases of the linear transformations, and σ denotes the sigmoid activation function.

Then, the DCT branch applies the attention gate to the information vector using element-wise multiplication to obtain the final frequency feature vector Vf:
Vf = g ⊙ I
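The frequency-domain branch can be sketched in NumPy as below. The orthonormal DCT-II matrix and the random weight shapes are illustrative assumptions; the paper's trained parameters and exact normalization are not reproduced here.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II transform matrix of size n x n."""
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    m = np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    m[0] *= 1 / np.sqrt(n)
    m[1:] *= np.sqrt(2 / n)
    return m

def frequency_branch(patch, wi, bi, wg, bg):
    """Frequency-domain branch sketch.
    patch: (2, 8, 8) image patch (already resized); applies a 2-D DCT per
    channel, flattens to v (length 2*64 = 128), then the 'on-off switch':
    I = Wi v + bi, g = sigmoid(Wg v + bg), Vf = g * I (element-wise)."""
    d = dct_matrix(8)
    coeffs = np.stack([d @ ch @ d.T for ch in patch])  # 2-D DCT per channel
    v = coeffs.reshape(-1)                             # (128,)
    info = wi @ v + bi                                 # information vector I
    gate = 1.0 / (1.0 + np.exp(-(wg @ v + bg)))        # attention gate g
    return gate * info                                 # Vf
```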

C. Final Change Map Generation

After obtaining the spatial-domain features Vs and frequency-domain features Vf, they are concatenated and fed into a fully connected (FC) layer. Then, the likelihood of a pixel being changed or unchanged is computed by a softmax layer to generate the output. After training, the network classifies the pixels in ωI to obtain the final change map.
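The classification head can be sketched as follows; the weight shapes follow the feature lengths above (Vs of length 245, Vf of length 128), but the function name and the absence of hidden layers are simplifying assumptions.

```python
import numpy as np

def classify(vs: np.ndarray, vf: np.ndarray, w_fc: np.ndarray, b_fc: np.ndarray) -> np.ndarray:
    """Concatenate spatial (Vs) and frequency (Vf) features, apply an FC
    layer, then softmax over the two classes {unchanged, changed}."""
    z = w_fc @ np.concatenate([vs, vf]) + b_fc   # logits, shape (2,)
    e = np.exp(z - z.max())                      # numerically stable softmax
    return e / e.sum()
```

A pixel in ωI would be assigned the class with the larger probability.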
We measure the performance of the proposed DDNet with five common change detection evaluation metrics: false positives (FP), false negatives (FN), overall error (OE), percentage of correct classification (PCC), and the kappa coefficient (KC).
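These five metrics follow directly from the confusion matrix of the binary change map against the ground truth; a straightforward NumPy computation (function name is ours) is:

```python
import numpy as np

def change_metrics(pred: np.ndarray, gt: np.ndarray) -> dict:
    """FP, FN, OE, PCC, and kappa coefficient (KC) for binary change maps
    (1 = changed, 0 = unchanged)."""
    pred = pred.ravel().astype(bool)
    gt = gt.ravel().astype(bool)
    n = pred.size
    tp = int(np.sum(pred & gt))     # correctly detected changed pixels
    tn = int(np.sum(~pred & ~gt))   # correctly detected unchanged pixels
    fp = int(np.sum(pred & ~gt))    # false alarms
    fn = int(np.sum(~pred & gt))    # missed changes
    oe = fp + fn                    # overall error
    pcc = (tp + tn) / n             # percentage of correct classification
    # expected chance agreement, used by the kappa coefficient
    pre = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    kc = (pcc - pre) / (1 - pre) if pre < 1 else 1.0
    return {"FP": fp, "FN": fn, "OE": oe, "PCC": pcc, "KC": kc}
```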

Source: blog.csdn.net/qq_41627642/article/details/128546798