Rethinking the Image Fusion (PMGI)

1. Abstract

This article proposes a fast unified image fusion network based on proportional maintenance of gradient and intensity (PMGI), which can realize various image fusion tasks end-to-end, including infrared and visible image fusion, multi-exposure image fusion, medical image fusion, multi-focus image fusion and panchromatic enhancement. We unify the image fusion problem as the maintenance of the texture and intensity proportions of the source images. On the one hand, the network is divided into a gradient path and an intensity path for information extraction. We perform feature reuse within the same path to avoid the information loss caused by convolution. At the same time, we introduce a path transfer block to exchange information between the different paths, which can not only pre-fuse gradient and intensity information but also enhance the information to be processed later. On the other hand, we define a unified loss function form based on these two kinds of information, which can be adapted to different fusion tasks. Experiments on publicly available datasets show that our PMGI outperforms the state-of-the-art in both visual quality and quantitative metrics on various fusion tasks. Furthermore, our method is faster than existing techniques.

2. Introduction

Image fusion aims to extract the most meaningful information from images acquired by different sensors and combine that information to generate a single image that contains richer information and is more conducive to subsequent applications. Common image fusion tasks include infrared and visible image fusion, multi-exposure image fusion, multi-focus image fusion, medical image fusion and remote sensing image fusion (also known as panchromatic enhancement). They are used in object detection, high-definition television, medical diagnosis and other fields (Ma, Ma and Li 2019; Ma et al. 2017; Xing et al. 2018).

Although existing image fusion methods can achieve good results on their corresponding fusion tasks, there are still several aspects that need improvement. First, existing methods often require manual design of activity-level measurements and fusion rules, which becomes increasingly complex given the diversity of source images. Second, most methods are only applicable to specific fusion tasks and cannot be generalized, so it is important to design a general method based on the nature of image fusion. Third, existing fusion methods tend to be less time-competitive due to computational complexity and a large number of parameters.

To address these challenges, we propose a fast unified image fusion network based on proportional maintenance of gradient and intensity (PMGI), which can efficiently implement various types of image fusion tasks end-to-end. First of all, PMGI is an end-to-end model: the source images are the input and the fused image is the output, without any manual intervention in between. Second, we transform the fusion problem into the maintenance of gradient and intensity information. Intensity information enables the fused image to have a histogram distribution similar to that of the source image, while gradient information provides more refined texture details. Therefore, we define a unified form of loss function for multiple image fusion tasks. To adapt the network to different fusion tasks, we can select the more effective and interesting information to retain in the fusion result by adjusting the weight of each loss term. Finally, we divide the network into a gradient path and an intensity path, which extract the corresponding information from the source images. To minimize the information loss caused by convolution, the features of each layer within the same extraction path are reused. We also introduce a path transfer block between the two paths: on the one hand, it can pre-fuse gradient and intensity information; on the other hand, it can enhance the information for subsequent processing. It is worth noting that, due to the use of 1×1 convolution kernels and control of the number of feature channels, the number of parameters in our network is limited to a certain range. Therefore, our method can achieve fusion at high speed.

Our work contributions include the following three aspects:

  • We propose a new end-to-end image fusion network that can uniformly implement various image fusion tasks. The proposed PMGI can well fuse infrared and visible images, multi-exposure images, medical images, multi-focus images and remote sensing images.

  • We design a specific loss function that is suitable for almost all image fusion tasks and can achieve the desired results by adjusting the weight of each loss term.

  • Our method can perform image fusion with higher efficiency on multiple fusion tasks. Code is available at: https://github.com/HaoZhang1018/PMGI_AAAI2020

3. Method

The essence of image fusion is to combine the most important information in the source images to generate a single image with richer information and better visual effects. In different image fusion tasks, the properties of the source images are very different, so the same processing method is not suitable. However, in most cases, there is an underlying correlation between the two types of source images because they both describe the same scene and the source images contain complementary information. Therefore, we try to solve different kinds of fusion tasks in a unified way through reasonable network architecture and loss function design.

Since the most basic element of an image is a pixel, the intensity of the pixel can represent the histogram distribution of the image, and the difference between pixels constitutes a gradient, which can represent the texture details of the image. Therefore, we describe the entire image from two aspects of information: gradient and pixel intensity. This is reflected in the network architecture and loss function.

We divide the network into two information extraction paths: a gradient path and an intensity path. The gradient path is responsible for extracting texture information, that is, high-frequency features. Similarly, the intensity path is responsible for extracting intensity information. Since gradient and intensity information need to be extracted and preserved from both types of source images simultaneously, the input to each extraction path consists of the different source images concatenated along the channel dimension to preserve their potential correlations. We set the connection ratio of the two source images to β. Additionally, we perform feature reuse and information exchange operations.

First, the loss of information during the convolution process is inevitable. Feature reuse can reduce information loss and increase feature utilization to a certain extent. The exchange between different types of information can pre-fuse gradient and intensity information and is also an enhancement of the information before the next extraction.

In addition to the above general network structure, we also design a loss function with a unified form based on the properties of the image. We transform the image fusion problem into a proportion maintenance problem of gradient and pixel intensity information. Our loss function consists of two types of loss terms, gradient loss and intensity loss, both of which are constructed for each of the two source images. The intensity constraint provides a coarse pixel distribution, while the gradient constraint enhances texture details; their joint constraint can achieve a reasonable pixel intensity distribution and rich texture details. Since the fused image cannot preserve all the information of the source images, we must make a trade-off between intensity distribution and texture details to preserve the more important gradient and intensity information. Therefore, we can adjust the weight of each loss term to change the proportion of each type of information, adapting it to different image fusion tasks.

3.1 Network Architecture

The proposed PMGI is a very fast convolutional neural network. As shown in Figure 1, we divide the network into a gradient path and an intensity path for the corresponding information extraction. Gradient and intensity information are exchanged through path transfer blocks. It is worth noting that, after several attempts, the connection ratio β of the two source images in the input was set to 1:2.
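As a rough illustration of this input construction, the sketch below (written in PyTorch; the authors' implementation may differ) concatenates two single-channel source images along the channel dimension with a 1:2 ratio. Which source image is duplicated for which path is an assumption here; the text only fixes the ratio β.

```python
import torch

def build_path_input(primary: torch.Tensor, secondary: torch.Tensor) -> torch.Tensor:
    """Concatenate two single-channel source images along the channel axis
    with a 1:2 connection ratio: one copy of `primary`, two copies of
    `secondary`. Which source is duplicated for each path is an assumption;
    the paper only fixes the ratio beta to 1:2."""
    # primary, secondary: tensors of shape (N, 1, H, W)
    return torch.cat([primary, secondary, secondary], dim=1)  # -> (N, 3, H, W)
```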

In both paths, we use four convolutional layers for feature extraction. Following the idea of DenseNet, dense connections are used within the same path to achieve feature reuse. Furthermore, path transfer blocks are used to pass information between the two paths, so the inputs to the third and fourth convolutional layers depend not only on the outputs of all previous convolutional layers in the same path, but also on the outputs of the convolutional layers in the other path. The first layer uses a 5×5 convolution kernel and the last three layers use 3×3 convolution kernels, each combined with batch normalization and the Leaky ReLU activation function. The structure of the path transfer block is shown in the lower right corner of Figure 1: it uses a 1×1 convolution kernel, combined with batch normalization and the Leaky ReLU activation function.
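A minimal PyTorch sketch of these two building blocks is given below; the Leaky ReLU slope and any channel widths are illustrative assumptions rather than the authors' exact settings.

```python
import torch
import torch.nn as nn

class ConvBlock(nn.Module):
    """One feature-extraction layer: convolution + batch normalization +
    Leaky ReLU, with SAME-style padding so the spatial size is preserved."""
    def __init__(self, in_ch: int, out_ch: int, kernel_size: int):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size, stride=1,
                      padding=kernel_size // 2),
            nn.BatchNorm2d(out_ch),
            nn.LeakyReLU(0.2, inplace=True),  # slope 0.2 is an assumption
        )

    def forward(self, x):
        return self.body(x)


class PathTransferBlock(nn.Module):
    """Path transfer block: a 1x1 convolution (+ BN + Leaky ReLU) that maps
    the features of one path so they can be injected into the other path."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.body = ConvBlock(in_ch, out_ch, kernel_size=1)

    def forward(self, x):
        return self.body(x)
```

In the full network, the input to the third and fourth `ConvBlock` of a path would be the channel-wise concatenation of all earlier outputs of that path plus the transferred features from the other path, as described above.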

Then, we use a concatenation-and-convolution strategy to fuse the features extracted from the two paths: the feature maps are concatenated along the channel dimension. It is worth noting that the idea of feature reuse is still used here; the eight feature maps involved in the concatenation come from all eight convolutional layers of the two paths. The last convolutional layer uses a 1×1 kernel with a tanh activation function. In all convolutional layers, padding is set to SAME and stride is set to 1, so none of these layers changes the size of the feature maps.
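Continuing the same sketch, a fusion head of this kind could look as follows; the total channel count is again an illustrative assumption.

```python
import torch
import torch.nn as nn

class FusionHead(nn.Module):
    """Fuse features from both paths: channel-wise concatenation followed by
    a 1x1 convolution with tanh, keeping the spatial size unchanged."""
    def __init__(self, total_channels: int):
        super().__init__()
        self.fuse = nn.Conv2d(total_channels, 1, kernel_size=1, stride=1, padding=0)

    def forward(self, gradient_feats, intensity_feats):
        # gradient_feats / intensity_feats: lists of the four feature maps
        # produced by each path's convolutional layers
        x = torch.cat(gradient_feats + intensity_feats, dim=1)
        return torch.tanh(self.fuse(x))
```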

3.2 Loss function

The loss function determines the type of information extracted and the proportional relationship between the different types of information. The loss function of our network consists of two types of loss terms, namely the intensity loss and the gradient loss. The intensity loss constrains the fused image to maintain an intensity distribution similar to that of the source image, while the gradient loss forces the fused image to contain rich texture details. Note that we construct both types of loss terms for each source image. Therefore, the loss function contains four terms and is expressed as:

$$L_{PMGI} = \lambda_{Aint} L_{Aint} + \lambda_{Agrad} L_{Agrad} + \lambda_{Bint} L_{Bint} + \lambda_{Bgrad} L_{Bgrad} \qquad (1)$$

where $A$ and $B$ are the two source images, $L_{int}$ denotes the intensity loss term of a source image, $L_{grad}$ denotes the corresponding gradient constraint term, and $\lambda$ is the weight of each loss term.

The intensity loss is defined as:

$$L_{Aint} = \frac{1}{HW} \left\lVert I_{fused} - I_A \right\rVert_2^2, \qquad L_{Bint} = \frac{1}{HW} \left\lVert I_{fused} - I_B \right\rVert_2^2 \qquad (2)$$

where $I_{fused}$ is the fused image generated by PMGI, $I_A$ and $I_B$ are the two source images, and $H$ and $W$ are the height and width of the image, respectively.

Similarly, with $\nabla$ denoting the gradient operator, the gradient loss is defined as follows:

$$L_{Agrad} = \frac{1}{HW} \left\lVert \nabla I_{fused} - \nabla I_A \right\rVert_2^2, \qquad L_{Bgrad} = \frac{1}{HW} \left\lVert \nabla I_{fused} - \nabla I_B \right\rVert_2^2 \qquad (3)$$
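A sketch of the unified loss in Eqs. (1)-(3) is given below, assuming a simple finite-difference operator for $\nabla$; the exact gradient operator used by the authors may differ.

```python
import torch
import torch.nn.functional as F

def image_gradient(img: torch.Tensor) -> torch.Tensor:
    """Simple finite-difference gradient magnitude (an assumption; the paper
    only specifies a gradient operator, not a particular kernel)."""
    dx = img[:, :, :, 1:] - img[:, :, :, :-1]
    dy = img[:, :, 1:, :] - img[:, :, :-1, :]
    dx = F.pad(dx, (0, 1, 0, 0))  # pad back to the original width
    dy = F.pad(dy, (0, 0, 0, 1))  # pad back to the original height
    return torch.abs(dx) + torch.abs(dy)

def pmgi_loss(fused, src_a, src_b, w_a_int, w_a_grad, w_b_int, w_b_grad):
    """Unified loss: weighted intensity (MSE) and gradient (MSE on gradient
    maps) terms for both source images, following Eqs. (1)-(3)."""
    l_a_int = F.mse_loss(fused, src_a)    # (1/HW) ||I_fused - I_A||_2^2
    l_b_int = F.mse_loss(fused, src_b)
    l_a_grad = F.mse_loss(image_gradient(fused), image_gradient(src_a))
    l_b_grad = F.mse_loss(image_gradient(fused), image_gradient(src_b))
    return (w_a_int * l_a_int + w_a_grad * l_a_grad
            + w_b_int * l_b_int + w_b_grad * l_b_grad)
```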

It should be noted that the weights $\lambda$ in formula (1) can be adjusted to change the proportion of the different types of information in the fused image, adapting it to different tasks. The parameter setting rules corresponding to specific tasks are described below.

For infrared and visible image fusion, we hope that the gradient information of the visible image and the intensity information of the infrared image are mainly retained in the fusion result, while the intensity information of the visible image and the gradient information of the infrared image are secondary. Therefore, the parameters $\lambda$ should satisfy the following rule:

$$\lambda_{irint} > \lambda_{visint}, \qquad \lambda_{irgrad} < \lambda_{visgrad} \qquad (4)$$

For multi-exposure image fusion, both the overexposed and underexposed images contain comparable texture details, but their intensities are either too strong or too weak. Therefore, we set equal weights to balance them and obtain appropriate intensity together with rich texture details, which can be formalized as:

$$\lambda_{overint} = \lambda_{underint}, \qquad \lambda_{overgrad} = \lambda_{undergrad} \qquad (5)$$

For multi-focus image fusion, both types of information (gradient and intensity) of the two source images are equally important. This is because we want to preserve the intensity and texture information of both source images simultaneously, and the in-focus (sharp) regions of one source image complement the out-of-focus (blurred) regions of the other. Therefore, the corresponding parameters are also set to be equal:

$$\lambda_{focus1int} = \lambda_{focus2int}, \qquad \lambda_{focus1grad} = \lambda_{focus2grad} \qquad (6)$$

Similarly, for medical image fusion, structural medical images reflect the texture information of organs, while functional medical images represent functional information such as metabolic intensity. We take MRI and PET images as examples of structural and functional images, respectively, and obtain the main texture information from the MRI image and the main intensity information from the PET image. However, considering that the pixel intensity of the I (intensity) component of the PET image is much higher than that of the MRI image, if the pixel intensity of the PET image were mainly constrained, the excessive intensity of the fused image would mask the texture. Therefore, to balance texture and intensity, we subject the pixel intensities of the PET and MRI images to the same constraint, and $\lambda$ is designed as:

$$\lambda_{PETint} = \lambda_{MRIint}, \qquad \lambda_{PETgrad} < \lambda_{MRIgrad} \qquad (7)$$

Finally, for panchromatic enhancement, panchromatic images have high spatial resolution (rich texture details), while multispectral images contain rich color information. The goal is to improve clarity while keeping the spectrum undistorted. Therefore, we only constrain the texture information of the panchromatic image without constraining the intensity to avoid spectral distortion, which can be formalized as:

$$\lambda_{PANint} = 0, \qquad \lambda_{PANgrad} > \lambda_{MSgrad} \qquad (8)$$
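The rules in Eqs. (4)-(8) only constrain the relative ordering of the weights. As an illustration, the dictionary below encodes one set of placeholder values that satisfy them; these numbers are not the authors' tuned hyperparameters.

```python
# Illustrative lambda settings satisfying the ordering rules in Eqs. (4)-(8).
# Each source image maps to (lambda_int, lambda_grad); the numbers are
# placeholders, not the authors' tuned hyperparameters.
TASK_WEIGHTS = {
    "infrared_visible": {"ir":     (1.0, 0.1), "vis":    (0.5, 1.0)},  # Eq. (4)
    "multi_exposure":   {"over":   (1.0, 1.0), "under":  (1.0, 1.0)},  # Eq. (5)
    "multi_focus":      {"focus1": (1.0, 1.0), "focus2": (1.0, 1.0)},  # Eq. (6)
    "medical":          {"pet":    (1.0, 0.2), "mri":    (1.0, 1.0)},  # Eq. (7)
    "pan_enhancement":  {"pan":    (0.0, 1.0), "ms":     (1.0, 0.5)},  # Eq. (8)
}
```

These tuples can then be unpacked and passed to a loss such as the `pmgi_loss` sketch shown earlier.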


Origin: blog.csdn.net/m0_47005029/article/details/131982941