Several Conventional Image Fusion Methods and Their Principles

At present, according to the level at which fusion is performed, image fusion algorithms are divided into pixel-level, feature-level, and decision-level image fusion.

Pixel-level image fusion operates directly on the image data at the pixel level and is the most basic level of fusion. Its advantage is that more of the source images' original data is kept: compared with the other fusion levels, the details are richer and target spatial positions are more accurate. However, strict preprocessing such as point-to-point correction, noise reduction, and registration must be applied to the source images before fusion, otherwise the subsequent fusion effect is seriously degraded. Typical algorithms include principal component analysis (PCA) and the pulse coupled neural network (PCNN).

Feature-level image fusion is intermediate-level fusion. Methods of this type extract the advantageous feature information of each image, such as edges and textures, based on the imaging characteristics of each sensor. Typical algorithms include fuzzy clustering and support vector clustering.

Decision-level fusion is the highest level of fusion. Compared with feature-level fusion, it first extracts the target features of each source image, performs feature recognition and decision classification, and then combines the decision information of the source images through joint reasoning to obtain the final result. Typical algorithms include the support vector machine and neural networks. Decision-level fusion is an advanced image fusion technology, but it demands relatively high data quality and its algorithmic complexity is extremely high.

1.1 Conventional image fusion methods

1.1.1 Image fusion method based on the maximum (Max) / minimum (Min) value

Assume that the two images A and B to be fused are registered and have the same size M×N. The fused image F obtained by selecting the larger pixel gray value can be expressed as

$$F(i,j) = \max\{A(i,j),\ B(i,j)\}$$

The fusion method based on selecting the smaller pixel gray value can be expressed as:

$$F(i,j) = \min\{A(i,j),\ B(i,j)\}$$

That is, during fusion, the gray values of the pixels at corresponding positions A(i,j) and B(i,j) of the source images are compared, and the pixel with the larger (or smaller) gray value, which may come from either A or B, is selected as the pixel of the fused image F(i,j). Because this method simply keeps whichever source pixel is brighter or darker, the occasions where it is applicable are very limited.
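To make the rule concrete, here is a minimal sketch in Python/NumPy, assuming the two sources are already registered single-channel arrays of the same size:

```python
import numpy as np

def fuse_max(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pixel-wise maximum fusion: F(i,j) = max(A(i,j), B(i,j))."""
    assert a.shape == b.shape, "source images must be registered to the same size"
    return np.maximum(a, b)

def fuse_min(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Pixel-wise minimum fusion: F(i,j) = min(A(i,j), B(i,j))."""
    assert a.shape == b.shape, "source images must be registered to the same size"
    return np.minimum(a, b)
```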

1.1.2 Image fusion method based on pixel weighted average (Average)

The pixel weighted average method is one of the simplest image fusion methods. According to the gray information of the two images, weights are assigned to the gray values of the two images at each pixel position, and the gray value of the fused image is the weighted sum of the two source gray values. For a color image, the same operation is repeated on each of the three channels to obtain the fused gray level per channel. Assuming that the two images A and B to be fused have the same size M×N, the fused image F can be expressed as

$$F(i,j) = \omega_1 A(i,j) + \omega_2 B(i,j)$$

where $\omega_1$ and $\omega_2$ are the weights of images A and B and satisfy $\omega_1 + \omega_2 = 1$. The pixel weighted average method is relatively simple and fast, but only the gray level of each pixel is considered during fusion; the pixel's position and other factors are ignored. The resulting fused image therefore does not retain the original image details well, loses useful information, and adds redundant information; the visual effect is poor and the image is hard to distinguish.
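A corresponding sketch, where the weight $\omega_1$ is a free parameter ($\omega_1 = \omega_2 = 0.5$ gives a plain average):

```python
import numpy as np

def fuse_weighted(a: np.ndarray, b: np.ndarray, w1: float = 0.5) -> np.ndarray:
    """Weighted-average fusion: F(i,j) = w1*A(i,j) + w2*B(i,j), with w1 + w2 = 1.

    For color images the same rule applies per channel via broadcasting.
    """
    w2 = 1.0 - w1
    f = w1 * a.astype(np.float64) + w2 * b.astype(np.float64)
    return f.astype(a.dtype)
```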

1.2 Multi-scale image fusion methods

The image pyramid [59] is a multi-scale, multi-resolution representation that can be pictured as a stack of the same image at different scales. Pyramid decomposition allows features of various scales to be analyzed [30]: large objects can be analyzed in the low-resolution images, while small-scale information such as edge detail is analyzed in the high-resolution images. According to the construction principle, pyramid-transform methods can be divided into the Gaussian pyramid, the Laplacian pyramid, the contrast pyramid, etc. [40]. All of these transforms start from the Gaussian-pyramid image sequence, in which each level is obtained by low-pass filtering the previous level and then down-sampling it by discarding every other row and column; the side length is therefore halved level by level, and each level is one quarter the size of the previous one. The other pyramid transforms are obtained by further operations on the Gaussian pyramid decomposition results. Although pyramid-based methods are simple, the up/down-sampling in the transform means they are not translation invariant, and every image pyramid is a redundant decomposition of the image, i.e., the data at adjacent scales in the decomposition are correlated and redundant, which easily causes block artifacts in the fusion results.

1.2.1 Laplacian Pyramid

1. Laplacian Pyramid Image Decomposition
The source image is convolved with a Gaussian window function and then down-sampled by discarding every other row and column. Repeating this operation produces a sequence of low-pass-filtered levels of progressively decreasing resolution; the pyramid formed by these levels is the Gaussian Pyramid (GP) [18].
Because the convolution and down-sampling steps lose some high-frequency detail, the Laplacian Pyramid (LP) was defined to describe this high-frequency information. Subtracting from each Gaussian-pyramid level the prediction obtained by up-sampling and Gaussian-filtering the next level yields a sequence of difference images, which are the LP decomposition images. Figure 1 below shows the construction process of the Laplacian pyramid decomposition.
Fig. 1 Decomposition process of Laplacian pyramid image
Each Laplacian level is thus the prediction residual between a Gaussian level and the up-sampled version of the level above it, which is what allows the image to be restored exactly. Denoting the resulting image sequence by L:

$$L_l = G_l - \mathrm{Expand}(G_{l+1}), \quad 0 \le l < N; \qquad L_N = G_N$$

where $G_l$ is level $l$ of the Gaussian pyramid, $L_l$ is level $l$ of the Laplacian pyramid, and Expand denotes up-sampling followed by Gaussian filtering. Repeating this for every level is the construction process of the Laplacian pyramid.
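As an illustrative sketch of the decomposition, assuming OpenCV is available and the input is a single-channel image (the level count is a free choice):

```python
import cv2
import numpy as np

def build_laplacian_pyramid(img: np.ndarray, levels: int):
    """Decompose an image into a Laplacian pyramid.

    Returns [L0, L1, ..., L_{levels-1}, G_levels]; the last entry is the
    coarsest Gaussian level, kept so the image can be reconstructed exactly.
    """
    g = img.astype(np.float32)
    pyramid = []
    for _ in range(levels):
        down = cv2.pyrDown(g)                                    # low-pass + 2x downsample
        up = cv2.pyrUp(down, dstsize=(g.shape[1], g.shape[0]))   # Expand(G_{l+1})
        pyramid.append(g - up)                                   # L_l = G_l - Expand(G_{l+1})
        g = down
    pyramid.append(g)                                            # top level: L_N = G_N
    return pyramid
```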
2. Laplacian Pyramid Image Fusion
The fused Laplacian pyramid is obtained by fusing the corresponding levels of each source image's Laplacian pyramid. The specific fusion rules include taking the maximum, taking the minimum, weighted averaging, etc. The fusion scheme is shown in Figure 2.

Figure 2 Laplacian Pyramid Fusion Schematic
3. Laplacian Pyramid Image Reconstruction
For the fused Laplacian pyramid, start from the top level and recursively apply the following formula from top to bottom, interpolating (expanding) from the highest level downward, to restore the corresponding Gaussian pyramid until the full-resolution image G0 is recovered:

$$G_N = L_N; \qquad G_l = L_l + \mathrm{Expand}(G_{l+1}), \quad 0 \le l < N$$
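Putting the fusion rules and the reconstruction formula together, a sketch that reuses build_laplacian_pyramid from above (the absolute-maximum rule for detail levels and averaging of the top level are just one common combination, not the only one):

```python
import cv2
import numpy as np

def fuse_and_reconstruct(img_a: np.ndarray, img_b: np.ndarray, levels: int = 3):
    """Fuse two registered images via their Laplacian pyramids and rebuild."""
    lp_a = build_laplacian_pyramid(img_a, levels)   # from the sketch above
    lp_b = build_laplacian_pyramid(img_b, levels)

    # Detail levels: keep the coefficient with the larger magnitude.
    fused = [np.where(np.abs(la) >= np.abs(lb), la, lb)
             for la, lb in zip(lp_a[:-1], lp_b[:-1])]
    fused.append(0.5 * (lp_a[-1] + lp_b[-1]))       # average the coarsest level

    # Reconstruction: G_l = L_l + Expand(G_{l+1}), applied from the top down.
    g = fused[-1]
    for lap in reversed(fused[:-1]):
        g = lap + cv2.pyrUp(g, dstsize=(lap.shape[1], lap.shape[0]))
    return g
```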

Figure 3 shows the fusion of a registered pair of infrared and visible-light images with Laplacian pyramids. Figures 3(c) and (d) are the three-level Laplacian pyramid decomposition results of the original infrared image and the original visible-light image, respectively; the corresponding levels of the IR and VI LP decompositions are fused with the maximum-value rule to obtain each level of the fused LP pyramid. As shown in Figure 3(e), reconstructing the fused Laplacian pyramid yields the fused image R of the visible-light and infrared images.
(a) IR source image
(b) Visible light image
(e) Reconstructed fusion image
Figure 3 Laplacian pyramid image fusion example
1.2.2 Contrast pyramid image fusion method
The contrast-pyramid image fusion method [19] adds fusion rules on top of the image contrast pyramid described in Section 2.1.3, in a manner similar to the Laplacian pyramid: the fused contrast pyramid is obtained by fusing the corresponding levels of each image's contrast pyramid, with specific fusion rules such as taking the maximum, taking the minimum, or weighted averaging. The flow of the fusion algorithm is shown in Figure 4 below.

Figure 4 The fusion process of the contrast pyramid
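For comparison with the Laplacian sketch above, here is a sketch of the contrast-pyramid decomposition under the common ratio-of-low-pass definition $C_l = G_l / \mathrm{Expand}(G_{l+1}) - 1$; the eps guard against division by zero is an implementation assumption, and fusion plus reconstruction then proceed level by level as in the Laplacian case:

```python
import cv2
import numpy as np

def build_contrast_pyramid(img: np.ndarray, levels: int, eps: float = 1e-6):
    """Contrast pyramid: C_l = G_l / Expand(G_{l+1}) - 1, level by level."""
    g = img.astype(np.float32)
    pyramid = []
    for _ in range(levels):
        down = cv2.pyrDown(g)
        up = cv2.pyrUp(down, dstsize=(g.shape[1], g.shape[0]))
        pyramid.append(g / (up + eps) - 1.0)   # ratio of low-pass = local contrast
        g = down
    pyramid.append(g)                          # coarsest Gaussian level
    return pyramid
```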

1.3 Image Fusion Method Based on Transform Domain

Transform-domain image fusion methods usually also embody the idea of multi-scale fusion. In general, such methods consist of the following three steps, as shown in Figure 1. First, the infrared and visible source images are decomposed into low-frequency and high-frequency sub-bands; then, fusion rules designed separately for the low-frequency and high-frequency sub-bands are applied to fuse them; finally, the inverse transform yields the final fused image. Different image fusion methods use different multi-scale transforms, such as the wavelet transform and NSST.

Fig.1 Heterogeneous image fusion process based on image decomposition

1.3.1 Image fusion method based on wavelet transform

In recent years, among multi-scale decomposition and fusion methods, the wavelet transform has been widely used in image fusion because of its multi-resolution and time-frequency localization characteristics. The first step of wavelet-transform fusion is to construct the wavelet basis function, a waveform of finite length and zero mean; different wavelets can be chosen for different signals. Whereas the Fourier transform mainly deals with periodic signals, the wavelet transform is better suited to non-periodic and abrupt signals, and applying it to image fusion requires less computation time and fuses faster than Fourier-based processing.
The wavelet basis function satisfies equation (2-6):

$$\int_{-\infty}^{+\infty} \psi(t)\,dt = 0 \tag{2-6}$$

By stretching and translating $\psi$, a family of functions can be made to constitute an orthonormal basis on $L^2(\mathbb{R})$. The wavelet function has two parameters a and b, which control the scaling and translation of the wavelet:

$$\psi_{a,b}(t) = \frac{1}{\sqrt{a}}\,\psi\!\left(\frac{t-b}{a}\right)$$

The continuous wavelet transform can then be regarded as the inner product of the function with the wavelet basis:

$$W_f(a,b) = \langle f, \psi_{a,b} \rangle = \frac{1}{\sqrt{a}} \int_{-\infty}^{+\infty} f(t)\,\overline{\psi\!\left(\frac{t-b}{a}\right)}\,dt \tag{2-7}$$

The wavelet-based fusion procedure is to decompose the registered images by wavelet transform, apply different processing to the sub-images of different levels and directions, and then apply the inverse transform to obtain the fused image. The decomposition uses equation (2-7) to take the inner product of the image with the wavelet function. Each wavelet decomposition splits an image into four sub-images: one low-frequency image and three high-frequency images. Because different frequencies of the image are processed differently, wavelet fusion performs well at suppressing image noise and has a good noise-reduction capability.
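A sketch of this procedure using the PyWavelets library; the db2 wavelet, the two-level decomposition, and the average/absolute-maximum rules are assumptions for illustration, not prescriptions from the text:

```python
import numpy as np
import pywt  # PyWavelets

def wavelet_fuse(img_a: np.ndarray, img_b: np.ndarray,
                 wavelet: str = "db2", level: int = 2) -> np.ndarray:
    """Fuse two registered images in the wavelet domain.

    Low-frequency approximation: averaged. High-frequency detail
    sub-bands (horizontal/vertical/diagonal): absolute-maximum rule.
    """
    ca = pywt.wavedec2(img_a.astype(np.float32), wavelet, level=level)
    cb = pywt.wavedec2(img_b.astype(np.float32), wavelet, level=level)

    fused = [0.5 * (ca[0] + cb[0])]                 # low-frequency sub-band
    for (ha, va, da), (hb, vb, db) in zip(ca[1:], cb[1:]):
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                           for x, y in ((ha, hb), (va, vb), (da, db))))
    return pywt.waverec2(fused, wavelet)            # inverse transform
```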

1.3.2 Image fusion method based on NSST

The shearlet (Shearlet) transform proposed by K. Guo and G. Easley has a simple mathematical structure: its basis functions are shear functions generated from a single function by a series of operations such as scaling, translation, and shearing, and the transform is constructed through a dilated affine system. The Shearlet transform can achieve a nearly optimal sparse approximation of two-dimensional signals and can detect all of their singular points, while retaining excellent properties such as multi-resolution, multi-directionality, and locality.

1. Shearlet transform theory
Shearlet transform is a multi-scale geometric analysis tool that overcomes the inability of the wavelet transform to achieve optimal approximation of two-dimensional singularities. On the basis of wavelet theory, an affine system is used to construct the shearlets. When the dimension n = 2, the affine system is:

$$\Psi_{AB}(\psi) = \left\{ \psi_{j,l,k}(x) = |\det A|^{j/2}\, \psi\!\left(B^{l} A^{j} x - k\right) : j, l \in \mathbb{Z},\ k \in \mathbb{Z}^2 \right\}$$

where A represents the associated scale transformation and B the associated geometric (shear) transformation, both of which are 2×2 invertible matrices. If, for any $f \in L^2(\mathbb{R}^2)$, $\Psi_{AB}(\psi)$ forms a Parseval tight frame satisfying

$$\sum_{j,l,k} \left| \langle f, \psi_{j,l,k} \rangle \right|^2 = \|f\|^2$$

then its elements are called composite wavelets. One usually takes

$$A = \begin{pmatrix} a & 0 \\ 0 & \sqrt{a} \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & s \\ 0 & 1 \end{pmatrix}$$

where a is the scale variable and s is the shear (direction) variable; usually a = 4 and s = 1 are taken, and the affine system determined by the matrices A and B is the shearlet transform.
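To make the roles of the two matrices concrete, a tiny numeric sketch (the indices j and l below are arbitrary examples): powers of A refine the scale anisotropically, while powers of B sweep the direction.

```python
import numpy as np

A = np.array([[4, 0],
              [0, 2]])              # anisotropic dilation: a = 4, sqrt(a) = 2
B = np.array([[1, 1],
              [0, 1]])              # shear: s = 1

j, l = 2, 3                         # example scale index j and direction index l
M = np.linalg.matrix_power(B, l) @ np.linalg.matrix_power(A, j)
# The analysis element is psi_{j,l,k}(x) = |det A|^(j/2) * psi(M @ x - k).
print(M)                            # [[16 12], [ 0  4]]
```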
For any $\xi = (\xi_1, \xi_2) \in \mathbb{R}^2$ with $\xi_1 \neq 0$, let

$$\hat{\psi}^{(0)}(\xi) = \hat{\psi}^{(0)}(\xi_1, \xi_2) = \hat{\psi}_1(\xi_1)\,\hat{\psi}_2\!\left(\frac{\xi_2}{\xi_1}\right)$$

where $\hat{\psi}_1, \hat{\psi}_2$ are smooth functions with $\operatorname{supp}\hat{\psi}_1 \subset [-\tfrac{1}{2}, -\tfrac{1}{16}] \cup [\tfrac{1}{16}, \tfrac{1}{2}]$ and $\operatorname{supp}\hat{\psi}_2 \subset [-1, 1]$. Then, for $j \ge 0$, one obtains that each $\hat{\psi}^{(0)}_{j,l,k}$ is supported on the set

$$\left\{(\xi_1,\xi_2) : \xi_1 \in \left[-2^{2j-1}, -2^{2j-4}\right] \cup \left[2^{2j-4}, 2^{2j-1}\right],\ \left|\frac{\xi_2}{\xi_1} + l\,2^{-j}\right| \le 2^{-j}\right\}$$

That is, according to the support conditions of $\hat{\psi}_1$ and $\hat{\psi}_2$, the frequency-domain support of each shearlet is a pair of trapezoids, whose structure is shown in Figure 1.

(a) Shearlet frequency domain segmentation diagram
(b) Frequency domain support diagram
Figure 1 Shearlet frequency domain support diagram and frequency domain support set size

From the above support conditions it follows that the set

$$\left\{\psi^{(0)}_{j,l,k} : j \ge 0,\ -2^{j} \le l \le 2^{j}-1,\ k \in \mathbb{Z}^2\right\}$$

is a Parseval frame on $L^2(D_0)^{\vee}$, where $D_0$ is the horizontal cone

$$D_0 = \left\{(\xi_1, \xi_2) : |\xi_1| \ge \tfrac{1}{8},\ \left|\tfrac{\xi_2}{\xi_1}\right| \le 1\right\}$$

Similarly, a Parseval frame $\{\psi^{(1)}_{j,l,k}\}$ can be constructed on $L^2(D_1)^{\vee}$, where $D_1$ represents the vertical cone:

$$D_1 = \left\{(\xi_1, \xi_2) : |\xi_2| \ge \tfrac{1}{8},\ \left|\tfrac{\xi_1}{\xi_2}\right| \le 1\right\}$$

Assuming $\varphi$ is a scaling function whose Fourier transform is supported on the low-frequency square $[-\tfrac{1}{8}, \tfrac{1}{8}]^2$, and letting $\psi^{(0)}$ and $\psi^{(1)}$ be as defined before, the following set can be obtained:

$$\left\{\varphi_k : k \in \mathbb{Z}^2\right\} \cup \left\{\psi^{(d)}_{j,l,k} : j \ge 0,\ -2^{j} \le l \le 2^{j}-1,\ k \in \mathbb{Z}^2,\ d = 0, 1\right\}$$

The above set is a Parseval frame on $L^2(\mathbb{R}^2)$, where the boundary elements ($l = \pm 2^{j}$) are modified by multiplication with $\chi_D$, the characteristic function of the cone D.

The mathematical characteristics of the Shearlet transform are: (1) good local characteristics in both the spatial domain and the frequency domain; (2) a multi-scale structure; (3) good directionality, with the number of directions growing in multiple with the scale; (4) nearly optimal sparse representation.
The Shearlet transform is reversible and is divided into two steps, decomposition and reconstruction. First, multi-scale decomposition based on wavelet theory splits the original image into one low-frequency sub-band and several high-frequency sub-bands; then a window function performs directional decomposition of the high-frequency sub-bands. Finally, the processed high-frequency and low-frequency images are reconstructed using the inverse Shearlet transform.
2. Non-subsampled Shearlet transform
The Shearlet transform is a comparatively recent transform-domain method in image processing. Thanks to the mathematical characteristics listed above, it achieves excellent sparse representation with high computational efficiency, and its directionality can in principle reach an unlimited number of directions. However, the Shearlet transform is not translation invariant, so researchers proposed the Non-Subsampled Shearlet Transform (NSST).
The NSST is a transform-domain method obtained by removing the down-sampling steps from the Shearlet transform. It keeps the advantages of the Shearlet transform and adds translation invariance, and it is particularly effective at preserving local structure such as image edges and texture information. The non-subsampled Shearlet transform consists of two steps, multi-scale decomposition and direction localization, described separately below:
(1) Multi-scale decomposition of the image. The original image is decomposed with a non-subsampled pyramid (NSP) filter bank, yielding one low-pass sub-band image and a band-pass sub-band image at each level;

Fig. 1 Structure of the NSP non-subsampled filter bank
(2) Direction localization of the image. Direction localization is performed on the acquired band-pass sub-band images using an improved shearlet filter (SF). The standard shearlet filter contains a down-sampling operation; the improved shearlet filter removes it to obtain translation invariance. The specific implementation is as follows: 1. map the standard SF from the pseudo-polar grid to the Cartesian coordinate system; 2. construct a "Meyer" window function to decompose the image at multiple scales into different directional sub-band coefficients; 3. convolve the sub-band coefficients in each direction to finally obtain the NSST coefficients.
Figure 2 NSST image decomposition
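There is no single canonical Python NSST implementation; purely as a structural sketch of the first stage, the snippet below approximates a non-subsampled multi-scale decomposition with a shift-invariant difference-of-Gaussians filter bank (an assumption standing in for a true NSP filter bank), assuming SciPy is available:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def nonsubsampled_pyramid(img: np.ndarray, levels: int = 3):
    """Shift-invariant multi-scale decomposition (no downsampling anywhere).

    Each level is a band-pass difference of Gaussians at growing scale;
    the final low-pass residue completes the decomposition, so that
    img == sum(bands) + lowpass exactly (a telescoping sum).
    """
    current = img.astype(np.float32)
    bands = []
    for k in range(levels):
        smoothed = gaussian_filter(current, sigma=2.0 ** k)  # widen kernel, keep size
        bands.append(current - smoothed)                      # band-pass sub-band
        current = smoothed
    return bands, current                                     # band-pass list, low-pass
```

Because no sub-band is ever down-sampled, every sub-band keeps the full image size and the decomposition is translation invariant; the direction-localization stage would then convolve each band-pass sub-band with the shearing filters to produce the NSST coefficients.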

MATLAB code link: MATLAB code for NSCT

Other codes will be added later


Source: blog.csdn.net/G_redsky/article/details/125322614