Image Mosaic -- Digital Image Mosaic Technology

Panorama, or image mosaic, technology addresses the fact that the limited viewing angle of camera equipment makes it impossible to capture a very large scene in a single picture. This article focuses on stitching and synthesis technology: it first discusses the main purposes of stitching, then briefly introduces the stitching process and its core issues, image registration and its methods, and pixel-level image fusion and its algorithms, and finally concentrates on two-dimensional wavelet fusion technology.

      

1. The main purpose of image stitching

Image stitching technology solves the problem that, because of the limited viewing angle and size of imaging instruments such as cameras, a large scene cannot be captured in a single picture. It uses a computer to automatically match and synthesize a wide-angle picture, so it has a wide range of practical uses, and research on it has also advanced related image processing algorithms.

It has great value in practical applications such as the collection and display of military images (especially infrared images), the registration and stitching of medical images, the simulation of three-dimensional virtual scenes, digital picture compression, and the stitching of aerial and remote sensing pictures.

For example, most office scanners handle A4 or A3 formats. Scanning a picture of A0 size or larger (such as a chart) with such a scanner is essentially impossible. Although special large-format roller scanners exist, they are specialized, expensive equipment that ordinary enterprises, institutions, and research organizations cannot afford. How to use existing equipment to input huge images is therefore an important problem to study and solve. Since the processing power of computers keeps increasing while their price keeps falling, a software solution is attractive: use an A3 or A4 scanner to scan the picture in blocks, save the blocks to the computer's hard disk, and then use an image stitching program to stitch them back into the original image. The application prospects of this approach are very broad, and it also applies to photos taken with digital cameras.

2. Image mosaic technology process and core issues

Stitching process (flowchart): Input image → Preprocessing → Unified coordinate transformation → Image registration → Image fusion → Panoramic mosaic

The basic process of image stitching is shown above. First, the images to be stitched are acquired; they are then preprocessed (filtering, etc.); next, a unified coordinate transformation is performed, that is, all images in the sequence are transformed into one common coordinate system, where different transformation methods correspond to different stitching manifolds; this is followed by image registration and image fusion, and finally a panoramic mosaic image is obtained. The preprocessing and unified coordinate transformation steps are not mandatory and can be applied according to the specific situation.

Core issues: image stitching technology mainly involves two key steps, namely image registration and image fusion.

3. Image registration

Image registration means finding the overlapping position and extent of partially overlapping images in a sequence (also called image alignment). Taking two images as an example: suppose there are two rectangular regions A and B, and B is known to contain a region A' that is the same block as A; the task is to find the position of A' in B. In the context of image stitching, existing image registration methods can be divided into two categories:

(1) Local alignment technology

In the traditional application fields of image stitching and virtual environment construction, the correlation method and the Fourier transform from image registration can be applied. In scene representation, plane projective transformations or affine transformations hold between images, so corresponding-point mapping and motion model methods can be used. Among these, image alignment based on motion models has developed rapidly in recent years. The main local alignment techniques are summarized below.

① Feature-based image alignment

The feature method is a commonly used alignment method: based on image features and combined with some evaluation function, it searches for the overlapping regions of the two images. The typical approach is based on geometric image features, which are divided into low-level features, such as edges and corners, and high-level features, such as recognized objects and the relationships between features. An alignment algorithm based on low-level features is generally divided into three steps: first filter the images to extract the feature sets, then use these feature sets to search for the approximate alignment position of the two images, and finally refine the transformation iteratively. Paper [6] obtains low-level feature models such as edge, corner, and vertex models by filtering with a two-dimensional Gaussian blur. Because a corner model provides more information than a single coordinate point, paper [7] proposes an image alignment algorithm based on a corner model, paper [8] optimizes matching based on geometric point features, and paper [9] uses the wavelet transform to extract an edge-preserving visual model for image alignment. Image alignment based on high-level features achieves alignment through the relationships between low-level features or through recognized objects; paper [10] uses a feature-image relationship graph to perform image alignment. Feature-based image alignment relies on feature detection and extraction, and if the features are not distinct, alignment may fail.
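To make the idea concrete, the following is a minimal MATLAB sketch of feature-based alignment, assuming the Computer Vision Toolbox is available and that I1 and I2 are two overlapping grayscale images already loaded; it detects SURF features, matches their descriptors, and robustly estimates a projective transform. It illustrates the general feature-based approach, not the specific algorithms of papers [6]-[10].

% Minimal feature-based alignment sketch (requires Computer Vision Toolbox).
% I1, I2: overlapping grayscale images (assumed already loaded, e.g. via imread + rgb2gray).
points1 = detectSURFFeatures(I1);            % detect low-level blob/corner features
points2 = detectSURFFeatures(I2);
[f1, vpts1] = extractFeatures(I1, points1);  % build a descriptor around each feature
[f2, vpts2] = extractFeatures(I2, points2);
indexPairs = matchFeatures(f1, f2);          % match descriptors between the two images
matched1 = vpts1(indexPairs(:,1));
matched2 = vpts2(indexPairs(:,2));
% Robustly estimate the projective (8-parameter) transform that maps I2 onto I1
tform = estimateGeometricTransform(matched2, matched1, 'projective');
I2warped = imwarp(I2, tform, 'OutputView', imref2d(size(I1)));  % I2 in I1's frame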

② Frequency-domain-based alignment method

This method exploits a useful property of the Fourier transform: translation, rotation, and scaling of a function have corresponding counterparts in the frequency domain. For a pure translation, computing the normalized cross-power spectrum of the two images and taking its inverse Fourier transform yields an impulse function that is nonzero only at the translation offset. For a rotation, the images can be expressed in polar coordinates, which converts the rotation into a translation, and the rotation angle between the images is then computed in the same way. If there is both a translation and a rotation between the images, the computation proceeds in two steps: first the rotation, then the translation. This method is well suited to image registration with small translation, rotation, and zoom; it has hardware support and fast algorithms, so it is computationally fast, it can cope with correlated and frequency-dependent noise, and it is suitable for images collected by multiple sensors and under varying illumination.
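The translation case can be sketched in a few lines of base MATLAB (phase correlation), assuming I1 and I2 are same-sized grayscale images of type double that differ only by a translation:

% Phase-correlation sketch for pure translation (base MATLAB, no toolboxes).
% I1, I2: same-sized grayscale images (double); I2 is assumed to be a shifted copy of I1.
F1 = fft2(I1);
F2 = fft2(I2);
R  = F1 .* conj(F2);
R  = R ./ (abs(R) + eps);           % normalized cross-power spectrum
r  = real(ifft2(R));                % impulse-like surface with a peak at the shift
[~, idx] = max(r(:));
[dy, dx] = ind2sub(size(r), idx);   % peak location (1-based)
dy = dy - 1;  dx = dx - 1;          % convert to a 0-based circular shift
% Shifts larger than half the image size wrap around:
if dy > size(I1,1)/2, dy = dy - size(I1,1); end
if dx > size(I1,2)/2, dx = dx - size(I1,2); end
fprintf('Estimated translation: dx = %d, dy = %d\n', dx, dy);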

③ Motion model method

1. The main motion models

A commonly used method at present is the motion model method, which establishes the relationship between two images through different world models and motion models. Several main motion models are listed below. Here, m_i (i = 1, 2, ..., 8) are the projective transformation parameters, and (x, y, 1) and (x', y', w') are the homogeneous coordinates of corresponding points u and u' in two adjacent images. The first type is a motion model for a planar scene: when the scene is flat, an 8-parameter perspective (projective) model is used

$$\begin{pmatrix} x' \\ y' \\ w' \end{pmatrix} = \begin{pmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ m_6 & m_7 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \qquad (2\text{-}1)$$

We can sometimes simplify it to a 6-parameter affine model:

$$\begin{pmatrix} x' \\ y' \\ 1 \end{pmatrix} = \begin{pmatrix} m_0 & m_1 & m_2 \\ m_3 & m_4 & m_5 \\ 0 & 0 & 1 \end{pmatrix} \begin{pmatrix} x \\ y \\ 1 \end{pmatrix}$$

The second type is a motion model based on fixed-point surround shooting

 

Among them, [wx, wy, wz] is the angular velocity and f is the focal length of the camera.

The third type is based on the plane plus parallax model

The traditional 3D motion field is a parametric motion field composed of rotation, translation, and scene depth. However, aligning images with such parameter fields requires knowing the camera's intrinsic parameters, and models other than the two above can cause image distortion during alignment. The 3D model introduced here is the plane-plus-parallax model based on a planar parametric field proposed by Kumar et al., who assume the scene lies on a real or virtual plane.

Suppose there are two images I0(x) and I1(x), and the relationship between them, expressed by the motion offset, is I1(x) = I0(x − U(x)), where U = (Ux, Uy). Its parametric model is then as follows:

 

 

Here r = H/(Pz·T1), where H is the perpendicular distance from the point to the plane, T1 is the distance from the first camera to the plane, T2 is the translation vector between the two cameras, Pz is the depth, and f is the camera focal length.

This model is difficult to apply to the alignment of arbitrary scene images, mainly because segmenting scenes at different depths is a hard problem, so it is generally used for detecting sparse moving objects against a planar scene.

2. Methods for solving the model parameters

The evaluation function generally uses the SSD (Sum of Squared Differences) intensity error:

$$E(\{u_i\}) = \sum_i \left[ I_1(x_i + u_i) - I_0(x_i) \right]^2$$
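As an illustration of this cost, the following base-MATLAB sketch evaluates the SSD over a small grid of candidate integer translations for two hypothetical images I0 and I1 and keeps the minimizer; the parametric methods listed next replace this exhaustive search with iterative optimization of the same cost:

% Brute-force SSD search over integer translations (illustration only).
% I0, I1: same-sized grayscale images (double); 'range' limits the search window.
range = 10;                     % assumed maximum displacement in pixels
best  = inf;  bestU = [0 0];
for dy = -range:range
    for dx = -range:range
        shifted = circshift(I1, [dy dx]);   % candidate alignment of I1
        d = shifted - I0;                   % (circshift wraps around the borders;
        e = sum(d(:).^2);                   %  a real implementation would crop to the overlap)
        if e < best
            best  = e;
            bestU = [dx dy];
        end
    end
end
fprintf('SSD-optimal translation: dx = %d, dy = %d\n', bestU(1), bestU(2));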

There are mainly the following methods for solving the optimal parameter values:

The first is to use the Gauss-Newton or Levenberg-Marquardt method for an iterative solution. This method is reliable and stable with high accuracy; its shortcoming is that it requires manually specifying initial corresponding points.

The second is to approximate the motion field with a first-order Taylor expansion and solve it by least squares. This method needs no iteration and is fast, but its accuracy is not high.

The third is the hierarchical (coarse-to-fine) estimation method, mainly for images whose deviation is only a few pixels. It includes the following four parts (a minimal coarse-to-fine sketch follows the list):

(1) Build a pyramid of two images;

(2) Motion estimation calculation;

(3) Image Warping operation;

(4) Gradually refine the parameters from coarse to fine.
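A minimal coarse-to-fine sketch of this idea is given below; it assumes grayscale images I0 and I1, the Image Processing Toolbox for imresize, and a hypothetical estimateShift helper (for example, the SSD search above wrapped as a function returning [dx dy]):

% Coarse-to-fine translation estimation sketch.
% Assumes estimateShift(I0, I1) returns a translation [dx dy] between two images.
levels = 3;                          % number of pyramid levels (assumption)
u = [0 0];                           % accumulated translation estimate (full resolution)
for k = levels:-1:1
    s   = 1 / 2^(k-1);               % scale factor of this pyramid level
    I0s = imresize(I0, s);           % build the pyramid level on the fly
    I1s = imresize(I1, s);
    I1s = circshift(I1s, round([u(2) u(1)] * s));  % pre-warp with the current estimate
    du  = estimateShift(I0s, I1s);   % residual motion at this level
    u   = u + du / s;                % refine the full-resolution estimate
end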

The fourth type, quasi-parametric model estimation, is mainly for the plane + parallax model. It includes the following three parts:

(1) Segment and mark out the flat area in the image;

(2) Estimate the planar parametric transformation between the two images;

(3) Use the hierarchical estimation method to estimate the parallax vector and the translation vector T.

The fifth type, solving progressively more complex models, is mainly used for estimating the plane projective transformation parameters when the image deviation is greater than a few pixels. It includes the following three parts:

(1) Use a part of the image to estimate the translation parameters;

(2) Use the translation parameter as the initial estimate of the affine parameter, and use the larger part of the image to estimate the affine parameter;

(3) Use affine parameters as the initial estimation, and use all of the image to estimate the plane projection transformation parameters.

Combining progressively more complex models with hierarchical estimation can reduce the amount of computation. This method has conditions on its use: when the camera shoots a planar scene, its motion should be approximately a translation with a small rotation angle; when the camera shoots from a fixed point, its motion should be approximately panning (around the Y axis) or pitching (around the X axis), with rotation about other axes kept as small as possible.

(Including: ① feature-based image alignment; ② frequency domain-based alignment method; ③ motion model method)

(2) Global alignment technology

Global alignment technology is an integrated technology for image stitching. To form a large mosaic, all images to be stitched must be transformed to a reference frame in the image sequence. A straightforward way to obtain the transformation from an image far from the reference frame is to chain the transformations between neighboring images, but this accumulates error, so the final image shows larger deviations and ghosting. Global alignment technology is designed to solve this problem; the existing global alignment techniques fall into the following four categories:


1. Multi-frame adjustment and alignment


We can also transform the image frame Ai to a neighboring frame Aj first, and then use Pj to project it onto the reference frame A1: Pi = Aij·Pj (j < i). All such equations are combined into one large sparse linear system. Since it is overdetermined, it can be solved by least squares. This method is effective, but the workload is relatively large, and it requires solving the exact alignment transformation between every pair of overlapping frames.

3. Multi-frame adjustment technology based on corresponding points

A set of corresponding points is extracted from the overlapping images and projected onto the reference frame. In theory, this set of corresponding points should coincide at a single point on the reference frame, but due to accumulated error they do not. Shum proposed selecting a point on the reference frame that minimizes the distance (or angle) to each projected point as the new corresponding point, and then using the set of new points to adjust the parametric transformations, thereby reducing the cumulative error of multi-image stitching.

4. Gradually expanding the mosaic method

The frame-to-mosaic approach expands the image mosaic step by step. An intermediate frame (in terms of geometric topology) is chosen from the image set as the reference frame; then, working outward from it, the frame fi with the largest overlap with the current mosaic Mi-1 is found, and fi is aligned with Mi-1 to form a larger mosaic Mi. This method gives high alignment quality, but a poor alignment of some intermediate frame will affect the alignment of all subsequent frames and lower the quality of the whole mosaic; in addition, at each step the frame with the largest overlap with its neighbors must be searched for.

4. Image fusion

Image fusion is data fusion that takes images as its object of study. It refers to processing two or more images of the same scene, taken in different bands or by different sensors at the same time, to form a composite image, so as to obtain more information about the target. Image fusion is divided into three levels: pixel level, feature level, and decision level. The level of fusion indicates the extent to which the sensor data has been processed before fusion, and a given data fusion system may involve inputs at all three levels. Pixel-level fusion is performed directly on the collected raw data and is the lowest level of fusion. Its advantage is that it preserves as much of the original field data as possible and provides detailed information; its limitations are that the amount of data to be processed is large, real-time performance is poor, the data communication volume is large, and the anti-interference ability is weak. It can be used for multi-source image compositing and for image analysis and understanding. Feature-level fusion first extracts features from the raw sensor information and then comprehensively analyzes and processes the feature information. Decision-level fusion is a high-level fusion that can provide a basis for command and control decisions; it is the final result of the three-level fusion, is aimed directly at specific decision-making goals, and its result directly affects the decision level. Several common pixel-level image fusion algorithms are described below:

①HIS transformation

The HIS transform is the most commonly used method for fusing multi-source remote sensing data; it can superimpose the geometric information of remote sensing images with different spatial resolutions. It first converts a 3-band multispectral image from RGB color space into three quantities in HIS space, namely hue (H), intensity/brightness (I), and saturation (S), where the intensity (I) carries the spatial information and the hue (H) carries the spectral information. The high-spatial-resolution image is then stretched so that it has the same mean and variance as the intensity component (I); finally, the stretched high-resolution image replaces the intensity component and, together with the original hue (H) and saturation (S), is passed through the inverse HIS transform to obtain the fused image.
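The following sketch illustrates this procedure in MATLAB, using the HSV transform as a simple stand-in for a true IHS transform (an assumption made only for brevity); MS is assumed to be a 3-band multispectral image already resampled to the size of the high-resolution panchromatic image PAN:

% HIS-style fusion sketch (HSV used as a simple stand-in for IHS).
% MS:  3-band multispectral image, resampled to the size of PAN.
% PAN: high spatial resolution panchromatic image (grayscale).
hsv = rgb2hsv(im2double(MS));       % H, S, V: hue, saturation, intensity
I   = hsv(:,:,3);
P   = im2double(PAN);
P   = (P - mean(P(:))) / std(P(:)) * std(I(:)) + mean(I(:));  % match mean/variance of I
hsv(:,:,3) = min(max(P, 0), 1);     % replace intensity with the stretched PAN
fused = hsv2rgb(hsv);               % inverse transform gives the fused image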

②PCA method

The idea of the PCA transform, i.e. the principal component transform, is similar to the HIS transform. The process is as follows: first, use three or more bands of data to compute the correlation coefficient matrix between the images, compute its eigenvalues and eigenvectors, and obtain the principal component images; then contrast-stretch the high-spatial-resolution image so that it has the same mean and variance as the first principal component image; finally, substitute the stretched high-resolution image for the first principal component and combine it with the other principal components through the inverse PCA transform to obtain the fused image. The premise of replacing the first principal component with the stretched high-resolution image is that the two are nearly identical, because after the PCA transform of a multi-band spectral image the spatial information of all bands is concentrated in the first principal component while the spectral information is retained in the other components. The PCA fusion method is superior to the HIS fusion method in preserving the spectral characteristics of the original multispectral image, and its spectral distortion is small. In addition, unlike the HIS fusion method, which can only fuse exactly three multispectral bands at a time, the PCA fusion method can fuse three or more bands.
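A compact base-MATLAB sketch of this idea is shown below, assuming MS is an m-by-n-by-b multispectral cube resampled to the panchromatic size and PAN is the panchromatic band; PCA is computed here from the band covariance rather than the correlation matrix, which is a simplification:

% PCA fusion sketch: replace the first principal component with the stretched PAN.
% MS:  m-by-n-by-b multispectral cube (double); PAN: high-resolution band (double).
[m, n, b] = size(MS);
X  = reshape(MS, m*n, b);              % one row per pixel, one column per band
mu = mean(X, 1);
Xc = X - mu;                           % remove the band means
[V, D] = eig(cov(Xc));                 % eigenvectors of the band covariance
[~, order] = sort(diag(D), 'descend');
V  = V(:, order);                      % columns sorted by decreasing variance
PC = Xc * V;                           % principal component images (as columns)
% Stretch PAN to the mean and variance of the first principal component
p  = PAN(:);
p  = (p - mean(p)) / std(p) * std(PC(:,1)) + mean(PC(:,1));
PC(:,1) = p;                           % substitute the first component
Xf = PC * V' + mu;                     % inverse PCA transform
fused = reshape(Xf, m, n, b);          % fused multispectral cube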

③HPF method

The high-pass filtering method first filters the high-spatial-resolution image with a small spatial high-pass filter; the filtered result retains the high-frequency information related to spatial detail while most of the spectral information is removed. The high-pass result is then added, pixel by pixel, to each band of the multispectral image. After this processing, the spatial information of the high-resolution image is fused with the spectral information of the high-spectral-resolution image.
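A minimal sketch of the high-pass filtering idea, assuming MS and PAN as above and using a small box filter as the low-pass kernel (an arbitrary choice for illustration):

% HPF fusion sketch: inject the high-frequency part of PAN into each MS band.
% MS:  m-by-n-by-b multispectral cube (double), resampled to the size of PAN.
% PAN: high-resolution panchromatic band (double).
k    = ones(5) / 25;                   % small box low-pass kernel (assumption)
lowP = conv2(PAN, k, 'same');          % low-frequency part of PAN
high = PAN - lowP;                     % high-frequency (spatial detail) residue
fused = MS;
for i = 1:size(MS, 3)
    fused(:,:,i) = MS(:,:,i) + high;   % add the detail to every band
end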

An important feature common to the HIS transform, the PCA transform, and the HPF method is that, while improving the spatial resolution of the fused image, they keep the distortion and alteration of the spectral information within a certain limit.

④Image pyramid

The image pyramid was originally used for multi-resolution image description and analysis and as a model of binocular fusion in human vision. A typical image pyramid is a sequence of images in which each image is a low-pass filtered and subsampled copy of its predecessor. Because of the subsampling, at each level of decomposition the image size is halved in both spatial directions, producing a multi-resolution signal representation. The image pyramid method yields a representation with two pyramids: the smoothed pyramid contains the averaged pixel values, and the difference pyramid contains the pixel differences, i.e. the edges. The difference pyramid can therefore be regarded as a multi-resolution edge representation of the input image.

The total size of the pyramid is 4/3 that of the source image, which increases the amount of data; and instabilities can appear during pyramid reconstruction, especially when the multi-source images differ markedly, in which case the fused image may show blocky artifacts.
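For reference, a short sketch of how the two pyramids can be built in MATLAB (Image Processing Toolbox, grayscale double image I assumed); the smoothed pyramid stores the reduced images and the difference pyramid stores what each reduction step discards:

% Gaussian (smoothed) and Laplacian (difference) pyramid sketch.
% I: grayscale image (double); impyramid is from the Image Processing Toolbox.
levels = 3;
G = cell(levels+1, 1);            % smoothed (Gaussian) pyramid
L = cell(levels, 1);              % difference (Laplacian) pyramid
G{1} = I;
for k = 1:levels
    G{k+1} = impyramid(G{k}, 'reduce');                          % low-pass + subsample
    up     = imresize(impyramid(G{k+1}, 'expand'), size(G{k}));  % back to previous size
    L{k}   = G{k} - up;                                          % detail lost at this level
end
% The image can be reconstructed by expanding G{levels+1} and adding back the L{k}.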

⑤ Wavelet transform

A signal analysis method similar to the image pyramid is the wavelet transform. The main difference is that the image pyramid increases the amount of data, whereas the wavelet decomposition is non-redundant, so the total amount of data after wavelet decomposition of an image does not increase.

The wavelet transform decomposes the original image into a series of sub-images with different spatial resolutions and frequency-domain characteristics, fully reflecting the local variation characteristics of the original image. The basic principle of wavelet-based fusion is to merge the low-resolution spectral data into a decomposition level of the high-resolution spatial data, which can be done by substitution, addition, or selection of the corresponding coefficients. Finally, the combined components are reconstructed into an image that joins the spectral information of the low-resolution bands with the spatial resolution of the higher-resolution panchromatic band.

Compared with traditional fusion methods such as PCA and HIS, the wavelet fusion model can not only choose the wavelet basis and the number of decomposition levels according to the characteristics of the input images, but can also selectively introduce detail information from both images as needed during the fusion operation, so it is more targeted and practical and gives better fusion results. In addition, regarding the flexibility of the procedure, the HIS transform can only, and must, fuse exactly three bands at a time, and the input to the PCA transform must have three or more bands, whereas the wavelet method can fuse a single band or multiple bands. The commonly used pixel-level fusion methods are mainly the HIS transform, the PCA transform, and the wavelet transform.

⑥Other image fusion technology

Image fusion methods also include evidence theory method, neural network method, expert system method, etc., which are mainly used for classification and statistics in target recognition.

 

5. Two-dimensional wavelet image fusion

In reality, objects in images appear at different scales. Take an edge as an example: it may be a steep edge from black to white, or a slowly varying edge that spans a considerable distance. Multi-resolution methods in image representation and analysis are based on this consideration.

Inspired by Burt and Adelson's pyramid algorithm for image decomposition and reconstruction (the Gaussian-Laplacian pyramid), Mallat proposed the Mallat algorithm based on the multi-resolution analysis of the wavelet transform.

For the two-dimensional case, let V_j^2 (j ∈ Z) be a separable multi-resolution analysis of the space L^2(R^2). For each j ∈ Z, the scaling function system {φ_{j,m,n}} forms an orthonormal basis of V_j^2, and the wavelet functions {ψ^k_{j,m,n}} (k = 1, 2, 3; j, m, n ∈ Z) form an orthonormal basis of L^2(R^2). A two-dimensional image f(x, y) ∈ V_j^2 can then be represented by its projection A_j f(x, y) onto the space V_j^2:

$$A_j f(x, y) = \sum_{m,n} c_{j,m,n}\, \varphi_{j,m,n}(x, y)$$
The wavelet transform amounts to filtering the image signal with a low-pass filter and a high-pass filter. If H_r, G_r and H_c, G_c denote the mirror conjugate filters H and G acting on the rows and columns respectively, the wavelet transform can be written simply as:

$$\begin{aligned} C_{j+1} &= H_r H_c\, C_j \\ D^1_{j+1} &= H_r G_c\, C_j \\ D^2_{j+1} &= G_r H_c\, C_j \\ D^3_{j+1} &= G_r G_c\, C_j \end{aligned} \qquad (4)$$

and the corresponding reconstruction is

$$C_j = H^*_r H^*_c\, C_{j+1} + H^*_r G^*_c\, D^1_{j+1} + G^*_r H^*_c\, D^2_{j+1} + G^*_r G^*_c\, D^3_{j+1},$$

where H*, G* are the conjugate transpose matrices of H and G.

For a two-dimensional image, the operator H_r H_c in equation (4) is a low-pass filter in both directions, so C_{j+1} contains the low-frequency component of C_j, that is, the low-frequency part of the image. The operator H_r G_c smooths along the rows and detects differences along the columns, so D^1_{j+1} contains the vertical high-frequency component of C_j, that is, the horizontal edges of the image. The operator G_r H_c smooths along the columns and detects differences along the rows, so D^2_{j+1} contains the horizontal high-frequency component, that is, the vertical edges of the image. The operator G_r G_c is a high-pass filter in both directions and detects the diagonal edges. It follows that performing a wavelet transform on an image decomposes it into different feature domains in different frequency bands.

After the original images to be fused undergo the wavelet transform, each image is decomposed into different frequency regions, and the fusion processing must apply a suitable algorithm on each frequency band. Several layers of wavelet transform can be applied to an image; each subsequent layer only transforms the low-frequency component produced by the previous layer, thus forming a pyramid structure of wavelet transforms.

Since the highest decomposition layer requires selecting or averaging the data, the fusion operator used for the low-frequency part of the highest layer is the most critical step for the detail of the fused image, and a reasonable choice of the high-frequency operator can enhance and highlight the image edges. Generally, a comparison operator is used for the low-frequency part of the highest layer, a simple weighting operator for its high-frequency part, and simple weighting operators for all the other layers.

2 Wavelet characteristic analysis and image fusion rules

2.1 Theoretical basis of the fusion method

Statistics of the numerical distributions show that the sub-images obtained by wavelet decomposition of the source images A and B have the following characteristics: (1) the range of data variation of a region in the original image is consistent with that of the corresponding region in the sub-images; (2) for different source images of the same target or scene, the data values in corresponding regions of the low-frequency sub-images are the same or similar, whereas the high-frequency sub-images differ significantly. These characteristics of the wavelet transform provide a theoretical basis for choosing effective fusion methods.

2.2 Image fusion rules and fusion factors

Two TM remote sensing images of an area of Wuhan are used for the fusion experiments. Let A and B be the two original images and F the fused image. In image fusion, the choice of fusion rules and fusion operators remains an open problem.

(1) Fusion rule 1: for the edge components, i.e. the high-frequency components LHi, HLi, HHi of the wavelet decomposition (i = 1, 2, 3, ..., N), take the maximum of the corresponding entries of the two images' coefficient matrices. For the low-frequency component LL, which has a great influence on the quality of the reconstructed image, use

F(j,k) = α·(A(j,k) + K·B(j,k)) − β·|A(j,k) − K·B(j,k)|,

where K, α, β are weighting factors. The first term, α·(A(j,k) + K·B(j,k)), is a weighted average of the two images; it affects the energy of the fused image and determines its overall brightness. The second term, β·|A(j,k) − K·B(j,k)|, is a weighted difference of the two images and carries their blur (difference) information. The factor K adjusts the proportion of the two images so as to balance images of different brightness; as α increases, the image becomes brighter, and as β increases, the image edges become stronger. For different images, properly adjusting K, α, β can reduce blurred edges and ensure that edge information is not excessively lost during reconstruction. For the other, high-frequency components, taking the maximum of the two sets of coefficients preserves the strongest edge information and hence yields a good-quality output image.
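A single-level sketch of fusion rule 1, assuming the Wavelet Toolbox and two registered source images A and B of the same size; the example weighting factors are arbitrary, and "maximum" is read here as the coefficient of larger magnitude:

% Fusion rule 1 sketch, one wavelet decomposition level (Wavelet Toolbox).
% A, B: registered source images (double, same size); K, alpha, beta: weights.
K = 1; alpha = 0.5; beta = 0.25;                 % example weighting factors (assumption)
[cA1, cH1, cV1, cD1] = dwt2(A, 'sym4');          % decompose image A
[cA2, cH2, cV2, cD2] = dwt2(B, 'sym4');          % decompose image B
% Low-frequency part: weighted average minus weighted difference (rule 1 formula)
cA = alpha * (cA1 + K*cA2) - beta * abs(cA1 - K*cA2);
% High-frequency parts: keep the coefficient with the larger magnitude
pick = @(x, y) x .* (abs(x) >= abs(y)) + y .* (abs(x) < abs(y));
cH = pick(cH1, cH2);  cV = pick(cV1, cV2);  cD = pick(cD1, cD2);
F  = idwt2(cA, cH, cV, cD, 'sym4');              % reconstruct the fused image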

(2) Fusion rule 2: compute the local average gradient of each pixel of the high-frequency sub-images obtained by wavelet decomposition of the two images, and use this local average gradient as the criterion for deciding the pixel values of the fused high-frequency sub-images. Let A(x,y) and B(x,y) be the visible-light and infrared images, respectively; their high-frequency sub-images at different resolutions are:

Ajk(x,y) and Bjk(x,y) (k=1,2,3; j is the scale parameter)

The gradient images of the high-frequency sub-images at different resolutions are GAjk(x,y) and GBjk(x,y). The fused high-frequency information after wavelet decomposition is then:

Fjk(x,y) = Ajk(x,y) if GAjk(x,y) ≥ GBjk(x,y); otherwise Fjk(x,y) = Bjk(x,y).
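A sketch of fusion rule 2 for one pair of high-frequency sub-images Ajk and Bjk (base MATLAB); the Sobel kernels and the 3×3 averaging window are assumptions used to compute the local average gradient:

% Fusion rule 2 sketch for one pair of high-frequency sub-images Ajk, Bjk.
% The coefficient with the larger local average gradient wins at each pixel.
gx = [-1 0 1; -2 0 2; -1 0 1];          % Sobel kernels for the gradient
gy = gx';
gradA = sqrt(conv2(Ajk, gx, 'same').^2 + conv2(Ajk, gy, 'same').^2);
gradB = sqrt(conv2(Bjk, gx, 'same').^2 + conv2(Bjk, gy, 'same').^2);
w   = ones(3) / 9;                      % 3x3 local averaging window (assumption)
GA  = conv2(gradA, w, 'same');          % local average gradient of A's sub-image
GB  = conv2(gradB, w, 'same');
Fjk = Ajk .* (GA >= GB) + Bjk .* (GA < GB);   % select per pixel by the criterion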

6. Summary

This article has only briefly introduced some basic concepts, processes, methods, and algorithms of image stitching, and only part of the verification experiments for the algorithms have been carried out; the algorithms always pursue stronger robustness, higher accuracy, and faster computation. With the advance of science and technology, image stitching technology will develop further and its applications will become ever wider.

 

 

Taking the image woman as an example, the following processing is done in MATLAB:

clear
load woman;                      % load the first original image (woman.mat, Wavelet Toolbox demo data)
X1 = X;  map1 = map;
subplot(2,2,1); image(X1); colormap(map1)
title('woman');
axis square                      % draw the woman image

load wbarb;                      % load the second original image (wbarb.mat)
X2 = X;  map2 = map;
% Simple piecewise contrast change of the second image
for i = 1:256
    for j = 1:256
        if X2(i,j) > 100
            X2(i,j) = 1.2*X2(i,j);
        else
            X2(i,j) = 0.5*X2(i,j);
        end
    end
end
subplot(2,2,2); image(X2); colormap(map2)
title('wbarb');
axis square                      % draw the wbarb image

% Two-level wavelet decomposition of both images with the sym4 wavelet
[c1, s1] = wavedec2(X1, 2, 'sym4');
c1 = 1.2*c1;                     % scale the coefficients of the first image
[c2, s2] = wavedec2(X2, 2, 'sym4');

% Fuse by averaging the two coefficient vectors, then reconstruct
c = 0.5*(c1 + c2);
xx = waverec2(c, s1, 'sym4');
subplot(2,2,3); image(xx);
title('fusion image');
axis square                      % draw the fused image

 

 

Result analysis:

Here the woman image is fused with the contrast-adjusted wbarb image. The fused image gives a hazy, dreamlike feeling, and the darker background parts are faded.
