5. Stereo matching and disparity map calculation


1. Introduction

  The purpose of stereo matching is to find, for each pixel in the left image, the corresponding point in the right image, so that the disparity can be calculated: $disparity = x_i - x_j$ (where $x_i$ and $x_j$ are the column coordinates of the two corresponding points in their respective images).

  The calculation process of most stereo matching algorithms can be divided into the following stages: matching cost calculation, cost aggregation, disparity calculation, and disparity optimization.

Stereo matching is a difficult part of stereo vision. The main difficulties are:

  • The image may contain repetitive textures and weak textures, and these areas are difficult to match correctly;
  • Because the left and right cameras shoot from different positions, the images necessarily contain occluded areas. In an occluded area, some pixels in the left image have no corresponding points in the right image, and vice versa;
  • The left and right cameras receive different lighting conditions;
  • Overexposed areas are difficult to match;
  • Inclined surfaces, curved surfaces, and non-Lambertian surfaces;
  • High image noise, etc.

  There are two main classes of stereo matching methods: global methods and local methods.

  • Local methods mainly include: BM, SGM (strictly speaking a semi-global method), ELAS, PatchMatch, etc.
  • Global methods mainly include: Dynamic Programming, Graph Cut, Belief Propagation, etc.

  Local methods require little computation, but their matching quality is relatively low. Global methods omit cost aggregation and instead optimize an energy function. Their matching quality is high, but the amount of computation is also relatively large. Because global methods are too slow, practical applications mostly use local methods.
  Currently, the methods implemented in OpenCV include BM, binaryBM, SGBM, binarySGBM, BM (CUDA), Belief Propagation (CUDA), and Constant Space Belief Propagation (CUDA). The SGBM algorithm is relatively easy to use. Its core is based on the SGM algorithm, but it differs from SGM in some respects. For example, the matching cost uses the BT (Birchfield-Tomasi) cost (on the original image and the gradient image) instead of the HMI (hierarchical mutual information) cost.

2. Matching cost calculation

  The main purpose of matching cost calculation is to measure the correlation between pixels in the left and right images, so that pixels in the two images can be matched with each other. The easiest way is to directly compute the difference between the pixel values of two points, but because only a single point is considered, the result is easily affected by noise. An improvement is to replace the point with a window and compute the sum of the pixel-value differences over the points in it. However, such pixel-value-based algorithms are very sensitive to illumination and distortion, which introduces errors. There are therefore also algorithms that are not based directly on pixel values, such as the Census transform and the Rank transform.

2.1 AD

  The AD (absolute difference) algorithm is one of the simplest matching cost algorithms. Its main idea is to compare the gray values of two points in the left and right images: fix a point in the left image, traverse the candidate points in the right image, and take the absolute gray-level difference between them as the matching cost. Its mathematical formula is:
$$C_{AD}(p,q) = |I_L(p) - I_R(q)|$$
where $p$ and $q$ are two points in the left and right images respectively, $I_L(\cdot)$ is the gray value in the left image, and $I_R(\cdot)$ is the gray value in the right image. The formula above is the matching cost for grayscale images; for color images, the AD cost is:
$$C_{AD}(p,q) = \frac{1}{3}\sum_{i=R,G,B} |I_i^L(p) - I_i^R(q)|$$
that is, the average of the absolute differences of the three color components of the pixels in the left and right views.
  The AD algorithm computes the matching cost from a single pixel, so it is strongly affected by uneven illumination and image noise, but it matches relatively well in texture-rich areas.
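  As a minimal sketch of the two AD formulas above, the following Python functions compute the cost for a grayscale and a color pixel pair. The function names, the `(row, col)` point convention and the toy images are assumptions made for illustration; images are plain nested lists so that no library is needed.

```python
def ad_cost_gray(left, right, p, q):
    """Absolute gray-value difference between p (left image) and q (right image).

    left, right: 2D lists of gray values; p, q: (row, col) tuples.
    """
    (pr, pc), (qr, qc) = p, q
    return abs(left[pr][pc] - right[qr][qc])


def ad_cost_color(left, right, p, q):
    """Mean absolute difference over the three color channels of p and q."""
    (pr, pc), (qr, qc) = p, q
    diffs = (abs(a - b) for a, b in zip(left[pr][pc], right[qr][qc]))
    return sum(diffs) / 3


# Toy data (hypothetical): one-row images.
left_gray = [[10, 50, 90]]
right_gray = [[50, 90, 20]]
gray_cost = ad_cost_gray(left_gray, right_gray, (0, 1), (0, 0))  # |50 - 50|

left_rgb = [[(100, 150, 200)]]
right_rgb = [[(90, 150, 230)]]
color_cost = ad_cost_color(left_rgb, right_rgb, (0, 0), (0, 0))  # (10 + 0 + 30) / 3

print(gray_cost, color_cost)
```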

2.2 SAD

  SAD (sum of absolute differences) is also a basic matching cost algorithm. Whereas the AD cost uses only the gray values of two single points, the SAD cost is determined by a point together with the pixels in a window around it. The basic process is as follows. First, input two images, a left image and a right image. Next, scan the left image sequentially and, for each selected anchor point:
(1) Construct a small window, similar to a convolution kernel;
(2) Cover the left image with the window and select all pixels in the covered area;
(3) Cover the right image with the same window and select the pixels in the covered area;
(4) Subtract the right covered area from the left covered area and compute the sum of the absolute values of the gray differences of all pixels;
(5) Move the window in the right image and repeat (3)-(4) (there is a search range; stop once the window leaves this range);
(6) Find the window with the smallest SAD value within this range, which gives the pixel block that best matches the anchor point of the left image.

The matching cost formula of SAD is as follows:
$$C_{SAD}(p,q) = \sum_{m \in N_p,\, n \in N_q} |I_L(m) - I_R(n)|$$
where $N_p$ and $N_q$ denote the pixels around $p$ and $q$ respectively, and $m$ and $n$ run over pixels at corresponding positions in the two windows.
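  The window search described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the images are taken to be rectified (so the right window moves along the same row), the disparity is $x_{left} - x_{right} \ge 0$, and the names `sad_cost`, `sad_match`, `half` and `max_disp` as well as the toy data are hypothetical.

```python
def sad_cost(left, right, row, col_l, col_r, half):
    """SAD between the square windows centered at (row, col_l) in the left
    image and (row, col_r) in the right image; window side is 2*half + 1."""
    total = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            total += abs(left[row + dr][col_l + dc] - right[row + dr][col_r + dc])
    return total


def sad_match(left, right, row, col, half, max_disp):
    """Return (best disparity, {disparity: cost}) for the anchor (row, col)."""
    costs = {}
    for d in range(max_disp + 1):
        col_r = col - d          # disparity = x_left - x_right
        if col_r - half < 0:     # window would leave the image: stop searching
            break
        costs[d] = sad_cost(left, right, row, col, col_r, half)
    best = min(costs, key=costs.get)
    return best, costs


# Toy data: the right image is the left image shifted by 2 columns.
base = [3, 9, 4, 12, 7, 20, 1, 15]
left_img = [list(base) for _ in range(3)]
right_img = [base[2:] + [0, 0] for _ in range(3)]

best_d, all_costs = sad_match(left_img, right_img, row=1, col=5, half=1, max_disp=4)
print(best_d, all_costs)
```

  In this toy case the smallest SAD value is reached at the shift that was used to build the right image.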

2.3 Census

  The Census transform is also widely used in matching cost calculation. It captures local structural features in the image well, such as edges and corners. The basic idea is as follows: define a rectangular window and slide it over the entire image. Take the center pixel of the window as the reference pixel, and compare the gray value of each pixel in the window with the gray value of the reference pixel. Pixels whose gray value is less than or equal to the reference value are marked as 0, and pixels whose gray value is greater than the reference value are marked as 1. Finally, the bits are concatenated to obtain the transformed result, which is a binary string of 0s and 1s.
  The Census transform can be expressed by the following formula:
$$T(p) = \bigotimes_{q \in N_p} \xi(I(p), I(q))$$
where $p$ is the center pixel of the window, $q$ is a pixel of the window other than the center pixel, $N_p$ is the neighborhood of the center pixel $p$, $I(\cdot)$ is the gray value at a pixel, and $\bigotimes$ is the bitwise concatenation operation. The operation $\xi(\cdot)$ is defined as:
$$\xi(I(p), I(q)) = \begin{cases} 0, & I(q) \le I(p) \\ 1, & I(q) > I(p) \end{cases}$$
  The formula above yields the binary string of the Census transform. The matching cost based on the Census transform is the Hamming distance between the Census transform values of the two corresponding pixels in the left and right images, that is:
$$C_{Census}(p, q) = \mathrm{Hamming}(T(p), T(q))$$

where $T(p)$ is the binary string generated from the left image and $T(q)$ is the binary string generated from the right image. The Hamming distance is the number of positions at which two bit strings differ. It is computed by XORing the two bit strings and then counting the number of 1 bits in the result.
  The following figure helps to illustrate how the Census matching cost is calculated. First, compute the Census bit strings of the windows in the left and right images (as introduced above), then XOR the two bit strings; the number of 1s in the result is the matching cost (the result in the figure below is 2).
[Figure: example of the Census matching cost calculation]
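  The Census cost can also be sketched in a few lines of Python. The function names, the row-by-row bit order and the two 3×3 windows below are assumptions chosen for illustration; they are not taken from the original figure.

```python
def census_transform(img, row, col, half):
    """Census bit string (as an int) of the window centered at (row, col).

    A neighbor contributes 0 if its gray value is <= the center value and 1
    if it is greater. Neighbors are visited row by row; the center is skipped.
    """
    center = img[row][col]
    bits = 0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            if dr == 0 and dc == 0:
                continue
            bit = 1 if img[row + dr][col + dc] > center else 0
            bits = (bits << 1) | bit
    return bits


def hamming(a, b):
    """Number of differing bits: XOR the strings, then count the 1 bits."""
    return bin(a ^ b).count("1")


left_win = [[12, 30, 7], [45, 20, 25], [18, 9, 60]]
right_win = [[12, 30, 27], [45, 20, 15], [18, 9, 60]]

t_left = census_transform(left_win, 1, 1, half=1)
t_right = census_transform(right_win, 1, 1, half=1)
census_cost = hamming(t_left, t_right)

print(format(t_left, "08b"), format(t_right, "08b"), census_cost)
# prints: 01011001 01110001 2
```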
  As mentioned earlier, both AD and SAD are sensitive to illumination. The Census transform, by contrast, is insensitive to overall brightness changes, because it encodes only the relative ordering of gray values; even if the brightness of the left and right images is inconsistent, a good match can still be obtained. However, the Census transform does not perform well in repeated areas. For example, the gray levels in the two neighborhood windows in the figure below are completely different, yet the Census transform yields exactly the same code and therefore the same cost.
[Figure: two windows with different gray levels but identical Census codes]
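  Both properties can be checked on toy data. The sketch below assumes a 3×3 window and made-up gray values; `census_bits` is a hypothetical helper that returns the Census code as a string.

```python
def census_bits(window):
    """Census code of a square window (list of rows), center pixel excluded."""
    n = len(window)
    c = n // 2
    center = window[c][c]
    return "".join(
        "1" if window[r][k] > center else "0"
        for r in range(n)
        for k in range(n)
        if (r, k) != (c, c)
    )


dark = [[10, 40, 20], [60, 30, 35], [25, 5, 90]]
# Same scene with a uniform brightness offset of +50.
bright = [[v + 50 for v in row] for row in dark]
# Completely different gray levels, but the same ordering relative to the center.
other = [[1, 200, 2], [255, 100, 101], [3, 0, 180]]

code_dark = census_bits(dark)
code_bright = census_bits(bright)
code_other = census_bits(other)
ad_center = abs(dark[1][1] - bright[1][1])  # AD cost of the two center pixels

print(code_dark, code_bright, code_other, ad_center)
# prints: 01011001 01011001 01011001 50
```

  The brightness offset changes the AD cost but not the Census code, which is the desired robustness. The third window shows the downside: it has the same code even though its gray values are unrelated.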

3. Cost aggregation

  The fundamental purpose of cost aggregation is to make the cost values accurately reflect the correlation between pixels. After the matching cost is calculated, a cost matrix C (the disparity space image, DSI) is obtained, which stores the matching cost of each pixel at each disparity within the disparity range. At this stage the cost matrix C is of relatively poor quality, because the cost calculation considers only local correlation: the cost is computed from the pixel information in a fixed-size window around the two pixels, which is easily affected by image noise. In weakly textured or repetitively textured areas, the cost value may not accurately reflect the correlation between pixels; the direct symptom is that the cost of the true corresponding point is not the minimum.
  Cost aggregation establishes connections between adjacent pixels and optimizes the cost matrix according to certain criteria, for example that adjacent pixels should have continuous disparity values. This optimization is often global: the new cost of each pixel at a given disparity is recalculated from the costs of its adjacent pixels at the same or nearby disparities, which yields a new DSI, represented by a matrix S.
  In effect, cost aggregation is similar to a disparity propagation step. In areas with a high signal-to-noise ratio the matching works well: the initial cost reflects the correlation well and the optimal disparity can be obtained accurately. Through cost aggregation this information is propagated to areas with a low signal-to-noise ratio and poor matching, so that in the end the cost values across the whole image accurately reflect the true correlation. Commonly used cost aggregation methods include the scanline method, the dynamic programming method, and the path aggregation method of the SGM algorithm.
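  To make the propagation idea concrete, here is a minimal sketch of SGM-style path aggregation along a single left-to-right path on one scanline. It is not the full SGM algorithm, which sums several path directions. The penalties `p1` and `p2`, the function name and the toy costs are assumptions chosen for illustration.

```python
def aggregate_left_to_right(cost_row, p1=1, p2=4):
    """Aggregate one scanline of the cost matrix C along a single path.

    cost_row[x][d] is the matching cost of pixel x at disparity d.
    For each pixel the aggregated cost is
        C(x, d) + min(L(x-1, d),
                      L(x-1, d-1) + p1, L(x-1, d+1) + p1,
                      min_k L(x-1, k) + p2) - min_k L(x-1, k)
    so small disparity changes cost p1 and larger jumps cost p2.
    """
    agg = [list(cost_row[0])]  # the first pixel has no predecessor
    for x in range(1, len(cost_row)):
        prev = agg[-1]
        prev_min = min(prev)
        cur = []
        for d, c in enumerate(cost_row[x]):
            candidates = [prev[d], prev_min + p2]
            if d > 0:
                candidates.append(prev[d - 1] + p1)
            if d + 1 < len(prev):
                candidates.append(prev[d + 1] + p1)
            cur.append(c + min(candidates) - prev_min)
        agg.append(cur)
    return agg


# Three pixels, three disparities. The middle pixel is ambiguous
# (all costs equal), as in a weakly textured area.
C_row = [[5, 0, 5], [2, 2, 2], [5, 1, 5]]
S_row = aggregate_left_to_right(C_row)
print(S_row)
# prints: [[5, 0, 5], [3, 2, 3], [6, 1, 6]]
```

  After aggregation the middle pixel has a unique minimum at disparity 1, inherited from its well-matched left neighbor.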

4. Disparity calculation

  Disparity calculation determines the optimal disparity of each pixel from the cost matrix S obtained after cost aggregation, usually with the winner-takes-all (WTA) algorithm, as shown in Figure 2: among the cost values of a pixel over all disparities, the disparity with the smallest cost is selected as the optimal disparity. This step is very simple, which means that the values of the aggregated cost matrix S must accurately reflect the correlation between pixels. It also shows that the preceding cost aggregation step is extremely critical in stereo matching and directly determines the accuracy of the algorithm.
[Figure 2: winner-takes-all disparity selection]
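  A winner-takes-all selection is short enough to sketch directly. The layout `S[y][x][d]` for the aggregated cost matrix and the toy values are assumptions.

```python
def winner_takes_all(agg):
    """Return the disparity map: for each pixel, the index d of its smallest cost.

    agg[y][x][d] is the aggregated cost of pixel (y, x) at disparity d.
    Ties are resolved in favor of the smallest disparity.
    """
    return [
        [min(range(len(costs)), key=costs.__getitem__) for costs in row]
        for row in agg
    ]


S = [
    [[6, 1, 6], [3, 2, 3]],
    [[0, 4, 9], [7, 7, 2]],
]
disparity_map = winner_takes_all(S)
print(disparity_map)
# prints: [[1, 1], [0, 2]]
```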

5. Disparity optimization

  The purpose of disparity optimization is to further refine the disparity map obtained in the previous step and improve its quality. It mainly consists of the following steps:

  • Left-right consistency check: eliminates false disparities caused by occlusion and noise
  • Error pixel classification:
  • Region voting: eliminates isolated outliers
  • Correct interpolation:
  • Edge correction:
  • Sub-pixel enhancement: refines the disparity to sub-pixel accuracy
  • Median filtering: smooths the disparity map
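  The left-right consistency check from the list above can be sketched as follows. The sketch assumes that a disparity map has also been computed with the right image as reference, that both maps store non-negative disparities, and that a tolerance of 1 pixel and an invalid marker of -1 are acceptable; these choices and the function name are illustrative only.

```python
def left_right_check(disp_left, disp_right, threshold=1, invalid=-1):
    """Invalidate left disparities that the right disparity map does not confirm.

    A left pixel (y, x) with disparity d corresponds to (y, x - d) in the right
    image. The pixel is kept if |d - disp_right[y][x - d]| <= threshold.
    """
    checked = []
    for y, row in enumerate(disp_left):
        out = []
        for x, d in enumerate(row):
            xr = x - d
            ok = 0 <= xr < len(row) and abs(d - disp_right[y][xr]) <= threshold
            out.append(d if ok else invalid)
        checked.append(out)
    return checked


disp_l = [[1, 1, 2, 2, 2]]
disp_r = [[2, 2, 0, 2, 2]]
checked = left_right_check(disp_l, disp_r)
print(checked)
# prints: [[-1, 1, 2, 2, -1]]
```

  The first pixel maps outside the right image and the last one disagrees with the right map by 2, so both are marked invalid; a later step such as interpolation would fill them.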
  Since the disparity obtained by the WTA algorithm has integer-pixel precision, a further sub-pixel refinement of the disparity is required to reach sub-pixel accuracy. The commonly used refinement method is one-dimensional quadratic curve fitting: a quadratic curve is fitted through the cost at the optimal disparity and the costs at the two neighboring disparities on its left and right, and the disparity at the minimum of this curve is taken as the sub-pixel disparity, as shown in the figure below.
[Figure: sub-pixel refinement by quadratic curve fitting]
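  For a parabola through the three points $(d-1, c_{-})$, $(d, c_0)$ and $(d+1, c_{+})$, the minimum lies at $d + \frac{c_{-} - c_{+}}{2(c_{-} - 2c_0 + c_{+})}$. A minimal Python sketch of this refinement follows; the function name, the fallback to the integer disparity at the borders of the range, and the toy costs are assumptions.

```python
def subpixel_disparity(costs, d):
    """Refine the integer disparity d with a parabola through the costs
    at d - 1, d and d + 1. costs[k] is the cost of one pixel at disparity k."""
    if d <= 0 or d >= len(costs) - 1:
        return float(d)  # a neighbor is missing: keep the integer value
    c_minus, c0, c_plus = costs[d - 1], costs[d], costs[d + 1]
    denom = c_minus - 2 * c0 + c_plus
    if denom <= 0:
        return float(d)  # flat or degenerate fit: no strict minimum
    return d + (c_minus - c_plus) / (2 * denom)


pixel_costs = [10, 4, 2, 6]
d_int = pixel_costs.index(min(pixel_costs))       # WTA result: 2
d_sub = subpixel_disparity(pixel_costs, d_int)    # 2 + (4 - 6) / (2 * 6)
print(d_int, round(d_sub, 4))
# prints: 2 1.8333
```

  The refined value moves toward the neighbor with the lower cost, and stays at the integer disparity when the two neighbors have equal cost.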

Origin blog.csdn.net/baidu_39231810/article/details/128631733