Three depth camera technologies explained: the principle of binocular stereo vision

Why use a binocular camera to get depth?

Binocular vision depth camera workflow

How binocular stereo vision depth cameras work in detail

    Ideal binocular camera imaging model

    Epipolar constraint

    Image rectification

    Image matching based on sliding window

    Image matching based on energy optimization

Advantages and disadvantages of binocular stereo vision depth cameras

---------------------------------------------------

Unlike TOF and structured-light depth cameras, a binocular vision depth camera works much like the human eyes: it does not project any active light source outward, but computes depth entirely from two captured images (RGB color or grayscale). It is therefore sometimes called a passive binocular depth camera. Well-known products include the ZED 2K Stereo Camera from STEREOLABS and the BumbleBee from Point Grey.

ZED 2K Stereo Camera

Why use a binocular camera to get depth?

Some readers may ask: why do we need a binocular camera to get depth? If I close one eye and observe with only the other, I can still tell which objects are closer to me and which are farther away. Doesn't that mean a monocular camera can also obtain depth?

 

The answer: it is true that a single human eye can obtain some depth information, but several easily overlooked factors are at work behind this. First, people have prior knowledge of the world: years of visual experience give us a basic sense of the size of everyday objects, so the common-sense rule that nearer objects look larger and farther objects look smaller lets us infer from an image what is near and what is far. Second, when observing with one eye, the eye is actually in constant slight motion, which is equivalent to a moving monocular camera. This is the principle behind structure from motion (SfM): by comparing the differences between multiple frames, a moving monocular camera can indeed recover depth information.

 

A camera, however, is not the human eye: it only records images mechanically and does not learn or reason the way people do. The figure below illustrates, from physical principles, why a monocular camera cannot measure depth while a binocular camera can. The three black dots at different distances along the red line project to the same position on the lower camera, so a monocular camera cannot tell which of the three points it is seeing. On the upper camera, however, their projections fall at three different positions, so by combining the observations of the two cameras the point can be uniquely determined.

 

Schematic: how a binocular camera determines depth

A simplified binocular stereo depth camera workflow

The depth measurement process of a binocular vision depth camera can be briefly summarized as follows:

 

1. First, calibrate the binocular camera to obtain the intrinsic and extrinsic parameters of the two cameras, as well as the homography matrices.

2. Rectify the original images according to the calibration results, so that the two rectified images lie in the same plane and are parallel to each other.

3. Perform pixel matching on the two rectified images.

4. Compute the depth of each pixel from the matching results, thereby obtaining the depth map.

 

The details of each step are explained below.

 

Binocular stereo vision depth cameras in detail

1

Ideal binocular camera imaging model

First, let us analyze the ideal case: assume the left and right cameras lie in the same plane (optical axes parallel) and have identical parameters (e.g., focal length f). The depth value can then be derived with the principle and formulas below. The derivation involves only similar triangles from middle-school geometry and is not hard to understand.

 

Depth calculation principle of a binocular stereo vision camera in the ideal case

 

From the derivation above, the distance (depth) of the spatial point P from the camera is z = f * b / d. To compute the depth z, we therefore need to know:

 

1. The camera focal length f and the baseline b between the left and right cameras. These parameters can be obtained from prior information or from camera calibration.

 

2. The disparity d. This requires knowing the correspondence between each pixel (xl, yl) in the left camera and its counterpart (xr, yr) in the right camera, which is the core problem of binocular vision.
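The depth formula z = f * b / d is easy to try out directly. The sketch below uses made-up illustrative values (f = 700 px, b = 0.12 m), not parameters from any real camera:

```python
def depth_from_disparity(f, b, d):
    """Depth z = f * b / d, for focal length f (in pixels),
    baseline b (in meters), and disparity d (in pixels)."""
    if d <= 0:
        raise ValueError("disparity must be positive")
    return f * b / d

# Made-up illustrative values: f = 700 px, baseline b = 0.12 m.
print(depth_from_disparity(700, 0.12, 30))  # larger disparity -> nearer point
print(depth_from_disparity(700, 0.12, 10))  # smaller disparity -> farther point
```

Note how the depth is inversely proportional to the disparity: halving the disparity doubles the computed distance.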

 

2

Epipolar constraint

So the question is: for a pixel in the left image, how do we determine the position of its corresponding point in the right image? Do we have to carry out an exhaustive search over the entire image?

 

The answer is: no. This is thanks to the epipolar constraint (a scarier name than it deserves). The epipolar line is very important for solving the correspondence between image points.

 

So what exactly is an epipolar line? As shown below, C1 and C2 are the optical centers of the two cameras and P is a point in space. P and the two camera centers C1 and C2 form a plane PC1C2 in three-dimensional space, called the epipolar plane (Epipolar plane). The epipolar plane intersects the two image planes in two straight lines, which are called the epipolar lines (Epipolar line). The image of P in camera C1 is P1, and its image in camera C2 is P2, but the position of P is not known in advance.

 

Our goal is: given the point P1 in the left image, find its corresponding point P2 in the right image, so that we can determine the spatial position of point P, i.e., the distance (depth) between the object and the camera that we are after.

 

The so-called epipolar constraint (Epipolar Constraint) says that when the same spatial point is imaged in both images, if the left projection point p1 is known, then the corresponding right projection point p2 must lie on the epipolar line associated with p1. This greatly narrows down the search range for matching.

 

With the epipolar constraint defined, we can see intuitively in the figure below that P2 must lie on a particular epipolar line, so to find the point P2 corresponding to P1 we only need to search along that epipolar line.

 

Epipolar constraint schematic
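In the standard notation of multi-view geometry (the fundamental matrix is not named in the original text, so this is supplementary), the epipolar constraint can be written compactly as:

```latex
\tilde{p}_2^{\top} F \, \tilde{p}_1 = 0
```

where \(\tilde{p}_1, \tilde{p}_2\) are the homogeneous pixel coordinates of corresponding points and \(F\) is the 3×3 fundamental matrix obtainable from calibration; the vector \(l_2 = F \tilde{p}_1\) is exactly the epipolar line in the right image on which \(\tilde{p}_2\) must lie.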

 

Attentive readers will have noticed that the discussion above assumes ideal cameras C1 and C2 (coplanar, optical axes parallel, identical parameters). What if the two cameras are not aligned like this?

 

Indeed, this situation is very common. Some scenarios require the two cameras to be mounted separately, making it hard to keep the optical centers C1 and C2 perfectly level, and even cameras fixed on the same substrate may end up with imperfectly level optical centers because of assembly tolerances. As shown below, the two cameras are neither parallel nor coplanar, so the ideal model derived earlier no longer applies. What can be done?

 

Epipolar lines in the non-ideal case

 

Don't worry, there is a way. Let's first look at two images taken in this configuration, shown below. For the three points (cross marks) in the left image, the corresponding epipolar lines in the right image are the three white lines, i.e., the search regions for matching. These three lines are not horizontal, and searching along them point by point is very inefficient.

 

For the three points (cross marks) in the left image, the corresponding epipolar lines are the three white lines in the right image

 

3

Image rectification

So what do we do? Simply convert the non-ideal case into the ideal one! This is the technique of image rectification (Image Rectification).

 

Image rectification applies a homography (Homography) transformation to each of the two images, using matrices obtained from calibration, and re-projects the two image planes (the gray planes in the figure), which originally face different directions, onto a common plane with the optical axes parallel to each other (the yellow plane in the figure). The ideal-case model from before can then be applied, and the epipolar lines of the two cameras become horizontal.

 

Image rectification schematic
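The warp underlying rectification is just a homography applied to every pixel: multiply the pixel's homogeneous coordinates by the 3×3 matrix and divide by the last component. A minimal sketch (the matrices below are made-up examples, not real calibration results):

```python
def apply_homography(H, pt):
    """Map pixel pt = (x, y) through the 3x3 homography matrix H."""
    x, y = pt
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return (xh / w, yh / w)  # divide out the homogeneous scale

# The identity homography leaves pixels unchanged.
I3 = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(apply_homography(I3, (10, 20)))  # (10.0, 20.0)

# A pure translation by (5, -3), as a trivial example of a warp.
T = [[1, 0, 5], [0, 1, -3], [0, 0, 1]]
print(apply_homography(T, (10, 20)))  # (15.0, 17.0)
```

A real rectifying homography additionally rotates and re-projects the image plane, but the per-pixel arithmetic is exactly this.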

 

After rectification, for each pixel in the left image we only need to search for its corresponding point along the horizontal epipolar line (hooray). The figure also shows that the disparities of the three points (red double-arrow lines) are different: the farther the object, the smaller the disparity; the closer the object, the larger the disparity. This agrees with our everyday experience.

 

Result after image rectification. The red double arrows are the disparities of corresponding points

 

We said above that for a point in the left image, we search along the horizontal epipolar line in the right image for its best-matching pixel. That sounds simple, but the actual operation is not easy, because the description above assumes the ideal case. When actually matching pixels, several problems appear:

 

1. In practice it is very hard to guarantee that the two cameras are exactly coplanar with identical parameters, and errors accumulate during computation, so the point corresponding to a given left-image point does not necessarily lie exactly on the epipolar line in the right image. It should, however, lie very close to that line, so the search range needs to be relaxed slightly.

 

2. Comparing individual pixels is not robust: it is easily affected by illumination changes and viewpoint differences.

 

4

Image matching based on sliding window

The solution to the problems above: matching with a sliding window, as shown below. For a pixel in the left image (the center of the red block on the left), we slide a window of the same size across the right image from left to right and evaluate its similarity to the left window. There are many similarity measures, for example the sum of squared differences (SSD): the more similar the two windows, the smaller the SSD. The SSD curve at the bottom of the figure shows the computed values; the pixel position with the minimum SSD is the best match found.

 

Schematic of the sliding-window matching principle
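A minimal sketch of SSD window matching on a single (already rectified) scanline, using made-up intensity values; real matchers use 2D windows over full images, but the principle is identical:

```python
def ssd(a, b):
    """Sum of squared differences between two equal-length windows."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def match_along_scanline(left, right, x, half):
    """For the pixel at column x in `left`, find the column in `right`
    whose window of radius `half` minimizes the SSD."""
    template = left[x - half : x + half + 1]
    best_x, best_cost = None, float("inf")
    for cx in range(half, len(right) - half):
        cost = ssd(template, right[cx - half : cx + half + 1])
        if cost < best_cost:
            best_x, best_cost = cx, cost
    return best_x

# Toy scanlines: the intensity pattern around column 5 in the left line
# appears shifted to column 3 in the right line (disparity = 2).
left  = [10, 10, 10, 10, 50, 90, 50, 10, 10, 10]
right = [10, 10, 50, 90, 50, 10, 10, 10, 10, 10]
x = 5
xr = match_along_scanline(left, right, x, half=1)
print(xr, "disparity:", x - xr)  # 3 disparity: 2
```

Running this over every pixel of every scanline, and feeding each disparity into z = f * b / d, yields a dense depth map.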

 

In concrete implementations there are many practical issues, such as the size of the sliding window. Choosing the window size requires real care. The figure below shows the effect of different window sizes on the computed depth map. From the figure we find:

 

A small window: higher accuracy and richer detail, but particularly sensitive to noise.

A large window: lower accuracy and less detail, but more robust to noise.

 

Effect of different sliding window sizes on the computed depth map

 

Although a depth map can be computed with sliding-window matching, the results of this matching method are not good, and because the window must slide point by point, its computational efficiency is very low.

 

5

Image matching based on energy optimization

More mainstream methods achieve matching through energy optimization. Energy optimization usually defines an energy function; for matching the pixels of the two images, for example, we define the energy function of Equation 1 in the figure below. Our aims are:

 

1. Each pixel in the left image and its corresponding pixel in the right image should be as similar as possible; in the images this means their gray values should be as close as possible, which is the term described by Equation 2 in the figure.

 

2. Within the same image, the disparities (depth values) of two adjacent pixels should also be similar. This is the term described by Equation 3 in the figure.

 

Energy function

 

The energy function of Equation 1 above is the famous MRF (Markov Random Field) model. By minimizing this energy function, we finally obtain the best match. With the matching result for each pixel of the left and right images, the depth formula given earlier yields the depth value of each pixel, giving the final depth map.
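Since Equations 1–3 appear here only as a figure, a common written form of this kind of MRF stereo energy (the notation below is my own, assuming a standard data term plus smoothness term) is:

```latex
E(d) = \underbrace{\sum_{p} \big( I_L(p) - I_R(p - d_p) \big)^2}_{\text{data term (cf. Eq. 2)}}
     + \underbrace{\lambda \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q)}_{\text{smoothness term (cf. Eq. 3)}}
```

where \(d_p\) is the disparity assigned to pixel \(p\), \(\mathcal{N}\) is the set of neighboring pixel pairs, \(V\) penalizes neighbors with different disparities, and \(\lambda\) balances the two terms; minimizing \(E(d)\) over all disparity assignments yields the matching.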

 

Advantages and disadvantages of the binocular stereo method

Based on the principles introduced above, we summarize the advantages and disadvantages of depth cameras based on binocular stereo vision.

 

1

Advantages

1. Low hardware requirements and low cost. Unlike structured light and TOF, no special emitter or receiver is needed; ordinary consumer-grade RGB cameras will do.

 

2. Applicable both indoors and outdoors. Because the images are captured directly under ambient light, the method can be used both indoors and outdoors. In contrast, TOF and structured light are basically limited to indoor use.

 

2

Disadvantages

1. Very sensitive to ambient light. Binocular stereo vision depends on images captured under the scene's natural light, and environmental factors such as changes in illumination angle and intensity can make the brightness of the two captured images differ considerably, which poses a great challenge to the matching algorithm. Below are images captured under different lighting conditions:

 

Comparison of images under different lighting

 

In addition, strong light (which causes overexposure) and very dark conditions both lead to a sharp drop in algorithm performance.

 

2. Not suitable for monotonous scenes lacking texture. Since binocular stereo vision matches images based on visual features, scenes that lack such features (the sky, a white wall, a desert, etc.) are hard to match, leading to large matching errors or even outright matching failure.

A richly textured scene (left) and a scene lacking texture (right)

 

3. High computational complexity. This is a purely visual method requiring pixel-by-pixel matching, and because of the various influencing factors above, many additional strategies must be added to keep the matching results robust. The computational cost is therefore high, and achieving reliable real-time commercial performance is difficult.

 

4. The baseline limits the measurement range. The measurement range is strongly related to the baseline (the spacing between the two cameras): the larger the baseline, the farther the measurable range; the smaller the baseline, the closer the measurable range. The baseline therefore limits the depth camera's measurement range to a certain extent.

 

Origin blog.csdn.net/Windgs_YF/article/details/104617755