3D Vision Depth Camera Solution

With the gradual development of disruptive technologies such as machine vision and autonomous driving, more and more applications are using 3D cameras for object recognition, behavior recognition, and scene modeling. It is fair to say that 3D cameras are the eyes of smart terminals and robots.

3D Camera

A 3D camera is also called a depth camera. As the name suggests, it can measure the depth, that is, the distance, of everything in the space it photographs, which is the biggest difference from an ordinary camera. A picture taken by an ordinary color camera records every object within the camera's field of view, but not how far those objects are from the camera. Only through semantic analysis of the image can we judge which objects are farther away and which are closer, and even then there is no exact figure. A 3D camera solves exactly this problem: from its data we know the precise distance between each point in the image and the camera. Adding the (x, y) coordinates of each point in the 2D image, we obtain the 3D spatial coordinates of every point in the image. From these three-dimensional coordinates the real scene can be reconstructed, enabling applications such as scene modeling.
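
To make that last step concrete, here is a minimal sketch, independent of any particular camera, of back-projecting a depth map into a point cloud with the pinhole model; the intrinsics (fx, fy, cx, cy) and the depth values are hypothetical:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into an N x 3 point cloud using
    the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))  # pixel coordinates
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop invalid (zero-depth) pixels

# Hypothetical 640 x 480 depth frame and intrinsics
depth = np.random.uniform(0.5, 4.0, (480, 640))
cloud = depth_to_points(depth, fx=570.0, fy=570.0, cx=320.0, cy=240.0)
print(cloud.shape)  # (307200, 3)
```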

As the description above suggests, our eyes are a natural 3D camera, thanks mainly to the principle of parallax. When a person looks at an object, the two eyes, being roughly 6 cm apart, view it from two slightly different perspectives, so the images seen by the left and right eyes are not exactly the same; this difference is called parallax. This subtle parallax, transmitted to the brain via the retinas, conveys the relative distances of objects and produces a strong sense of depth. The phenomenon was described by the British scientist Charles Wheatstone in 1838. It was natural, then, for people to try to obtain 3D data by imitating human eyes with two ordinary cameras set a certain distance apart; this was the original binocular stereo camera. To address some of the binocular camera's shortcomings, the structured light and TOF methods were later invented. Three 3D camera solutions are common on the market today: structured light, binocular stereo (RGB), and time of flight (TOF).

(1) Structured light; representative companies include Orbbec (Obi Zhongguang), Apple (PrimeSense), Microsoft (Kinect v1), Intel (RealSense), and Mantis Vision;

(2) Binocular stereo vision; representative companies include Leap Motion, Stereolabs (ZED), and DJI;

(3) Time of flight (TOF); representative companies include Microsoft (Kinect v2), PMD, SoftKinetic, and Lenovo (Phab).

The figure below gives a simple, intuitive illustration of these three methods:

[Figure: schematic comparison of structured light, binocular stereo, and TOF]
Structured Light


Structured light usually uses an invisible infrared laser of a specific wavelength as the light source. The emitted light is projected onto the object through a coded pattern, and the distortion of the pattern that comes back is processed by an algorithm to recover the object's position and depth information. Depending on the coding pattern, the common variants are stripe structured light (Enshape), coded structured light (Mantis Vision, Intel RealSense F200), and speckle structured light (Apple/PrimeSense, Orbbec). The following figure is a schematic diagram of a typical structured light camera:
[Figure: schematic of a typical structured light camera]

Apple's iPhone X adopts the speckle structured light technology it acquired with PrimeSense. A speckle is the random diffraction pattern formed when laser light strikes a rough surface or passes through ground glass. These speckles are highly random, and the pattern changes with distance: the speckle patterns at any two positions in space are different. As long as such structured light is cast into a space, the entire space is effectively marked. Place an object into this space, and the speckle pattern on the object reveals where it is. Of course, the speckle pattern of the entire space must be recorded beforehand, so a light-source calibration is performed first. By comparing against the speckle distribution of the calibration plane, the distance between the object and the camera can be computed accurately.
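
As a rough numeric illustration of that calibration-plane comparison (a simplified model, not Apple's or PrimeSense's actual pipeline): once block matching has found how far the local speckle pattern shifted relative to the recorded reference image, depth follows by triangulation against the reference plane. Every value below is hypothetical:

```python
import numpy as np

def speckle_depth(disparity_px, z_ref, f_px, baseline_m):
    """Depth from the speckle shift against a reference plane at z_ref,
    using the triangulation relation d = f * b * (1/Z - 1/Z_ref)."""
    return 1.0 / (disparity_px / (f_px * baseline_m) + 1.0 / z_ref)

# Hypothetical setup: reference plane at 2 m, 580 px focal length, 75 mm baseline
z = speckle_depth(disparity_px=10.0, z_ref=2.0, f_px=580.0, baseline_m=0.075)
print(f"{z:.2f} m")  # ~1.37 m: a +10 px shift means the object is nearer than the reference
```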

With the continuous development of structured light 3D measurement technology, its practical value keeps increasing. In the field of industrial measurement, in the 1990s Dr. Steinbichler, Dr. Wolf and Professor Reinhold Ritter founded Steinbichler GmbH, Dr. Wolf GmbH and GOM GmbH respectively, marking the commercialization of structured light measurement technology. Today the representative foreign products, GOM's ATOS series and FARO's Cobalt structured light series, are widely used in the automotive, aviation and consumer goods industries. In recent years domestic structured light industrial products have also kept developing; among them the representative Beijing Tianyuan OKIO series has entered this fiercely competitive market. Figure 1.3 shows, in panels (a), (b) and (c), the structured light products of FARO, GOM and Beijing Tianyuan respectively.

[Figure 1.3: structured light products of (a) FARO, (b) GOM, (c) Beijing Tianyuan]


In the field of consumer entertainment, laser speckle structured light, a spatially coded form of structured light, has been especially prominent in recent years. In November 2010 Microsoft released the Kinect v1, which has been used in motion-sensing games and robot vision. In September 2017 the iPhone X released by Apple adopted the speckle structured light technology from PrimeSense, bringing three-dimensional measurement of the face to the mobile phone platform. The ASTRA series of Orbbec (Obi Zhongguang), the representative domestic speckle structured light company, is likewise used in mobile devices, 3D face recognition, robot vision and other fields. Figure 1.4 shows, in panels (a), (b) and (c), the structured light products of Apple, Microsoft and Orbbec respectively.

[Figure 1.4: structured light products of (a) Apple, (b) Microsoft, (c) Orbbec]

The binocular vision technology of structured light is an active 3D reconstruction technique. A planar light beam is shone onto the object's surface, where it forms deflections that differ from the original light stripe; decoding these deflections yields the depth information, so three-dimensional depth can be obtained accurately and quickly. With projected surface structured light, a group of parallel light stripes is projected while the object sits on a worktable that can rotate to any angle; the three-dimensional information of different faces is processed in batches, depth is computed, and the views are matched and fused, reconstructing the object's overall 3D shape, as sketched below.
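
A minimal sketch of that fusion step, assuming the turntable angle is known for each view and the rotation axis is the y axis of the scanner frame (real systems refine the alignment, for example with ICP); this is illustrative only, not any product's actual pipeline:

```python
import numpy as np

def fuse_turntable_scans(scans, angles_deg):
    """Merge per-view point clouds (each an N x 3 array in the scanner
    frame) captured on a turntable into one model by undoing the known
    table rotation about the y axis for each view."""
    fused = []
    for pts, ang in zip(scans, angles_deg):
        t = -np.deg2rad(ang)  # flip the sign if your table turns the other way
        rot_y = np.array([[ np.cos(t), 0.0, np.sin(t)],
                          [ 0.0,       1.0, 0.0      ],
                          [-np.sin(t), 0.0, np.cos(t)]])
        fused.append(pts @ rot_y.T)
    return np.vstack(fused)

# Two hypothetical 100-point scans taken 90 degrees apart
scans = [np.random.rand(100, 3), np.random.rand(100, 3)]
model = fuse_turntable_scans(scans, angles_deg=[0.0, 90.0])
print(model.shape)  # (200, 3)
```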

Figure 1.5 shows an industrial-grade 3D scanner [23], model OKIO-H, which uses a high-precision industrial 1.31-megapixel CCD sensor and is suited to precision manufacturing fields such as aerospace, wind power and hydropower.

[Figure 1.5: OKIO-H industrial-grade 3D scanner]


The structured light method has important industrial applications, such as inspecting workpieces on conveyor belts and reverse engineering. It also has important applications in graphic modeling, such as human body modeling (including heads and other body parts) and the digitization of sculptures. In fact, the basic principle of the 3D scanner is itself an adaptation of the structured light method: a coded pattern is projected onto the object's surface by a projector and then captured by a camera, and whether the coding pattern is well chosen directly affects the accuracy and efficiency of 3D measurement and reconstruction. The dual-mode handheld 3D laser scanner [24] shown in Figure 1.6 greatly simplifies the 3D reconstruction process.

[Figure 1.6: dual-mode handheld 3D laser scanner]


Classification of structured light depth cameras

Structured light depth cameras are mainly divided into monocular and binocular structured light cameras.

Monocular structured light is easily affected by ambient light: outdoors on a sunny day, the coded light spots emitted by the laser are easily drowned out by sunlight. Binocular structured light can measure depth with structured light indoors and switch to a pure binocular method outdoors when strong light defeats the structured light, which greatly extends the range of usable scenarios. In addition, the laser in a structured light solution has a limited service life; it struggles to meet 24/7 operating requirements and is prone to damage under long continuous operation. Because the monocular lens and the laser must be precisely calibrated to each other, once the laser is damaged and replaced, re-calibrating the two is very difficult.

The principle of monocular structured light 

A monocular structured light camera usually uses an invisible infrared laser of a specific wavelength as its light source. The emitted light is projected onto the object through a coded pattern, and the distortion of the returned pattern is processed by an algorithm to obtain the object's position and depth. Depending on the coding pattern, the common variants are stripe structured light (Enshape), coded structured light (Mantis Vision, RealSense F200) and speckle structured light (Apple/PrimeSense). Because structured light actively projects coded light, it is well suited to poorly lit (even unlit) scenes that lack texture. The projected pattern is generally carefully designed, so high measurement accuracy can be achieved within a certain range; the technology is mature, and the depth image can reach a relatively high resolution.
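
For the stripe and coded variants just mentioned, a toy decoder illustrates the general idea of temporal coding: a stack of binary Gray-code patterns is projected, and each camera pixel decodes the projector column that illuminated it (a sketch of the principle only, not any vendor's pipeline; real systems calibrate the threshold with extra all-white and all-black frames):

```python
import numpy as np

def decode_gray_code(images, threshold=128):
    """Decode binary Gray-code stripe images (one per bit, coarse to
    fine) into a per-pixel projector column index."""
    bits = [(img > threshold).astype(np.int32) for img in images]
    # Gray code -> plain binary: b[0] = g[0], b[i] = b[i-1] XOR g[i]
    binary = [bits[0]]
    for g in bits[1:]:
        binary.append(binary[-1] ^ g)
    code = np.zeros_like(bits[0])
    for b in binary:
        code = (code << 1) | b
    return code  # triangulate against calibration to get depth

# Hypothetical: 8 captured pattern images, 480 x 640, 8-bit
images = [np.random.randint(0, 256, (480, 640)) for _ in range(8)]
columns = decode_gray_code(images)  # per-pixel column index in [0, 255]
```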

The principle of binocular structured light 

Binocular structured light is a solution combining pure binocular stereo with monocular structured light: the projected structured light effectively adds texture to the object, and depth is still computed from parallax, as in the binocular vision section below.


Major manufacturers

Foreign companies developing structured light solutions include PrimeSense (acquired by Apple), Intel, and Israel's Mantis Vision (used by Xiaomi);

Domestic companies developing structured light solutions include Shenzhen Orbbec (Obi Zhongguang) Technology Co., Ltd. (monocular structured light, used by OPPO), Nanjing Huajie Aimi Software Co., Ltd. (monocular structured light), Qiutai Technology, and Shanghai Tuyang Information Technology Co., Ltd. (binocular structured light).


The main advantages of structured light (speckle) are:

1) The solution is mature, and the camera baseline can be made relatively small, which helps miniaturization.

2) Low resource consumption: the depth map can be computed from a single IR frame, and power consumption is low.

3) Active light source, so it also works at night.

4) High precision and high resolution within a certain range: resolution can reach 1280x1024 and the frame rate can reach 60 FPS.

The disadvantages of speckle structured light are those of structured light in general:

1) It is easily disturbed by ambient light, so the outdoor experience is poor.

2) Accuracy deteriorates as the detection distance increases.

Binocular Vision


Binocular stereo vision is an important branch of machine vision. Based on the parallax principle, it uses imaging devices to capture two images of the measured object from different positions and recovers the object's three-dimensional geometry by computing the positional offset between corresponding points in the two images. There are currently active and passive binocular systems. Passive binocular uses visible light; its advantage is that no extra light source is required, but it cannot work at night. Active binocular emits infrared laser as supplementary illumination, so it can work at night as well.

Binocular vision relies only on images for feature matching and places low demands on extra hardware, but before use the positions of the two cameras must be accurately calibrated. The figure below illustrates simply how a binocular camera obtains the depth of an object. The three points on the same line of sight all project to the same point on the lower camera's CMOS sensor, so that single camera cannot tell the three points apart; their projections on the upper camera, however, land at different positions. By triangulation, using the baseline distance B between the two cameras, the distance between each of the three points and the camera plane can be computed.
[Figure: binocular triangulation over baseline B]

Of course, a complete binocular depth computation is much more complicated: it mainly involves feature matching between the left and right images, and the computation consumes considerable resources. A sketch follows below.
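
Here is a short OpenCV sketch of that pipeline: block matching to get disparity, then triangulation Z = f * B / d. The file names and calibration numbers are hypothetical, and the image pair is assumed to be rectified already:

```python
import cv2
import numpy as np

# Load a rectified grayscale stereo pair (hypothetical file names)
left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

# Block matching: for each left-image patch, search along the same row
# of the right image; the horizontal shift of the best match is the disparity.
stereo = cv2.StereoBM_create(numDisparities=64, blockSize=15)
disparity = stereo.compute(left, right).astype(np.float32) / 16.0  # fixed-point -> pixels

# Triangulation: Z = f * B / d (focal length in pixels, baseline in meters)
f_px, baseline_m = 700.0, 0.12  # hypothetical calibration values
depth = np.zeros_like(disparity)
valid = disparity > 0
depth[valid] = f_px * baseline_m / disparity[valid]
```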

The main advantages of binocular cameras are:

1) Low hardware requirements and low cost; ordinary CMOS cameras will do.

2) Suitable for both indoor and outdoor use, as long as the lighting is adequate.

But the disadvantages of binocular are also obvious:

1) Very sensitive to ambient light. Lighting changes cause large image variations, which easily lead to matching failure or low accuracy.

2) Unsuitable for monotonous scenes that lack texture. Binocular vision matches images by visual features, and without features the matching fails.

3) High computational complexity. It is a purely visual method that places high demands on the algorithm and involves a large amount of computation.

4) The baseline limits the measurement range. The measurement range scales with the baseline (the distance between the two cameras), which makes miniaturization difficult; a quick error estimate is sketched below.
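
The standard first-order error relation makes that trade-off visible: stereo depth uncertainty grows roughly as Z^2 and shrinks with focal length and baseline, dZ ≈ Z^2 * dd / (f * B) for a disparity uncertainty dd. The numbers below are hypothetical:

```python
def stereo_depth_error(z_m, f_px, baseline_m, disp_err_px=0.5):
    """First-order stereo depth uncertainty: dZ ~ Z^2 * dd / (f * B)."""
    return z_m ** 2 * disp_err_px / (f_px * baseline_m)

# Doubling the baseline halves the error at a given distance
for b in (0.06, 0.12, 0.24):
    print(f"baseline {b:.2f} m -> error at 5 m: {stereo_depth_error(5.0, 700.0, b):.3f} m")
```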


Time of Flight (TOF)


As the name implies, TOF measures the flight time of light to obtain distance. Concretely, it continuously emits laser pulses toward the target, receives the reflected light with a sensor, and derives the exact target distance from the round-trip flight time of the light pulses. Because light travels so fast, timing the flight directly is hard in practice, and the measurement is generally realized by detecting the phase shift of a light wave that has been modulated in some way.

Depending on the modulation method, TOF can generally be divided into two types: pulsed modulation and continuous-wave modulation. Pulsed modulation requires a very high-precision clock and must emit high-frequency, high-intensity laser pulses, so at present most systems implement TOF by detecting a phase shift instead.

The figure below illustrates the basic principle of a continuous-wave TOF camera. In practice sinusoidal modulation is usually used; since the phase offset between the sine wave at the receiver and at the transmitter is proportional to the object's distance from the camera, the distance can be measured from the phase offset.
[Figure: basic principle of a continuous-wave TOF camera]

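A minimal sketch of that phase measurement under one common "four-bucket" scheme, in which the return signal is sampled at four offsets of the modulation period (the modulation frequency and sample values are hypothetical, and sign conventions vary between sensors):

```python
import numpy as np

C = 299_792_458.0  # speed of light, m/s
F_MOD = 20e6       # hypothetical modulation frequency: 20 MHz

def cw_tof_depth(a0, a1, a2, a3):
    """Distance from four amplitude samples taken at 0/90/180/270 degrees
    of the modulation period; the recovered phase shift is proportional
    to the round-trip distance."""
    phase = np.arctan2(a3 - a1, a0 - a2)
    phase = np.mod(phase, 2.0 * np.pi)        # wrap into [0, 2*pi)
    # phase / (2*pi*F_MOD) is the round-trip time; halve it for one-way distance
    return C * phase / (4.0 * np.pi * F_MOD)

# Maximum unambiguous range at 20 MHz is c / (2 * F_MOD) = 7.5 m
print(f"{cw_tof_depth(1.0, 0.2, 0.0, 0.8):.2f} m")
```
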
Because TOF is not based on feature matching, its accuracy does not drop rapidly as the measuring distance grows. At present, lidars for autonomous driving and some high-end consumer applications are basically implemented with this method.

The main advantages of TOF are:

1) Long detection range: with sufficient laser power, it can reach tens of meters.

2) Relatively little interference from ambient light.

But TOF also has some obvious problems:

1) High demands on the hardware, especially the time-measurement module.

2) Large resource consumption. Detecting the phase offset requires multiple sampling integrations, and the computational load is heavy.

3) Low accuracy at object edges.

4) Limited by resource consumption and filtering, neither the frame rate nor the resolution can be high; current consumer-grade sensors top out at roughly VGA.

Conclusion


Comparing the three mainstream 3D camera imaging solutions above, each has its own strengths and weaknesses, but judged by practical application scenarios, structured light, and speckle structured light in particular, is the most widely used outside the autonomous driving field, because neither binocular stereo nor TOF achieves as good a balance of accuracy, resolution and applicable scenarios. Structured light is admittedly easily disturbed by ambient light, especially sunlight; but since this type of camera already carries an infrared laser emitter, it is easy to convert it into an active binocular system to compensate for this problem.

Of course, the three solutions are also converging somewhat as they develop, for example active binocular plus structured light, each borrowing from the others so that 3D cameras can adapt to more scenes. Sometimes they are even used side by side: the front of a phone will almost certainly use structured light for Face ID, while for rear-mounted AR applications both structured light and TOF have opportunities. Which solution a project adopts depends on its hardware resources and performance requirements, but judged by breadth of use, speckle structured light is undoubtedly the best solution at present.

————————————————
Source: https://blog.csdn.net/weixin_44470443/article/details/94861813
