[Translation] Multi-View Stereo: A Tutorial (1)

Summary:

This tutorial is a hands-on guide to practical multi-view stereo (MVS) algorithms. Relying only on images and a few reasonable assumptions, the most important being that the scene is rigid, MVS algorithms reconstruct accurate 3D models of the real world. The tutorial casts MVS as an image/geometry consistency optimization problem, built on two main ingredients: 1) robust implementations of photo-consistency checks across images, and 2) efficient optimization algorithms.
The tutorial focuses on how these two ingredients are used in practical and industrial applications. It also covers methods that involve higher-level domain expertise, such as structure-aware optimization, and discusses remaining challenges and future research directions.

1 Introduction

1.1 Image acquisition
Image collections may be ordered (e.g., a video or a controlled capture sequence) or unordered (images taken at different times with different hardware); see Section 1.3 for how this distinction affects reconstruction.

1.2 Camera projection model
As described in the introduction, MVS algorithms need extra information beyond the images themselves to produce a good reconstruction: in particular, a camera model for each image, which describes how a 3D point projects into 2D image space. MVS algorithms typically use the pinhole camera model, whose camera projection matrix is a 3x4 matrix defined up to scale [88]; it models the off-the-shelf digital cameras generally used for still photography well. Any such 3x4 matrix can be decomposed into the product of a 3x3 upper-triangular intrinsic matrix K and a 3x4 pose matrix [R | T].

K --- camera intrinsic matrix
(fx, fy): horizontal / vertical focal lengths
(cx, cy): principal point
s: skew

[R | T] --- camera extrinsic matrix
R: rotation
T: translation

The intrinsic matrix K encodes the camera's internal parameters: the horizontal and vertical focal lengths (fx, fy), the principal point (optical center) (cx, cy), and the skew parameter s. [R | T] is the extrinsic matrix: R is the camera rotation and T the camera translation. Because modern camera sensors are of good quality, all 11 parameters of the camera projection matrix are rarely estimated. Instead, one usually assumes zero skew (s = 0), square pixels (fx = fy), and a principal point at the image center (valid as long as the image has not been cropped). Under these assumptions, the ordinary pinhole camera is described by 7 parameters: the focal length f, the rotation R (3 parameters), and the translation T (3 parameters).
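To make the decomposition concrete, here is a minimal sketch in Python with NumPy that builds P = K [R | T] and projects a 3D point into pixel coordinates. The numerical values are made-up placeholders, not taken from the tutorial.

```python
import numpy as np

# Intrinsics: focal lengths (fx, fy), principal point (cx, cy), skew s (placeholder values).
fx, fy, cx, cy, s = 800.0, 800.0, 320.0, 240.0, 0.0
K = np.array([[fx,  s, cx],
              [0., fy, cy],
              [0., 0., 1.]])

# Extrinsics: rotation R (identity here) and translation T.
R = np.eye(3)
T = np.array([[0.0], [0.0], [5.0]])    # camera 5 units in front of the world origin

# 3x4 projection matrix P = K [R | T].
P = K @ np.hstack([R, T])

# Project a 3D point (homogeneous coordinates), then dehomogenize to pixels.
X = np.array([0.1, -0.2, 1.0, 1.0])    # (x, y, z, 1)
x = P @ X
u, v = x[0] / x[2], x[1] / x[2]
print(f"pixel: ({u:.1f}, {v:.1f})")
```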

For wide-angle lenses or lower-quality optics (Figure 1.4, left), the simple pinhole camera model is not sufficient, and a radial distortion model is usually added. Modeling radial distortion is especially important for high-resolution images, because even a small relative deviation translates into an error of several pixels near the image boundary.
Radial distortion is usually removed before running MVS: once the distortion parameters have been estimated, the image is resampled to invert the distortion, as if it had been acquired by an ideal distortion-free lens (Figure 1.4, bottom left). Working on undistorted images simplifies MVS algorithms and shortens computation time. Some cameras, such as smartphone cameras, remove distortion with dedicated hardware right after image acquisition. Note that resampled wide-angle images may need to be cropped and lose field of view; to avoid this, an MVS algorithm must itself support radial distortion and more complex camera models, which adds extra complexity.
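A minimal sketch of this preprocessing step, assuming the distortion parameters are already known (e.g., from calibration or SfM): it uses OpenCV's cv2.undistort to do the resampling, and the file names and coefficient values are hypothetical placeholders.

```python
import cv2
import numpy as np

# Intrinsics K and distortion coefficients (k1, k2, p1, p2, k3).
# Placeholder values; in practice they come from calibration or SfM.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
dist = np.array([-0.25, 0.07, 0.0, 0.0, 0.0])

img = cv2.imread("input.jpg")
undistorted = cv2.undistort(img, K, dist)   # resample to an ideal pinhole image
cv2.imwrite("undistorted.jpg", undistorted)
```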
Finally, another complication is the rolling shutter, which is particularly important for video applications (Figure 1.4, right). With a rolling-shutter sensor, each row of the image is exposed at a slightly different time, in contrast to a global shutter, which exposes the whole image at once. Rolling shutters allow higher sensor throughput but make the camera model more complicated: if the camera or the scene moves during capture, each image row sees the scene at a slightly different instant. When the camera and scene move slowly with respect to the shutter speed, the rolling-shutter effect is small enough to be ignored; otherwise, the camera projection model must incorporate the shutter effect [63].

1.3 Structure from Motion
There is a large literature on Structure from Motion (SfM); the purpose of this chapter is not to cover it in detail, but to discuss the key points of SfM algorithms and their relationship to MVS.
An SfM algorithm takes a set of images as input and outputs the camera parameters of every image together with a set of 3D points, commonly encoded as tracks: a track is a reconstructed 3D point together with the 2D coordinates of its observations in the images where it is visible. The basic SfM pipeline (Figure 1.5) consists of the following steps; a minimal two-view sketch is given after the list:

1. Detect 2D features in each input image.
2. Match 2D features between images.
3. Construct 2D tracks from the matches.
4. Solve for an initial SfM model from the 2D tracks.
5. Refine the SfM model with bundle adjustment.
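As a rough illustration of steps 1-4, here is a minimal two-view sketch in Python with OpenCV. It assumes two calibrated images img1.jpg / img2.jpg and a known intrinsic matrix K; the file names and values are hypothetical. A real SfM system handles many views, builds long tracks, and finishes with bundle adjustment.

```python
import cv2
import numpy as np

K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])                      # assumed intrinsics

img1 = cv2.imread("img1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("img2.jpg", cv2.IMREAD_GRAYSCALE)

# 1) Detect 2D features, 2) match them between the two images.
orb = cv2.ORB_create(4000)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = matcher.match(des1, des2)

pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# 3)/4) Robustly estimate the two-view epipolar geometry (RANSAC) and recover the pose.
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)
_, R, t, _ = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)

# Triangulate the inlier matches into 3D points (the seeds of the tracks).
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])
inl = inliers.ravel().astype(bool)
X_h = cv2.triangulatePoints(P1, P2, pts1[inl].T, pts2[inl].T)
X = (X_h[:3] / X_h[3]).T                          # N x 3 points
print(X.shape[0], "triangulated points")
```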

Early SfM work focused on the geometry of two and three views under the assumption of a rigid scene [88]. The reconstruction algorithm of Tomasi [182] is a prototype of this early work. A key development was the use of RANSAC [61] to robustly estimate the epipolar geometry between two or three views from noisy feature matches.

Subsequent results focused on two key components of SfM: 1) Euclidean reconstruction from multiple cameras, i.e., estimating the camera parameters and the 3D positions of the track points; and 2) building longer 2D tracks. By the end of the 20th century, SfM algorithms could robustly compute models from large sets of images, e.g., image sequences or video sequences [62,152]. The first industrial SfM solutions appeared commercially, for example in film editing and special effects [4].

Initially these systems were designed mainly for structured image sets, i.e., sets in which the image ordering matters, such as a video sequence. While some MVS applications do use structured sequences, e.g., Google StreetView [81] or Microsoft Streetside [143], many recent MVS applications work on unordered sets of images acquired at different times with different hardware, for example aerial images for 3D mapping [108,144,30]. The development of fast, high-quality feature detectors [87,135,57] and descriptors [135,36,159,130,26] made it possible to apply SfM to such unstructured datasets: high-quality descriptors make it possible to build longer, higher-quality tracks across different viewpoints and lighting conditions.

The final ingredient needed to handle large-scale unstructured photo collections is an improved matching stage. For unstructured collections there is no prior knowledge of which images are likely candidates to match, so in principle every image must be matched against every other image, which is computationally expensive. Efficient indexing [146] combined with high-quality descriptors makes matching millions of image pairs feasible. Track/correspondence graph simplification [172] and parallelization [25,64] are further used in industry, resulting in state-of-the-art SfM systems such as Microsoft Photosynth [16] and Google Photo Tours [15] (see Figure 1.6).

1.4 Bundle Adjustment

(Recap of the key SfM components from Section 1.3: robust estimation of the epipolar geometry from noisy matches with RANSAC; Euclidean multi-camera reconstruction of the camera parameters and 3D track points; construction of longer tracks.)

Although bundle adjustment (BA) is not strictly part of SfM, it is the standard final step for refining an initial SfM model. Given the camera parameters {Pi} and a set of tracks {Mj, {mji}}, where Mj is the 3D coordinate of track j and mji is its observed 2D projection in image i, BA uses nonlinear least squares to minimize the reprojection error

E(P, M) = \sum_{j} \sum_{i \in V(j)} \| p_i(M_j) - m_{ji} \|^2        (1.2)

where V(j) is the set of indices of the cameras in which point Mj is visible, and p_i(M_j) is the 2D projection of the 3D point Mj into camera i with camera parameters Pi.
E(P, M) is measured in squared pixels; a more common way to report estimation accuracy is the root mean square error (RMSE), measured in pixels and defined as follows:

RMSE(P, M) = \sqrt{ E(P, M) / N }

where N is the number of terms in the sum of Equation (1.2). Typical RMSE values before BA are on the order of a few pixels, while after BA they are generally sub-pixel.
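To make Equation (1.2) concrete, here is a minimal sketch in Python with NumPy/SciPy of the stacked reprojection residuals and the RMSE. To keep it short it refines only the 3D points with the cameras held fixed (a real BA also refines the camera parameters), and all function and variable names are illustrative, not the tutorial's.

```python
import numpy as np
from scipy.optimize import least_squares

def project(P, X):
    """Project a 3D point X (length-3 array) with a 3x4 camera matrix P; return a 2D pixel."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def residuals(params, cameras, observations, visibility):
    """Stacked reprojection residuals p_i(M_j) - m_ji from Eq. (1.2).
    cameras[i]: 3x4 matrix P_i (held fixed here); visibility[j]: camera indices V(j);
    observations[j][i]: observed 2D point m_ji."""
    M = params.reshape(-1, 3)
    res = []
    for j, Vj in enumerate(visibility):
        for i in Vj:
            res.append(project(cameras[i], M[j]) - observations[j][i])
    return np.concatenate(res)

def refine_points(M0, cameras, observations, visibility):
    """Toy BA step: minimize Eq. (1.2) over the 3D points only, with cameras fixed."""
    sol = least_squares(residuals, M0.ravel(),
                        args=(cameras, observations, visibility))
    E = float(np.sum(sol.fun ** 2))                    # E(P, M), in squared pixels
    n_terms = sum(len(Vj) for Vj in visibility)        # N, the number of terms in (1.2)
    return sol.x.reshape(-1, 3), np.sqrt(E / n_terms)  # refined points and RMSE
```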
The BA framework also supports fusing multiple sensors with the SfM objective. A common way to fuse GPS and IMU data into SfM is simply to add extra terms to Equation (1.2) that penalize the deviation of the camera model Pi from the GPS and IMU measurements.

MVS algorithms are very sensitive to the accuracy of the estimated camera models. The reason is that, for efficiency, they use the epipolar geometry defined by the camera models to turn the 2D image matching problem into a 1D matching problem (see Section 1.5 for details). If the reprojection error is large, a pixel may never be matched against its true correspondence, which significantly degrades MVS performance. How robust MVS is to camera reprojection error depends mainly on how much misalignment the matching criterion (i.e., the photo-consistency measures presented in Chapter 2) can tolerate. In general, the larger the domain Ω of the photo-consistency measure (see Equation 2.1), the more robust the measure. Unfortunately, large domains also tend to produce over-smoothed geometry, so there is a trade-off between accuracy and robustness.
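As a preview of the photo-consistency measures of Chapter 2, here is a minimal sketch of one common choice, normalized cross-correlation (NCC) over a square window Ω. The choice of NCC and all names here are illustrative, not the tutorial's specific formulation.

```python
import numpy as np

def ncc(img1, img2, p1, p2, half_window):
    """Normalized cross-correlation between two square patches of side 2*half_window+1,
    centered at pixel p1 = (row, col) in img1 and p2 in img2 (2D grayscale arrays).
    Returns a score in [-1, 1]; values near 1 indicate photo-consistent pixels."""
    r = half_window
    (y1, x1), (y2, x2) = p1, p2
    w1 = img1[y1 - r:y1 + r + 1, x1 - r:x1 + r + 1].astype(np.float64)
    w2 = img2[y2 - r:y2 + r + 1, x2 - r:x2 + r + 1].astype(np.float64)
    w1 = w1 - w1.mean()
    w2 = w2 - w2.mean()
    denom = np.sqrt((w1 ** 2).sum() * (w2 ** 2).sum())
    if denom < 1e-12:            # textureless patches: correlation undefined
        return 0.0
    return float((w1 * w2).sum() / denom)
```

A larger half_window, i.e., a larger domain Ω, makes the score more robust to small misalignments but tends to smooth out fine geometry, which is the trade-off described above.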

Because MVS is so sensitive to reprojection error, a BA step targeting sub-pixel reprojection error is usually required before running MVS. Note that, since reprojection error is measured in pixels, one can also downsample the input images and rescale the camera parameters accordingly until the reprojection error falls below a given threshold. This works as long as the downsampled images still contain enough texture and detail for MVS to operate [72].
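A minimal sketch of that downsampling trick (Python with OpenCV/NumPy, assuming the pinhole intrinsics K of Section 1.2): scaling the image resolution scales the focal lengths and principal point by the same factor, which shrinks the reprojection error in pixels by roughly that factor.

```python
import cv2
import numpy as np

def downscale(img, K, factor=0.5):
    """Downsample an image and rescale its intrinsic matrix K consistently.
    Reprojection errors, measured in pixels, shrink roughly by `factor`."""
    h, w = img.shape[:2]
    small = cv2.resize(img, (int(w * factor), int(h * factor)),
                       interpolation=cv2.INTER_AREA)
    S = np.diag([factor, factor, 1.0])   # scales fx, fy, s, cx, cy; leaves the last row
    return small, S @ K
```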

1.5 Multi-View Stereo
Multi-view stereo has its origins in human binocular stereo vision, and the first attempts to solve the stereo matching problem treated it as a computational matching problem [139]. To this day, two-view stereo remains a very mature and very active research area [162]. Multi-view stereo originated as a natural improvement of two-view stereo: instead of capturing two photographs from two different viewpoints, it captures additional photographs from intermediate viewpoints to increase robustness, e.g., to image noise or lack of surface texture [184,147]. What began as an improvement of two-view methods has since evolved into a different type of problem.

Although MVS shares its basic principles with classic two-view stereo, MVS algorithms are designed to handle much more varied viewpoints, e.g., a set of images surrounding an object, and much larger numbers of images, even on the order of millions. These differences in the nature of the problem ultimately lead to algorithms that differ from their classic stereo counterparts. As an example, industrial 3D mapping applications [108,144,30] process millions of images at a time to effectively reconstruct metropolitan areas, states, and eventually the whole world.

Matching pixels across images is a challenge shared by two-view stereo and multi-view stereo. In fact, optical flow is another very active area of computer vision that solves a dense image correspondence problem [33]. The main differences from MVS are that optical flow typically involves two images (similar to two-view stereo) with uncalibrated cameras, and its main application is image interpolation rather than 3D reconstruction. Note that in the MVS setting, where the camera parameters are known, solving for the 3D geometry of the scene is exactly equivalent to solving the correspondence problem in the input images. To see why, consider a 3D point belonging to the 3D scene (Figure 1.7, left): its projections into the cameras where it is visible establish a unique correspondence between the projected coordinates in each image. Given a pixel in one image, finding the corresponding pixel in another image requires two components:
• An efficient mechanism to generate possible candidate pixels in the other images.
• A measure of the likelihood that a candidate is the correct match.

If the camera geometry is unknown, as is usually the case in optical flow, every pixel in one image can potentially match any pixel in another image; that is, for each pixel, a 2D search over the other image is required. When the camera parameters are known (and the scene is rigid), however, the matching search is reduced from 2D to 1D (see Figure 1.7): a pixel in one image, together with its camera center, defines a 3D ray, and the corresponding pixel in another image must lie on the projection of that ray into the second image. This geometric constraint between views of the same 3D scene seen from different viewpoints is known as epipolar geometry [88]. As for measuring the likelihood that a candidate is the correct match, there is a vast literature on building so-called photo-consistency measures that estimate the likelihood that two pixels (or two groups of pixels) are in correspondence. Photo-consistency measures in the context of MVS are described in detail in Chapter 2. A minimal sketch of the epipolar constraint follows.
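Here is a minimal sketch (Python/NumPy; all numerical values are placeholders) of how the known camera parameters reduce matching to a 1D search: given the intrinsics and the relative pose between two cameras, the fundamental matrix F = K2^{-T} [t]_x R K1^{-1} maps a pixel in image 1 to its epipolar line in image 2, and candidate matches only need to be sampled along that line.

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]_x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def fundamental_matrix(K1, K2, R, t):
    """F = K2^{-T} [t]_x R K1^{-1}, so that x2^T F x1 = 0 for corresponding pixels
    x1, x2 (homogeneous), where camera 2 sees X2 = R X1 + t."""
    return np.linalg.inv(K2).T @ skew(t) @ R @ np.linalg.inv(K1)

def epipolar_line(F, x1):
    """Epipolar line l = F x1 in image 2 (coefficients a, b, c of a*u + b*v + c = 0)
    for a pixel (u, v) in image 1. Candidate matches are searched along this line."""
    return F @ np.array([x1[0], x1[1], 1.0])

# Placeholder cameras: identical intrinsics, camera 2 translated along the x axis.
K = np.array([[800., 0., 320.],
              [0., 800., 240.],
              [0., 0., 1.]])
R = np.eye(3)
t = np.array([0.1, 0.0, 0.0])
F = fundamental_matrix(K, K, R, t)
print(epipolar_line(F, (400.0, 300.0)))
```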


Source: www.cnblogs.com/baozhilin/p/11415698.html