[Translated] Multi-View Stereo: A Tutorial (3)

Chapter 3: 3D Reconstruction Based on Photo-Consistency

Building on the photo-consistency measures of Chapter 2, this chapter explains the popular multi-view stereo (MVS) reconstruction algorithms of recent years. MVS algorithms can be distinguished by many factors, for example the photo-consistency function, the scene representation, the visibility computation, and the initialization requirements, so proposing a single clean taxonomy is not easy. This article classifies algorithms by their output scene representation, since the representation determines the applications a reconstruction can serve; interested readers can refer to [165] for another taxonomy of MVS algorithms.
Fig. 3.1 shows the four common representations: depth maps, point clouds, voxel scalar fields, and meshes; the following sections introduce state-of-the-art reconstruction methods for each. Point-cloud models are displayed with point-based rendering [160,83], which makes them look like complete textured models although they are merely collections of independently colored 3D points. Voxel scalar fields are a surface representation widely used in computer vision and computer graphics; the field often stores a signed distance to the surface, whose zero level set is the reconstructed surface.
Fig. 3.2 diagrams the reconstruction steps of MVS algorithms and the intermediate or final geometry each step produces. Many MVS algorithms focus on a single reconstruction step, while some systems chain several steps into a pipeline. The diagram covers most MVS algorithms and systems, with one exception: algorithms that build a mesh directly from a photo-consistency volume via voxel-based fusion [190,102]; in that approach a photo-consistency volume takes the place of the point cloud or depth maps.
Of course, many algorithms developed in the past are no longer listed here. For example, level-set methods were once very popular in MVS because they handle topology changes gracefully [58]; a typical pipeline initializes a model, whose topology may be wrong, and then optimizes it. Level sets are rarely used now because better initialization and reconstruction algorithms have been developed: high-quality models with correct topology can be obtained directly from photo-consistency, so no such optimization is needed. Many early algorithms likewise initialized a mesh from the visual hull, but this is no longer common: silhouette extraction often had to be done manually, the visual hull is a poor approximation for scenes with many concave structures, and better initialization techniques have accelerated its phase-out. [For automatic silhouette-extraction methods, see 47,49,50.]

[Figure 3.1: The four popular output representations of MVS algorithms: depth maps, point clouds, voxel scalar fields, and meshes. Note that the dense point-cloud model looks like a textured mesh, although it is only a set of colored 3D points. The exemplary state-of-the-art reconstructions are, from top to bottom, from [48], [74], [94], and [93].]

*** 3D Representations and Applications ***
Table 3.1 summarizes the suitability of the four scene representations for three popular applications. The main application of 3D reconstruction is visualization, and the table lists two different visualization styles. View-dependent texture mapping renders images whose textures change with the camera viewpoint. The technique produces an immersive visual experience, because rendering is driven mostly by the real input images, and it can convey complex appearance effects such as specular highlights or translucency that are difficult to simulate [54,68,170]; Google Street View [81] is a good example of view-dependent texture mapping. However, to avoid rendering artifacts the rendering camera must stay close to the input views, so its range of motion is severely limited by the photo coverage. The depth-map representation is effective for view-dependent texture mapping, because the geometry can be optimized per rendering viewpoint [127]. For outdoor scenes the sky makes visualization very challenging, since its geometry cannot easily be described or reconstructed; to achieve better renderings, a proxy geometry can be generated from the depth map for each viewpoint. This is not easy with meshes, point clouds, or voxels, since they are view-independent.
Free-viewpoint rendering, on the other hand, lets the camera move freely in space, which is friendlier for navigation and browsing; Google Maps is a good example, but the rendering is generally view-independent and lacks realism. For free-viewpoint rendering, a mesh or a point cloud is more appropriate. MVS meshes with texture mapping have been successfully used in outdoor urban visualization products [30,108,144]. Point-based rendering is a well-studied technique in computer graphics [83], and with high-quality point clouds [69,117,104] it produces high-quality visualizations. However, little work has focused on point-based rendering of MVS-generated point clouds: MVS clouds are often noisy and riddled with holes, which severely degrades rendering quality.
The final application is geometry processing. As MVS technology matures and reconstructed scenes become ever larger and more complex, geometry-processing operations grow more and more important. To obtain a complete model of a scene, multiple MVS reconstructions often must be merged. The mesh representation poses a big challenge for this task, because controlling mesh topology through merging and splitting operations is very difficult.
In Figure 3.2, the bottom of the diagram refines a voxel scalar field into a mesh. This need not be the goal of every MVS system, however. For a view-dependent texture-mapping application, one should simply pick a depth-map reconstruction algorithm. For a free-viewpoint rendering application, one can reconstruct a point cloud from the images and use point-based rendering directly, without running any further steps in the diagram. Of course, a high-quality polygonal mesh is generally the preferred scene representation, and most of the processing in the diagram is ultimately aimed at reconstructing a mesh.

*** Evaluations ***

MVS researchers have conducted quantitative evaluations to verify the accuracy of MVS algorithms [165,176]. Seitz, Curless, Diebel, Scharstein and Szeliski laid the foundation for quantitative MVS evaluation in 2006 [165], evaluating MVS algorithms on two object datasets of low-resolution (640 × 480) images, captured carefully in a laboratory under controlled lighting. This evaluation is known as the Middlebury MVS benchmark. Although low-resolution images may not reflect the modern consumer market, where high-resolution digital cameras are ubiquitous, they minimize the impact of calibration errors: higher image resolution demands more accurate and repeatable mechanical devices (such as a robot arm). A few years later, Strecha, von Hansen, Van Gool, Fua and Thoennessen released a complementary MVS benchmark dataset and evaluation system focusing on outdoor scenes and high-resolution input images, reflecting the trends and needs of MVS research [176]. Many algorithms achieve impressive reconstruction accuracy (e.g., 0.5 mm precision within a 20 cm volume from 640 × 480 images) and produce dramatic 3D models, including all the state-of-the-art algorithms described in this chapter.
What is lacking is an assessment of the visual quality of reconstructed models. The Middlebury MVS evaluation revealed that purely geometric quantitative metrics do not always reflect a model's visual quality; in other words, a model with obvious visual artifacts sometimes has better geometric accuracy. Recent MVS algorithms produce 3D models of high visual quality, not only geometric precision [164,167]. Future MVS evaluations should consider both geometric accuracy and visual quality.
We now present detailed reconstruction algorithms for each of the four output scene representations.

3.1 Depth Map Reconstruction

Thanks to its flexibility and scalability, the depth map is one of the most popular scene representations. Suppose one is given thousands of images with camera parameters as input. One can easily reconstruct a depth map for each input image, after finding a small number of neighboring images to use for photo-consistency evaluation. Since a depth map can be regarded as a 2D array of 3D points, multiple depth maps can be merged into a single 3D point-cloud model. The process is simple and scales easily to very large image sets.
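The back-projection just described can be sketched in a few lines. This is a minimal illustration assuming a simple pinhole camera with focal length `f` and principal point `(cx, cy)`; all names are invented for the example, not taken from the text.

```python
# Sketch: back-projecting a depth map into a 3D point cloud.
# Assumes a pinhole camera; variable names are illustrative.

def backproject(depth, f, cx, cy):
    """Turn a 2D depth array into a list of 3D points (camera frame)."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            if d is None or d <= 0:      # skip missing depths
                continue
            x = (u - cx) * d / f
            y = (v - cy) * d / f
            points.append((x, y, d))
    return points

# Merging several depth maps then amounts to concatenating their point
# lists (after transforming each into a common world frame).
depth_map = [[1.0, 2.0], [0.0, 4.0]]     # 2x2 toy depth map; 0 = missing
cloud = backproject(depth_map, f=1.0, cx=0.5, cy=0.5)
```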

MVS depth-map reconstruction is usually carried out under a narrow-baseline assumption, with the same formulation as conventional two-view stereo [162]. Given a set of images with camera parameters and a finite set of discrete depth values covering the valid depth range, the method reconstructs the 3D geometry of a reference image.
For simple, compact objects, uniform depth sampling may be sufficient. For large and complex scenes, however, an appropriate sampling scheme is essential for achieving high speed and high quality; researchers have proposed perspective-corrected or parametric depth samplings, detailed in [203,104,75]. Nevertheless, MVS depth-map reconstruction is often simpler than its two-view counterpart, because there are usually many more images and hence more redundancy. In other words, in the MVS setting the completeness of a single depth map is not critical, as long as the merged model is accurate and complete.
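The two sampling strategies can be contrasted with a small sketch. Uniform sampling is the simple choice; inverse-depth sampling is one common perspective-corrected scheme (the papers cited above propose more elaborate ones). Function names and values are illustrative.

```python
# Sketch of two depth-sampling schemes. Uniform sampling spaces
# hypotheses evenly in depth; inverse-depth (disparity) sampling spaces
# them evenly in 1/depth, which roughly equalizes the pixel displacement
# between consecutive hypotheses.

def uniform_depths(d_min, d_max, n):
    step = (d_max - d_min) / (n - 1)
    return [d_min + i * step for i in range(n)]

def inverse_depths(d_min, d_max, n):
    inv_min, inv_max = 1.0 / d_max, 1.0 / d_min
    step = (inv_max - inv_min) / (n - 1)
    return [1.0 / (inv_max - i * step) for i in range(n)]

u = uniform_depths(1.0, 10.0, 5)   # evenly spaced in depth
p = inverse_depths(1.0, 10.0, 5)   # dense near the camera, sparse far away
```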
In the remainder of this section, we describe a few representative MVS depth-map algorithms in detail, then discuss some advanced techniques.

3.1.1 Winner-Takes-All Depth Maps

Suppose we are given a reference image whose depth map is to be computed, a set of neighboring images, and a depth range containing the scene to be reconstructed. A simple depth-map reconstruction algorithm evaluates the photo-consistency score over the entire depth range independently for each pixel and selects the depth with the highest score; this is referred to as the "winner-takes-all" approach, illustrated in Figure 3.3.
NCC is used as the photo-consistency measure and is expected to attain its maximum at the correct depth. Algorithm 1 gives a complete description. In addition to the best-scoring depth value, the algorithm often evaluates a confidence measure, so that low-confidence depth values can be ignored or down-weighted in the later merging step [106]. This simple algorithm was first demonstrated by Hernández and Schmitt [93] with surprisingly good results. It has since been improved in various ways; we now turn our attention to more sophisticated methods.
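A minimal sketch of the winner-takes-all rule, with `photoconsistency` standing in for an NCC evaluation. This is an illustrative reimplementation of the idea behind Algorithm 1, not the authors' code.

```python
# Winner-takes-all: for each pixel, evaluate photo-consistency at every
# depth hypothesis and keep the depth with the highest score, together
# with that score as a confidence measure.

def winner_takes_all(pixels, depths, photoconsistency):
    depth_map, confidence = {}, {}
    for p in pixels:
        scores = [(photoconsistency(p, d), d) for d in depths]
        best_score, best_depth = max(scores)
        depth_map[p] = best_depth
        confidence[p] = best_score   # low values can be down-weighted later
    return depth_map, confidence

# Toy example: a fake consistency curve peaking at depth 2 for every pixel.
fake = lambda p, d: 1.0 - abs(d - 2.0)
dm, conf = winner_takes_all([(0, 0), (0, 1)], [1.0, 2.0, 3.0], fake)
```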

3.1.2 Robust Photo-Consistency Depth Maps

While Algorithm 1 works fairly well, in general there is no guarantee that a window matches uniquely on the object surface. A larger window size can make matches more unique, but the corresponding peak becomes harder to localize, reducing the accuracy of the depth estimate. Occlusions and non-Lambertian effects such as specular highlights further add noise to the photo-consistency function, so simply using the averaging formula of Eq. (2.3) may not work (see Fig. 3.4). Vogiatzis, Hernández, Torr and Cipolla [190] proposed a robust photo-consistency function to overcome these challenges. Given a pixel, the algorithm first collects the local maxima of the pairwise photo-consistency curves computed between the reference image and each neighboring image. Letting d_k denote the depth of the k-th local maximum and C_k its consistency score, the robust photo-consistency function is:
C'(d) = Σ_k C_k · W(d − d_k)
where W is a kernel function, such as a Gaussian [190]. Fig. 3.4 shows the effect: simple averaging selects a wrong global maximum depth, while the robust photo-consistency successfully suppresses the outliers. Fig. 3.5 illustrates how noisy local maxima are suppressed by this voting scheme. Another simpler but effective approach is to ignore photo-consistency scores below a certain threshold. Goesele, Curless and Seitz compute the average of only those pairwise photo-consistency values that exceed a threshold [80]. Such thresholding is usually a very sensitive operation whose result depends heavily on the parameter choice, but NCC is known to be robust and stable across different illumination conditions, so a constant threshold on the NCC score often works well. A similar photo-consistency treatment appears in a point-cloud reconstruction framework, detailed in Section 3.2.1.
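The kernel-voting idea can be sketched as follows, assuming a Gaussian kernel as in [190]; the peak list and `sigma` are illustrative toy values.

```python
# Robust photo-consistency sketch: each pairwise curve contributes its
# local maxima (d_k, C_k), and the robust score at depth d is the
# kernel-weighted vote C'(d) = sum_k C_k * W(d - d_k), with W a Gaussian.
import math

def robust_score(d, peaks, sigma=0.1):
    """peaks: list of (d_k, C_k) local maxima from all pairwise curves."""
    return sum(c_k * math.exp(-0.5 * ((d - d_k) / sigma) ** 2)
               for d_k, c_k in peaks)

# Three curves agree on depth ~1.0; one outlier peak at depth 3.0 with a
# higher individual score is out-voted by the cluster.
peaks = [(1.0, 0.8), (1.02, 0.7), (0.98, 0.9), (3.0, 1.0)]
candidates = [1.0, 3.0]
best = max(candidates, key=lambda d: robust_score(d, peaks))
```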
Fig. 3.6 shows reconstruction results of the robust method by Goesele, Curless and Seitz [80]. The top row shows a reference image on the left and two depth maps reconstructed with different confidence thresholds on the right. When the confidence of a depth estimate is below the threshold, the pixel is discarded. The left depth map uses a stricter (higher) threshold and is therefore less noisy, but exhibits more holes. Note that a depth map is usually visualized as an intensity image by mapping valid depths to intensity values; in this figure, however, each depth map is visualized as a shaded polygonal model, obtained by a simple volumetric fusion [52] (see Section 3.3). The dataset consists of 24 images; the 24 reconstructed depth maps are then merged into a single polygonal model by the same fusion method, shown in the bottom row. A single depth map is very noisy and contains many holes, but the merged model becomes much cleaner and exhibits fewer reconstruction holes. The effect of the number of input images on reconstruction quality is shown in Figure 3.7: with more than 300 images, the temple model becomes complete, while the dino model still has some holes, as its weak texture makes photo-consistency evaluation more challenging.

3.1.3 MRF Depth Maps

Although the robust photo-consistency function above helps, in difficult cases the peak of the consistency curve may still not correspond to the true depth; under severe occlusion, there may be no correct match at all in most images. The standard solution to these problems is to enforce spatial consistency, under the assumption that neighboring pixels tend to have similar depth values, and Markov random fields (MRFs) have been very successful at this task. MRF depth-map estimation [120] is formulated as a combinatorial optimization problem, in which the depth range is discretized into a finite set of values. The problem is then to assign each pixel p a depth label k_p while minimizing the loss function:

E({k_p}) = Σ_p Φ(k_p) + Σ_{(p,q)∈N} Ψ(k_p, k_q)        (3.2)

The first sum is over all pixels in the image, and the second is over all pairs of neighboring pixels, denoted N. The neighborhood system can be 4-connected or 8-connected: the former links horizontally and vertically adjacent pixels, while the latter also includes diagonal neighbors. The 4-neighborhood system has fewer interaction terms and is easier to optimize, but may exhibit stronger discretization artifacts. We next discuss the unary potentials Φ(·) and the pairwise interaction potentials Ψ(·,·).
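Evaluating the energy of formula (3.2) for a candidate labeling on a 4-connected grid can be sketched as follows. This only illustrates what the objective measures; real solvers minimize it rather than merely evaluate it, and the toy costs are invented for the example.

```python
# Sketch: evaluate E = sum_p Phi(k_p) + sum_{(p,q) in N} Psi(k_p, k_q)
# for a labeling on a 4-connected pixel grid.

def mrf_energy(labels, phi, psi):
    """labels: 2D list of depth labels; phi/psi: unary/pairwise costs."""
    h, w = len(labels), len(labels[0])
    e = sum(phi(labels[r][c]) for r in range(h) for c in range(w))
    for r in range(h):
        for c in range(w):
            if c + 1 < w:                      # horizontal neighbor
                e += psi(labels[r][c], labels[r][c + 1])
            if r + 1 < h:                      # vertical neighbor
                e += psi(labels[r][c], labels[r + 1][c])
    return e

# Toy costs: unary prefers label 1; pairwise is truncated linear.
phi = lambda k: 0.0 if k == 1 else 1.0
psi = lambda a, b: min(2.0, abs(a - b))
smooth = mrf_energy([[1, 1], [1, 1]], phi, psi)   # all neighbors agree
rough  = mrf_energy([[1, 5], [1, 1]], phi, psi)   # one outlier label
```

As expected, the smooth labeling has lower energy than the one containing an outlier.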

*** Unary Potentials ***
The unary potential reflects the photo-consistency loss of a label and is inversely related to the photo-consistency score. The definition of the unary loss varies. If NCC, which ranges over [-1, 1], is used as the photo-consistency function, a truncated linear loss can be defined as:

Φ(k_p) = min(τ_u, 1 − NCC(p, d_{k_p}))

where τ_u is the truncation threshold. Any robust function, such as the Huber or Cauchy loss, may also serve as the unary potential.

*** Pairwise Interaction Potentials ***
The pairwise loss term enforces spatial regularization: it is proportional to the depth difference between adjacent pixels, encouraging neighbors to take similar depth values. Definitions vary, but a simple implementation uses a truncated linear loss to avoid over-penalizing genuine depth discontinuities:

Ψ(k_p, k_q) = min(τ_ψ, |d_{k_p} − d_{k_q}|)

*** Optimization ***

Minimizing formula 3.2 is an NP-hard problem, but many approximate solvers exist, especially when the pairwise term satisfies the submodularity condition [122]:

Ψ(α, α) + Ψ(β, γ) ≤ Ψ(β, α) + Ψ(α, γ)

For submodular functions, one popular technique is known as alpha expansion [122,45,44], which repeatedly solves max-flow/min-cut problems to improve the labeling one label at a time.
Fortunately, the submodularity condition holds for many standard pairwise terms. More specifically, for a distance metric, Ψ(α, α) must be zero, since the two labels are identical. The remaining condition is the triangle inequality:
Ψ(β, γ) ≤ Ψ(β, α) + Ψ(α, γ)

Smoothness priors are usually defined as distance metrics and satisfy the triangle inequality². Examples of metrics include the linear, truncated linear, and Cauchy loss functions. The quadratic and Huber losses, however, are not submodular, because they violate the triangle inequality. Note that, unlike the pairwise loss, the unary potential is unrestricted and can be set arbitrarily. MRFs are used to solve many other computer vision problems; more details can be found in [114,179].
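The metric conditions can be checked mechanically on a small label set. The sketch below (toy labels, illustrative code) confirms that a truncated-linear pairwise loss is a metric while a quadratic one is not.

```python
# Check whether a candidate pairwise loss is a metric on a small label
# set: zero on the diagonal, symmetric and positive off it, and
# satisfying the triangle inequality.

def is_metric(psi, labels):
    for a in labels:
        if psi(a, a) != 0:
            return False
    for a in labels:
        for b in labels:
            if psi(a, b) != psi(b, a) or (a != b and psi(a, b) <= 0):
                return False
            for c in labels:
                if psi(a, c) > psi(a, b) + psi(b, c):   # triangle inequality
                    return False
    return True

labels = [0, 1, 2, 3]
trunc_linear = lambda a, b: min(2, abs(a - b))
quadratic = lambda a, b: (a - b) ** 2
ok_linear = is_metric(trunc_linear, labels)
ok_quadratic = is_metric(quadratic, labels)   # fails: 4 > 1 + 1 for (0,1,2)
```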

[2]: Submodular optimization is a hot research topic in the machine-learning community, where submodularity describes a mathematical property of set functions. In computer vision, however, the term is generally used to describe multi-label optimization objectives. The two notions are mathematically equivalent, but are treated in very different ways.

3.1.4 Multiple-Hypothesis MRF Depth Maps

Campbell, Vogiatzis, Hernández and Cipolla extended the standard MRF formulation of the previous section to improve results [48]. Rather than discretizing the entire depth range into a single label set shared by all pixels, their algorithm extracts the local maxima of the photo-consistency curve at each pixel, and the MRF assigns each pixel the depth of one of its own local maxima; thus different pixels have different label sets. They also use an "unknown" label to indicate cases in which the depth cannot be estimated reliably: instead of forcing a value, the algorithm admits that the depth of such a pixel is unknown. This means that the depth values it does return are accurate, estimated with high certainty.
The process comprises two phases: 1) depth-label extraction; 2) MRF optimization to assign the extracted labels. We now discuss the details of the algorithm.

Depth Label Extraction

The first step obtains a set of hypothesis depth values for each pixel p of the reference image I_ref. After computing the photo-consistency curves between I_ref and each neighboring image, the algorithm retains the K highest-scoring peaks across all curves, {d_i(p) | i ∈ [1, K]}, using NCC as the photo-consistency function. As noted above, another key feature of this algorithm is the unknown state U, selected when there is not enough evidence for any depth. Thus each pixel has the label set {{d_i(p)}, U}.
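The per-pixel label extraction can be sketched as a local-maxima search over a sampled consistency curve. The curve and `k` below are toy values; the unknown label is simply appended by the caller.

```python
# Sketch of depth-label extraction: find local maxima of a sampled
# photo-consistency curve and keep the K highest-scoring peaks as
# per-pixel depth hypotheses.

def top_k_peaks(depths, scores, k):
    """Return up to k (depth, score) local maxima, best first."""
    peaks = []
    for i in range(1, len(scores) - 1):
        if scores[i] > scores[i - 1] and scores[i] > scores[i + 1]:
            peaks.append((scores[i], depths[i]))
    peaks.sort(reverse=True)
    return [(d, s) for s, d in peaks[:k]]

depths = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]
scores = [0.1, 0.6, 0.2, 0.9, 0.3, 0.1]   # two interior local maxima
hyps = top_k_peaks(depths, scores, k=2)
labels = hyps + ["U"]                     # add the unknown state
```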

MRF Optimization

MRF optimization assigns the depth labels, where each pixel has at most (K + 1) labels; if fewer than K peaks were found in the extraction stage, the number of labels is accordingly smaller. Associated with each depth label d_i(p) is its photo-consistency score C(p, d_i(p)). The final label is the unknown state U described previously.
The unary loss function is simple. We want labels with low matching scores to incur a larger penalty, since they are more likely to be incorrect matches. The score is mapped through an inverse exponential function to obtain the unary loss [190], and a constant penalty Φ_U is assigned to the unknown state, to prevent pixels with poor photo-consistency and no supporting neighbors from being forced to take a depth value:

Φ(d_i(p)) = exp(−β · C(p, d_i(p))),    Φ(U) = Φ_U

The pairwise term enforces spatial regularization. With two types of labels (depth values and the unknown state), the pairwise loss is defined over the following 4 (= 2 × 2) cases:

Ψ(d_i(p), d_j(q)) = min(τ_ψ, 2·|d_i(p) − d_j(q)| / (d_i(p) + d_j(q)))
Ψ(d_i(p), U) = Ψ(U, d_j(q)) = Ψ_U
Ψ(U, U) = 0

In the first case, both labels are depth values, and the loss is simply the truncated difference between them. Note that the difference is normalized by the mean of the two depth values, so that the penalty is invariant to the scale of the scene. In the second and third cases, one of the labels is the unknown state, and a constant penalty discourages frequent switching between depth labels and the unknown state. In the last case, both labels are unknown, and the penalty is set to zero to maintain spatial consistency.
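The four cases can be written out directly. The constants `tau` and `psi_u` below are illustrative, not values from the paper.

```python
# Sketch of the four-case pairwise cost with an unknown state U:
# a depth-normalized truncated difference between two depth labels, a
# constant penalty between a depth and U, and zero between two U labels.

U = "U"   # the unknown label

def pairwise(a, b, tau=0.2, psi_u=0.1):
    if a == U and b == U:
        return 0.0                       # case 4: both unknown
    if a == U or b == U:
        return psi_u                     # cases 2 and 3: one unknown
    mean = 0.5 * (a + b)                 # case 1: two depth values,
    return min(tau, abs(a - b) / mean)   # normalized by their mean depth

c_same = pairwise(2.0, 2.0)      # agreeing depths cost nothing
c_jump = pairwise(1.0, 3.0)      # large jump, truncated at tau
c_mixed = pairwise(2.0, U)       # constant penalty psi_u
c_unknown = pairwise(U, U)       # free: keeps unknown regions coherent
```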
Unfortunately, the pairwise cost in this formulation is not submodular, since the depth labels are extracted independently per pixel and the i-th label means something different at each pixel. For example, Ψ(d_i(p), d_i(q)) would be 0 in the standard MRF formulation, because d_i(p) and d_i(q) would correspond to the same depth value at every pixel; here that is no longer the case, so alpha expansion is not applicable. However, message-passing algorithms such as loopy belief propagation (LBP) [204] and tree-reweighted message passing (TRW) [194] are popular alternative MRF optimizers; TRW in particular has been successfully applied to many computer vision problems, including depth-map reconstruction [123,179], and is used in this work.

Figure 3.8 illustrates the photo-consistency curves and the positions of their local maxima along an image boundary. Note that occluded boundary pixels are assigned the unknown label (the sixth pixel from the top), while spatial regularization enforces the correct depth label even when the global maximum of the curve corresponds to a wrong depth (the fourth pixel from the top). Figure 3.9 shows further experimental results, including intermediate stages of the reconstruction process. As shown, a single depth map has holes at unknown-labeled pixels and at parts invisible in the reference image. However, keeping only high-confidence regions is important for minimizing noise in the later fusion step, and Figure 3.9 shows that the model becomes nearly complete after merging only two overlapping depth maps.

[Figures 3.8 and 3.9]

3.1.5 More Depth-Map Reconstruction Algorithms

Beyond the methods of the previous sections, many more depth-map reconstruction algorithms have been proposed. This section describes some of the more important algorithms and techniques from the literature.

Real-Time Plane Sweeping Depthmap Reconstruction

Depth-map reconstruction is not a cheap operation: the photo-consistency function must be evaluated at every pixel, for every depth hypothesis, over every image. However, Gallup, Frahm, Mordohai, Yang and Pollefeys showed that real-time performance is achievable through clever use of the GPU [76]. The algorithm is called "plane-sweeping stereo", as it sweeps a family of parallel planes through the scene, projects the images onto each plane via homographies, and evaluates photo-consistency on each plane.
The depth of each pixel is then selected by the winner-takes-all strategy. The algorithm has two key features.

[Figure 3.10]

First, as shown in the second row of Figure 3.10, the algorithm sweeps planes along multiple directions, which are extracted from the structure of the scene to be reconstructed. To evaluate photo-consistency, most algorithms assume that the surface is fronto-parallel with respect to the reference view, which corresponds to sweeping planes along a single fixed direction. When the scene surface does not follow that direction, the correlation windows in different images do not match, as in the first row of Figure 3.10. On the other hand, when a sweeping plane is parallel to the surface, the correlation windows project onto the same 3D surface region, producing an accurate photo-consistency evaluation. A depth map is generated for each sweeping direction, and the multiple results are merged into a final depth map (see [76] for details). The strategy is particularly effective for urban scenes, where a small number of dominant (e.g., Manhattan) directions are usually present and can be estimated from the sparse 3D point cloud of an SfM system.
The second key contribution is an efficient GPU implementation of the image reprojection and photo-consistency evaluation, achieving real-time performance. More specifically, each texture is reprojected onto the sweeping plane via a homography, a standard rendering operation that GPUs perform very efficiently. After gathering the reprojected textures, the per-pixel photo-consistency evaluation can also be performed on the GPU. The photo-consistency function used in the original paper is the gain-corrected sum of squared differences (SSD). The system was demonstrated on video sequences with roughly constant illumination within each sequence, which is one reason SSD remains effective and more expensive measures such as NCC are not essential.
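The plane-induced homography at the core of the method can be sketched in pure Python. The formula H = K (R − t nᵀ / d) K⁻¹ is the standard mapping induced by a plane at distance d with unit normal n, seen from a neighboring camera with relative pose (R, t) and intrinsics K; the toy numbers below (identity intrinsics, sideways translation, fronto-parallel plane) are illustrative, and sign conventions vary between texts.

```python
# Sketch of the plane-induced homography used in plane-sweeping stereo,
# with minimal 3x3 matrix helpers. All names are illustrative.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def plane_homography(K, K_inv, R, t, n, d):
    M = [[R[i][j] - t[i] * n[j] / d for j in range(3)] for i in range(3)]
    return matmul(matmul(K, M), K_inv)

def warp(H, u, v):
    x, y, w = (H[i][0] * u + H[i][1] * v + H[i][2] for i in range(3))
    return x / w, y / w

# Toy setup: identity intrinsics, pure sideways translation, and a
# fronto-parallel sweep plane (n = [0, 0, 1]) at depth d = 2. The induced
# shift is the familiar disparity, baseline / depth.
I = [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
H = plane_homography(I, I, I, t=[-1.0, 0.0, 0.0], n=[0.0, 0.0, 1.0], d=2.0)
u2, v2 = warp(H, 3.0, 4.0)
```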

[Figure 3.11]

Using a dual-core 2.4 GHz AMD processor and an NVidia GeForce 8800 Series GPU, the system processes a 512 × 384 video stream at 30 frames per second. Each depth map is computed from 7 images and 48 sweeping planes, requiring only 24 milliseconds. After reconstructing a depth map for each frame, the system fuses all depth maps into a mesh model; see their paper [76] for details, and Section 3.3 for popular fusion methods. Figure 3.11 shows reconstruction results on several street scenes. The largest dataset comprises 170,000 frames, from which the system produces nearly 30 billion (≈ 512 × 384 × 170,000) 3D points by a simple count that ignores redundancy; at the time of publication (2007), this was a reconstruction scale far beyond comparable methods.

Second Order Smoothness

MRFs have been used successfully in various depth-map reconstruction algorithms, and in many other computer vision tasks that can be discretized into a relatively small number of labels. The typical smoothness prior acts on pairs of pixels, attempting to minimize the depth difference between the two; in depth-map reconstruction, this prior favors fronto-parallel surfaces, that is, surfaces whose pixels all share the same depth value. In real scenes, however, most surfaces are not fronto-parallel to the reference view, and when the assumption fails the prior can cause reconstruction errors, as shown in Figure 3.12.

[Figure 3.12]

Woodford, Torr, Reid and Fitzgibbon proposed an algorithm that still employs an MRF, but introduces a second-order smoothness prior over triples of pixels (triple cliques), which favors piecewise-planar surfaces [196]. More specifically, given three adjacent pixels (p, q, r), the smoothness loss penalizes |d_p + d_r − 2·d_q|, a finite-difference approximation of the second derivative: the loss is 0 when the first derivative is constant, i.e., when the surface is locally planar. Introducing triple cliques complicates the optimization (the cost function is no longer submodular) and requires a more sophisticated optimization algorithm; see their paper [196] for details.
The effect of the second-order smoothness prior on synthetic and real scenes is shown in Figure 3.12. In the synthetic example at the top, most of the structure is piecewise planar, and the standard method (i.e., a first-order prior) fails to produce the slanted planar segments, while their algorithm successfully reconstructs most of the expected planes. In the real example at the bottom, the structure is mostly curved rather than piecewise planar; nevertheless, their reconstruction is much more accurate than the standard method, since the piecewise-planar prior is more flexible and fits curved surfaces more closely than the fronto-parallel prior. The standard method exhibits staircase artifacts in many places.
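The second-order penalty and its advantage over a first-order prior can be illustrated on a 1D row of depths. The values are toy data, and this is not the authors' implementation.

```python
# Sketch of the second-order (triple-clique) smoothness term: for three
# adjacent pixels, penalize |d_p + d_r - 2*d_q|, a finite-difference
# second derivative that is zero on any planar (constant-slope) surface.

def second_order_penalty(depths):
    """Sum of |d[i-1] + d[i+1] - 2*d[i]| over a 1D row of depths."""
    return sum(abs(depths[i - 1] + depths[i + 1] - 2 * depths[i])
               for i in range(1, len(depths) - 1))

slanted = [1.0, 1.5, 2.0, 2.5, 3.0]   # planar but not fronto-parallel
stepped = [1.0, 1.0, 2.0, 2.0, 3.0]   # staircase approximation of it

planar_cost = second_order_penalty(slanted)   # 0: slanted planes are free
step_cost = second_order_penalty(stepped)     # staircase is penalized

# A first-order prior cannot distinguish the two surfaces:
first_order = lambda ds: sum(abs(ds[i + 1] - ds[i]) for i in range(len(ds) - 1))
same_first = first_order(slanted) == first_order(stepped)
```

This is exactly the staircase-artifact failure mode described above: the first-order prior assigns both surfaces the same cost, while the second-order prior prefers the slanted plane.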


Origin www.cnblogs.com/baozhilin/p/11415733.html