Design of 3D Model Reconstruction System Based on Stereo Vision

Three-dimensional reconstruction based on computer vision uses two or more two-dimensional images to recover the geometric information of the surface of an object in space; the process is the inverse of the imaging process [1]. In the early stages of 3D reconstruction, limited by computational power and the state of theoretical research, modeling a real object required expensive special equipment such as visual coordinate measuring machines. Moreover, constrained by the principle of gray-scale matching, two images taken by a single camera could not differ by too much translation and rotation, so the movement of the camera was strictly limited. The purpose of this work is to solve this problem: use the most common digital devices (such as handheld digital cameras) to capture image sequences of an object from multiple viewpoints, then process them on an ordinary computing unit (such as a PC) to determine the corresponding points across the views and the geometric constraints between the views, and recover the coordinates of the feature points in three-dimensional space together with the camera pose corresponding to each image, thus obtaining a three-dimensional model.

To address the problem that 3D model reconstruction in a non-specific environment is disturbed by the background and cannot reach the required accuracy, a GrabCut-based image segmentation method is adopted. Based on the energy-minimization principle of graph-cut theory, it separates foreground from background well and removes redundant image information. In addition, this article reconstructs spatial points from three views, which is more accurate and stable than traditional reconstruction algorithms that use only two images.

1.1 Image preprocessing

Image acquisition is the basis of stereo-vision three-dimensional model reconstruction. After an image is acquired through an image sensor (such as a CCD camera), the effects of lighting conditions, camera performance, and viewpoint differences must be taken into account, so the acquired image needs further processing.

The purpose of image preprocessing is to improve the clarity and visual effect of the image and to convert it into a form more suitable for human or machine analysis [2]. It includes two parts: 1) image smoothing, filtering, and enhancement; 2) separation of the background and the target object.
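As a minimal illustration of part 1) (Section 2.1 below uses a simplified PCNN model for this step; here conventional OpenCV smoothing and contrast enhancement stand in, with placeholder file names):

```python
import cv2

# Sketch: Gaussian smoothing to suppress sensor noise, then histogram
# equalization to enhance contrast. "view0.jpg" is a placeholder.
img = cv2.imread("view0.jpg")

# 5x5 Gaussian kernel; sigma is derived from the kernel size when 0
smoothed = cv2.GaussianBlur(img, (5, 5), 0)

# Equalize only the luminance channel so colors are not distorted
ycrcb = cv2.cvtColor(smoothed, cv2.COLOR_BGR2YCrCb)
ycrcb[:, :, 0] = cv2.equalizeHist(ycrcb[:, :, 0])
enhanced = cv2.cvtColor(ycrcb, cv2.COLOR_YCrCb2BGR)

cv2.imwrite("view0_pre.jpg", enhanced)
```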

1.2 Camera calibration

In machine vision applications, in order to obtain from the image the correspondence between a three-dimensional object point in the world coordinate system and its image point, and to compute the position and shape of the object, a geometric model of camera imaging must be established and its parameters obtained. In most cases these parameters must be obtained through experiment and computation; this process of solving for the parameters is called camera calibration [3]. Camera calibration determines the internal geometric and optical characteristics of the camera (intrinsic parameters) and the three-dimensional position and orientation of the camera coordinate system relative to the world coordinate system (extrinsic parameters). The accuracy of the calibration determines the accuracy of the model reconstruction, so the choice of calibration algorithm is critical.
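The article does not commit to a particular calibration algorithm. As one common realization of this step, here is a sketch of chessboard-based (Zhang-style) calibration with OpenCV; the 9×6 inner-corner board size and the calib_*.jpg file pattern are assumptions:

```python
import glob
import cv2
import numpy as np

# Chessboard with 9x6 inner corners; world points in board units
pattern = (9, 6)
objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_pts, img_pts = [], []
for path in glob.glob("calib_*.jpg"):   # assumes such images exist
    gray = cv2.imread(path, cv2.IMREAD_GRAYSCALE)
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_pts.append(objp)
        img_pts.append(corners)

# K: intrinsic matrix; dist: lens distortion; rvecs/tvecs: extrinsics
rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_pts, img_pts, gray.shape[::-1], None, None)
print("reprojection RMS:", rms)
```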


1.3 Feature matching

Matching is a core problem in 3D model reconstruction: find the corresponding feature points in two or more images and minimize the mean square error between them to obtain an accurate relative position relationship, thereby matching the images. Existing feature point extraction methods can be divided into three types: template-based feature point detection, edge-based feature detection, and brightness-transformation-based detection [4]. The SIFT feature matching algorithm used in this paper is a stable local feature matching algorithm [5]. It has strong matching ability, excellent scale and rotation invariance, and robustness to changes in illumination and viewing angle, and is widely used in robot vision, three-dimensional target reconstruction, and medical image registration.

1.4 Three-dimensional reconstruction

After the feature points are matched and the internal and external parameters of the camera are obtained, a pixel in the pixel coordinate system can be mapped back to a point in the world coordinate system. Using the matching relationships of the feature points across the images, a system of equations is established to solve for the coordinates of each feature point in the world coordinate system; the feature points are then connected according to their correspondences into a three-dimensional map, completing the three-dimensional reconstruction of the object [6].
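A minimal sketch of this back-projection step, assuming the two 3×4 projection matrices P1 and P2 (built as K[R|t] from the calibrated parameters) and the matched pixel arrays are already available:

```python
import cv2
import numpy as np

def triangulate(P1, P2, pts1, pts2):
    # pts1, pts2: Nx2 float arrays of matched pixel coordinates
    # cv2.triangulatePoints solves the projection equations per point
    X_h = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)  # 4xN homogeneous
    return (X_h[:3] / X_h[3]).T                          # Nx3 Euclidean
```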

2 System design

2.1 Image preprocessing

To address image quality degradation, noise filtering and image enhancement based on a simplified pulse coupled neural network (PCNN) model are adopted. For the image segmentation discussed in this section, the GrabCut-based segmentation method is used.

GrabCut is an interactive image segmentation method that replaces the construction of a trimap with a simple "hard segmentation", reducing the manual interaction workload. It was proposed by Rother et al. [7] as an improvement of the Graph Cuts algorithm, with three changes: 1) a Gaussian Mixture Model (GMM) replaces the histogram to extract the target from a color image; 2) iterative estimation of the GMM parameters replaces a single minimization to carry out the energy minimization; 3) incomplete labeling reduces the amount of interaction required.

Each GMM in the GrabCut target extraction algorithm can be regarded as a mixture of K full-covariance Gaussian components. In the optimization process, a vector k is introduced to assign an independent GMM component to each pixel, each pixel carries an opacity $\alpha = 0$ or $\alpha = 1$, and $\theta$ denotes the probability model of the target/background color distribution, so the image segmentation problem is transformed into the energy minimization

$$\hat{\alpha} = \arg\min_{\alpha} \min_{k} E(\alpha, k, \theta, z), \qquad E(\alpha, k, \theta, z) = U(\alpha, k, \theta, z) + V(\alpha, z),$$

where $z$ is the image data, $U$ is the data term evaluated under the GMMs, and $V$ is the smoothness term between neighboring pixels.

The main steps of Grabcut are as follows:

Initialization: 1) Manually set the background T_B to initialize the trimap; the foreground region T_F is empty, and the unknown region T_U is the complement of T_B. 2) Set the α value of background pixels to 0 and the α value of unknown-region pixels to 1. 3) Use the two sets α = 0 and α = 1 to initialize the Gaussian mixture models of the background and foreground.

Iterative minimization: 1) Assign to each pixel n in the unknown region the GMM component that fits it best. 2) Learn the GMM parameters θ from the pixel data. 3) Use minimum-cut energy minimization to obtain the segmentation. 4) Repeat from step 1) until convergence.

User interaction editing: 1) Manually specify the α values of some pixels in the image (0 or 1), update the trimap, and perform step 3) of the iterative minimization. 2) Optimization: re-run the entire iterative minimization.

If the initial information given by the user does not yield a satisfactory segmentation, further interaction supplies more information and the energy is minimized again, until a satisfactory segmentation result is obtained.
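OpenCV ships this algorithm as cv2.grabCut; the sketch below follows the rectangle-initialized flow described above. The rectangle coordinates, file names, and the choice of five iterations are placeholder assumptions:

```python
import cv2
import numpy as np

img = cv2.imread("object.jpg")
mask = np.zeros(img.shape[:2], np.uint8)
# GMM model buffers required by OpenCV's implementation
bgd, fgd = np.zeros((1, 65), np.float64), np.zeros((1, 65), np.float64)

rect = (50, 50, 400, 300)  # user-drawn box: everything outside is T_B
cv2.grabCut(img, mask, rect, bgd, fgd, 5, cv2.GC_INIT_WITH_RECT)

# Pixels labeled definite/probable foreground form the extracted target
fg = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD), 1, 0)
segmented = img * fg[:, :, None].astype(np.uint8)
cv2.imwrite("object_fg.png", segmented)
```

Subsequent user strokes correspond to setting mask pixels to cv2.GC_FGD or cv2.GC_BGD and re-running grabCut with cv2.GC_INIT_WITH_MASK.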

2.3 Feature matching

Feature matching under wide-baseline conditions consists of extracting and describing stable features so that features can be matched between two images with large differences.

The algorithm for generating the SIFT feature vectors of an image comprises four steps: 1) detect extrema in scale space to determine candidate key point locations and scales; 2) accurately determine the location and scale of each key point by fitting a function, and remove low-contrast points; 3) assign a direction parameter to each key point based on the gradient direction distribution of its neighborhood; 4) generate the SIFT feature vector.

In the left part of the figure, an 8×8 window centered on the key point is taken, and a gradient orientation histogram with 8 directions is computed on each 4×4 sub-block to form a seed point. The key point in the right part is composed of 2×2 = 4 seed points, each carrying 8 direction-vector components. In the actual calculation, to enhance the robustness of matching, 4×4 = 16 seed points are used to describe each key point, so 128 values are generated per key point, i.e. a 128-dimensional SIFT feature vector.
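A short sketch confirming this descriptor layout with OpenCV's SIFT implementation (cv2.SIFT_create, in the main module since OpenCV 4.4; the file name is a placeholder):

```python
import cv2

gray = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
sift = cv2.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)
# Each descriptor row has 128 entries: 16 seed points x 8 directions
print(descriptors.shape)   # (number of keypoints, 128)
```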

The figure shows the general process of feature matching. After the SIFT descriptors are generated, the Euclidean distance between two 128-dimensional descriptors $x = (x_1, \ldots, x_{128})$ and $y = (y_1, \ldots, y_{128})$ is used as the similarity measure:

$$D = \sqrt{\sum_{i=1}^{128} (x_i - y_i)^2}.$$

The smaller D is, the closer the two feature points and the higher their similarity. During matching, each point to be matched is compared against the feature points of image 2 by nearest-neighbor search, finding the nearest and second-nearest feature points; if the ratio of the nearest distance to the second-nearest distance is below a certain threshold, the pair of matching points is accepted. On the basis of the initial matching results, RANSAC (random sample consensus) is used to eliminate mismatches.
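A sketch of this matching stage, assuming kp1/des1 and kp2/des2 were produced by the SIFT step above for the two images; the 0.8 ratio threshold (0.7–0.8 is typical) and 1.0-pixel RANSAC threshold are assumptions:

```python
import cv2
import numpy as np

# Nearest/second-nearest search with Euclidean (L2) distance
bf = cv2.BFMatcher(cv2.NORM_L2)
knn = bf.knnMatch(des1, des2, k=2)

# Lowe's ratio test: accept only clearly unambiguous matches
good = [m for m, n in knn if m.distance < 0.8 * n.distance]

pts1 = np.float32([kp1[m.queryIdx].pt for m in good])
pts2 = np.float32([kp2[m.trainIdx].pt for m in good])

# RANSAC on the fundamental matrix rejects remaining mismatches
F, inlier_mask = cv2.findFundamentalMat(pts1, pts2, cv2.FM_RANSAC,
                                        ransacReprojThreshold=1.0)
inliers = [m for m, keep in zip(good, inlier_mask.ravel()) if keep]
```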

2.4 Three-dimensional reconstruction

This article uses a digital camera to take three photos of the same object from different angles, ensuring that the same feature points appear in all three photos. The trifocal tensor can then be computed from the point correspondences across the three views. Like the fundamental matrix of two-view geometry, it depends only on the camera parameters. The camera matrices can be recovered from the trifocal tensor up to a three-dimensional projective transformation, and the fundamental matrices between the images are determined at the same time.

The specific process is as follows: 1) estimate the two-view geometry from the image sequence, then join the pairwise match sets to obtain a set of feature point correspondences across all three views (see the sketch below); 2) compute the trifocal tensor from at least 7 non-degenerate correspondences; 3) recover the fundamental matrices from the trifocal tensor; 4) after determining two camera matrices, recover the third camera matrix up to a common projective transformation; 5) handle feasibility errors.
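OpenCV has no built-in trifocal-tensor routine, so steps 2)–4) would follow e.g. the linear algorithm of Hartley and Zisserman; step 1), joining the pairwise match sets into three-view correspondences, can be sketched directly (match12 and match23 are assumed to be lists of cv2.DMatch from the matching stage, for image 1→2 and image 2→3):

```python
def link_three_views(match12, match23):
    # Index the 2->3 matches by their image-2 keypoint
    by_img2 = {m.queryIdx: m.trainIdx for m in match23}
    triplets = []
    for m in match12:
        if m.trainIdx in by_img2:
            # (keypoint index in view 1, view 2, view 3)
            triplets.append((m.queryIdx, m.trainIdx, by_img2[m.trainIdx]))
    return triplets
```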

After the three-dimensional coordinates are obtained, the surface of the object must be visualized. In this paper, Delaunay triangulation is used to reconstruct the three-dimensional surface of the object, and texture mapping is finally completed through OpenGL.
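A minimal sketch of the triangulation step, assuming the reconstructed point cloud is roughly a height field so that a 2D Delaunay triangulation of its xy-projection is meaningful (points.npy is a placeholder for the output of the reconstruction stage); the resulting faces would then be handed to OpenGL for texture mapping:

```python
import numpy as np
from scipy.spatial import Delaunay

points_3d = np.load("points.npy")   # Nx3 reconstructed feature points
tri = Delaunay(points_3d[:, :2])    # triangulate the xy-projection
faces = tri.simplices               # Mx3 vertex indices per triangle
print(f"{len(faces)} triangles over {len(points_3d)} points")
```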

