3D reconstruction algorithm principle

Three-dimensional (3D) reconstruction has long been a hot topic in computer graphics and computer vision. Early 3D reconstruction techniques typically took two-dimensional images as input and reconstructed a 3D model of the scene. Limited by the input data, however, the reconstructed models were usually incomplete and not very realistic. With the emergence of a variety of consumer-grade depth cameras, 3D scanning and reconstruction based on depth cameras has developed rapidly. Depth cameras such as Microsoft's Kinect, ASUS's Xtion, and Intel's RealSense are low-cost, compact, easy to operate, and accessible to researchers and development engineers. 3D reconstruction is also a foundation of Augmented Reality (AR): a model reconstructed from a scan can be applied directly in AR or VR scenes. This article briefly describes the basic principles of depth-camera-based 3D reconstruction and its applications.

Background

  • Growing demand for 3D geometric models in movies, games, virtual environments, and other industries
  • The popularity of VR and AR
  • Demand for 3D maps in fields such as real estate
  • Digital preservation of ancient Chinese architecture
  • 3D digital cities
  • Medical applications, such as 3D heart models
  • Education, among others

Applications

Methods Introduction

Conventional 3D reconstruction relied mainly on expensive 3D scanning devices and required the scanned target to remain still during acquisition. In recent years, advances in large-scale computing, especially GPUs and distributed computing, have made real-time and efficient solutions possible. Current mainstream methods fall into the following categories:

  • 3D reconstruction based on depth cameras
  • 3D reconstruction based on images

Depth cameras mainly work on one of three principles:

  • Structured light. The representative product is the first-generation Kinect, whose sensor chip came from PrimeSense (now an Apple company).
  • Time of flight (ToF). The representative product is the second-generation Kinect, officially named Kinect One, reportedly because of Microsoft's fondness for the "One" branding.
  • Binocular cameras. Representative products are Google Tango and Leap Motion; the former carries four cameras, the latter two.
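The geometry behind the binocular approach can be written down directly. As a minimal sketch (the focal length and baseline values below are illustrative, not from any specific device): a point with disparity d between two rectified views lies at depth Z = f·B/d, where f is the focal length in pixels and B the camera baseline.

```python
# Depth from stereo disparity under the pinhole model: Z = f * B / d.
# f: focal length in pixels, B: baseline in metres, d: disparity in pixels.
def depth_from_disparity(d_pixels, focal_px=700.0, baseline_m=0.12):
    if d_pixels <= 0:
        raise ValueError("disparity must be positive")
    return focal_px * baseline_m / d_pixels

# 42 px of disparity with f = 700 px, B = 0.12 m gives 700*0.12/42 = 2.0 m.
print(depth_from_disparity(42.0))
```

Note the inverse relation: depth resolution degrades quadratically with distance, which is why stereo rigs work best at close range.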

3D reconstruction algorithms are also widely used on mobile phones and other mobile devices; common algorithms include SfM, SVO, and REMODE.

 

  • 2.2  Binocular / multi-camera vision


Binocular vision uses rectified images from left and right cameras, finds corresponding points between the two views, and then recovers 3D information from the geometry. The difficulty of this method lies in matching the left and right images: inaccurate matches degrade the final result. Multi-camera vision uses three or more cameras to improve matching accuracy, but the drawback is obvious: it is more time-consuming and worse for real-time use.
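The matching step described above can be sketched in a toy form: for each pixel of the left scanline, pick the shift into the right scanline that minimises the sum of absolute differences (SAD) over a small window. This is only an illustration; real matchers such as SGM add smoothness constraints across pixels.

```python
import numpy as np

# Toy 1-D SAD stereo matcher (illustrative sketch, not a production algorithm).
def sad_disparity(left, right, max_disp=4, win=1):
    left = np.asarray(left, float)
    right = np.asarray(right, float)
    n = len(left)
    disp = np.zeros(n, dtype=int)
    for x in range(n):
        lo, hi = max(x - win, 0), min(x + win + 1, n)
        best_cost, best_d = np.inf, 0
        for d in range(min(max_disp, lo) + 1):   # keep indices in range
            cost = np.abs(left[lo:hi] - right[lo - d:hi - d]).sum()
            if cost < best_cost:
                best_cost, best_d = cost, d
        disp[x] = best_d
    return disp

left = np.array([0, 0, 5, 9, 5, 0, 0, 0, 0, 3], float)
right = np.roll(left, -2)   # the same feature sits 2 px to the left in the right view
print(sad_disparity(left, right, max_disp=3))
```

At the textured peak (index 3) the matcher recovers the true shift of 2; in the flat zero regions the cost is ambiguous, which is exactly the low-texture failure mode discussed above.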


In theory both methods can recover accurate depth information, but in practice accuracy is often limited by shooting conditions. Common algorithms are SGM and SGBM; on the KITTI autonomous-driving benchmark, nearly half of the top 50 algorithms are improvements on SGM.
3. Consumer RGB-D cameras

Depth cameras may be based on active or passive principles; the advantage of algorithms built on them is that these devices are highly practical. In recent years there has been much research on 3D reconstruction based directly on consumer-grade RGB-D cameras, such as Microsoft's Kinect v1 and v2, with good results. The earliest, KinectFusion, proposed in 2011 by Newcombe et al. of Imperial College, opened the era of real-time 3D reconstruction with RGB-D cameras. It was followed by algorithms such as DynamicFusion and BundleFusion.
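The core of the KinectFusion pipeline is volumetric fusion: each voxel stores a truncated signed distance (TSDF) and a weight, and every new depth measurement is merged with a weighted running average, which averages out per-frame sensor noise. A minimal one-voxel sketch (parameter values are made up for illustration):

```python
import numpy as np

# KinectFusion-style TSDF update for a single voxel (illustrative sketch).
def integrate(tsdf, weight, sdf_obs, trunc=0.10, w_obs=1.0):
    d = float(np.clip(sdf_obs / trunc, -1.0, 1.0))   # truncate to [-1, 1]
    tsdf_new = (tsdf * weight + d * w_obs) / (weight + w_obs)
    return tsdf_new, weight + w_obs

# Two noisy observations of a surface ~0.03 m in front of one voxel:
tsdf, w = 0.0, 0.0
tsdf, w = integrate(tsdf, w, 0.02)   # 0.02 m -> 0.2 in truncated units
tsdf, w = integrate(tsdf, w, 0.04)   # 0.04 m -> 0.4
print(tsdf)                          # running average: (0.2 + 0.4) / 2 = 0.3
```

The surface is later extracted at the zero crossing of the fused TSDF (e.g. with marching cubes).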
Each of these methods has its own advantages, disadvantages, and applicable range. The above is a brief introduction for students who wish to enter the field of deep-learning-based 3D reconstruction; for more depth, please read the relevant literature. Classical foundations of the field, such as SfM and multi-view geometry, never go out of date and are well worth studying carefully.


Deep-learning-based 3D reconstruction algorithms


We briefly divide deep-learning-based 3D reconstruction algorithms into three categories; a more detailed literature review will appear in subsequent articles in this series:

 

  • Introducing deep learning to improve steps of conventional 3D reconstruction algorithms
  • Fusing deep-learning reconstruction with traditional 3D reconstruction algorithms so that their advantages complement each other
  • Imitating animal vision and using deep learning directly for 3D reconstruction

 

1. Introducing deep learning to improve conventional 3D reconstruction algorithms


Because CNNs have a huge advantage in image feature matching, there is a great deal of research in this direction, for example:

 

  • DeepVO


DeepVO infers poses directly from a sequence of raw RGB images (a video) with a deep recurrent convolutional neural network (RCNN), without using any module of a conventional visual odometry pipeline; it improves the visual odometry link in the 3D reconstruction loop.

 

  • BA-Net


BA-Net takes the bundle adjustment (BA) step of the SfM pipeline and formulates it as a differentiable layer of the neural network, so that the trained network generates better basis functions and the back-end optimization of reconstruction is simplified.

  • CodeSLAM

CodeSLAM uses a neural network to extract several basis functions that represent the depth of a scene; these basis functions simplify the geometric optimization problem of conventional methods.
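The bundle adjustment that BA-Net embeds as a layer minimises reprojection error. As a minimal classical sketch (toy values, structure-only: one 3-D point refined against two fixed pinhole cameras with Gauss-Newton):

```python
import numpy as np

f = 500.0                                   # focal length in pixels (illustrative)
cams = [np.array([0.0, 0.0, 0.0]),          # camera centres, both looking down +z
        np.array([0.5, 0.0, 0.0])]

def project(P, c):
    X, Y, Z = P - c
    return np.array([f * X / Z, f * Y / Z])

P_true = np.array([0.2, -0.1, 4.0])
obs = [project(P_true, c) for c in cams]    # noise-free observations

def residuals(P):                           # stacked reprojection errors
    return np.concatenate([project(P, c) - o for c, o in zip(cams, obs)])

P = np.array([0.0, 0.0, 3.0])               # rough initial guess
for _ in range(25):                         # Gauss-Newton iterations
    r = residuals(P)
    J = np.zeros((4, 3))
    for i in range(3):                      # central-difference Jacobian
        dP = np.zeros(3); dP[i] = 1e-6
        J[:, i] = (residuals(P + dP) - residuals(P - dP)) / 2e-6
    P = P + np.linalg.solve(J.T @ J, -J.T @ r)
print(np.round(P, 4))                       # converges to P_true
```

Full BA jointly optimizes poses and points the same way, only with a much larger (and sparse) Jacobian; BA-Net's insight is to make this step differentiable end to end.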

 

2.  Fusing deep-learning reconstruction with traditional 3D reconstruction algorithms


CNN-SLAM fuses the dense depth map predicted by a CNN with the depth estimated by monocular SLAM. In image regions where monocular SLAM is close to failure, such as low-texture areas, the fusion scheme gives the CNN-predicted depth a higher weight, improving the reconstruction.
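The fusion idea can be sketched with a per-pixel confidence blend. The weighting function below is an illustrative assumption (the actual CNN-SLAM scheme is uncertainty-based), but it captures the principle: where the image gradient is weak (low texture, unreliable matching), trust the CNN depth more.

```python
import numpy as np

# Blend a dense CNN depth prediction with a SLAM depth estimate,
# weighting SLAM by local texture strength (illustrative weighting).
def fuse_depth(d_cnn, d_slam, grad_mag, g0=10.0):
    w_slam = grad_mag / (grad_mag + g0)   # confidence grows with texture
    return w_slam * d_slam + (1.0 - w_slam) * d_cnn

d_cnn  = np.array([2.0, 2.0, 2.0])
d_slam = np.array([3.0, 3.0, 3.0])
grads  = np.array([0.0, 10.0, 1e6])       # textureless / medium / strong edge
print(fuse_depth(d_cnn, d_slam, grads))   # -> [2.0, 2.5, ~3.0]
```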

 

3.  Imitating animal vision: using deep learning directly for 3D reconstruction

 

There are four main data formats in the field of 3D reconstruction:

 

  • Depth map

 

A 2D image in which each pixel records, as a gray value, the distance from the viewpoint to the object; the closer, the darker.

 

  • Voxel

 

A volume pixel, the 3D analogue of the 2D pixel.

 

  • Point cloud

 

A set of points, each containing a 3D coordinate and possibly color and reflectance-intensity information.

 

  • Mesh

 

A polygon mesh, compact and easy to compute with.

 

According to the data format processed, we divide this brief survey into three parts: 1) voxel-based; 2) point-cloud-based; 3) mesh-based. Depth-map-based methods are not treated separately, because the depth map is used more as a 2D visualization of specific information than as a processed 3D representation.
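These representations are interconvertible; a common first step is back-projecting a depth map into a point cloud through the pinhole intrinsics. A small sketch (intrinsic values below are illustrative placeholders):

```python
import numpy as np

# Back-project a depth map into a point cloud with pinhole intrinsics.
def depth_to_pointcloud(depth, fx, fy, cx, cy):
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx       # pixel -> metric x
    y = (v - cy) * depth / fy       # pixel -> metric y
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)

depth = np.full((2, 2), 2.0)        # a flat patch 2 m from the camera
pts = depth_to_pointcloud(depth, fx=500.0, fy=500.0, cx=0.5, cy=0.5)
print(pts.shape)                    # one 3-D point per pixel
```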

 

(1) Voxel-based


The voxel, as the simplest form, extends 2D convolution straightforwardly to 3D:

 

  • Depth Map Prediction from a Single Image using a Multi-Scale Deep Network, 2014 


This paper is the pioneering work on applying deep learning to depth estimation for 3D reconstruction. It recovers a depth map directly from a single image with a neural network, which is split into a coarse global estimate and a refined local estimate, and it trains the regression with a scale-invariant variant of the loss function.
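The scale-invariant loss of Eigen et al. can be written out directly: with d_i = log(y_i) − log(y*_i), the loss is mean(d²) − (λ/n²)(Σd)². With λ = 1 it is invariant to a global scaling of the prediction. A minimal sketch:

```python
import numpy as np

# Scale-invariant log-depth loss (Eigen et al., 2014).
def scale_invariant_loss(pred, target, lam=1.0):
    d = np.log(pred) - np.log(target)
    n = d.size
    return (d ** 2).mean() - lam * d.sum() ** 2 / n ** 2

target = np.array([1.0, 2.0, 4.0])
print(scale_invariant_loss(target, target))        # perfect prediction -> 0
print(scale_invariant_loss(2.0 * target, target))  # global 2x scale -> ~0 with lam = 1
```

In the paper a reduced λ (e.g. 0.5) is used during training as a compromise between absolute and scale-invariant error.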

 

  • 3D-R2N2: A unified approach for single and multi-view 3d object reconstruction, 2016


Choy et al. proposed 3D-R2N2, a voxel-based model that maps 2D images to a 3D voxel model with an Encoder-3DLSTM-Decoder network structure, completing single- and multi-view voxel-based 3D reconstruction (multiple views are fed into the LSTM sequentially, and the result is refined with each additional view).

Such voxel-based approaches have a problem, however: raising accuracy requires raising the resolution, and raising the resolution sharply increases computation time (the cost of 3D convolution grows with the cube of the resolution).
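The cubic growth is easy to make concrete (the 4-bytes-per-voxel figure below is an assumption for illustration):

```python
# Voxel grids scale cubically: doubling the side multiplies memory
# (and 3-D convolution work per layer) by 8. Assuming 4 bytes per voxel:
for n in (32, 64, 128, 256):
    mib = n ** 3 * 4 / 2 ** 20
    print(f"{n}^3 grid: {mib:g} MiB")
# 256^3 is 512x the size of a 32^3 grid.
```

This is why voxel methods rarely went beyond 32³ or 64³ grids without hierarchical structures such as octrees.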

 

(Figure: how is the camera pose estimated across different frames?)

 

 

 

(Figure: a new data-processing pipeline.)

 

 

 

(2) Point-cloud-based


By comparison, the point cloud is a simpler, more uniform structure that is easier to learn from, and geometric transformations and deformations are easier to apply because no connectivity needs to be updated. Note, however, that the points in a cloud lack connectivity, and therefore lack surface information; intuitively, the reconstructed surfaces look less smooth.

 

  • A Point Set Generation Network for 3D Object Reconstruction From a Single Image, 2017


This method is the pioneering work on point-cloud-based 3D reconstruction. Its greatest contribution is addressing the loss function used when training point-cloud networks: because the same geometry can be approximated equally well by different point clouds, how to define a proper loss had long been a problem for point-cloud-based deep 3D reconstruction.
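The set-level loss this line of work introduced is the Chamfer distance: for each point, take the squared distance to its nearest neighbour in the other cloud, and sum in both directions, so the ordering of points does not matter. A minimal sketch:

```python
import numpy as np

# Symmetric Chamfer distance between two point sets (order-invariant).
def chamfer(a, b):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)   # pairwise squared dists
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a[::-1].copy()                  # same points, opposite order
print(chamfer(a, b))                # -> 0.0: identical sets match exactly
```

The paper also uses the Earth Mover's Distance, which enforces a one-to-one matching between the two sets instead of nearest neighbours.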

 

  • Point-Based Multi-View Stereo Network, 2019


This method processes the scene as a point cloud and fuses 2D texture information with 3D depth information to improve the accuracy of the reconstructed point cloud.

 

(3) Mesh-based


Shortcomings of the previous methods:

 

  • Voxel-based: computationally intensive, and it is difficult to balance resolution against accuracy
  • Point-cloud-based: points lack connections between one another, so reconstructed surfaces are not smooth


By comparison, the mesh representation is lightweight yet rich in shape detail, and the connections between adjacent points carry important relational information. Researchers have therefore turned to mesh-based 3D reconstruction. A mesh describes a 3D object by its vertices, edges, and faces, which corresponds exactly to the graph M = (V, E, F) of a graph convolutional network.

 

  • Pixel2Mesh

Pixel2Mesh reconstructs a triangular mesh from a single RGB image; the algorithm proceeds as follows:

Step 1: For any input image, the 3D shape is initialized as an ellipsoid.
Step 2: The network has two parts: a fully convolutional network extracts features from the input image, and a graph convolutional network represents and processes the 3D mesh.
Step 3: The graph network continuously deforms the mesh and finally outputs the object's shape.

The model constrains the shape with four loss functions and achieves good results. Its contribution is an end-to-end neural network that generates a mesh-represented 3D object directly from a single color image.
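The mesh-deformation half of such a pipeline rests on graph convolutions over the vertices. One simplified layer of this kind (the weights below are random placeholders, not trained parameters) mixes each vertex's feature with the mean of its neighbours' and applies a learned linear map:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simplified graph-convolution layer over mesh vertices.
def gcn_layer(H, adj, W):
    deg = adj.sum(axis=1, keepdims=True)
    neigh = adj @ H / np.maximum(deg, 1.0)   # mean over neighbour features
    return np.maximum((H + neigh) @ W, 0.0)  # linear map + ReLU

adj = np.ones((3, 3)) - np.eye(3)  # tiny triangle mesh: 3 mutually connected vertices
H = rng.normal(size=(3, 8))        # per-vertex features
W = rng.normal(size=(8, 8))        # layer weights
out = gcn_layer(H, adj, W)
print(out.shape)                   # one feature vector per vertex
```

Stacking such layers, with 3-D coordinate offsets read out of the final vertex features, is the basic mechanism by which the initial ellipsoid is deformed into the target shape.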

 

Summary


Traditional 3D reconstruction algorithms can be divided into active methods (structured-light and time-of-flight depth cameras) and passive methods (binocular and multi-view vision, SfM, and so on). Each of these methods has its own advantages and applicable scope.

Research on deep-learning-based 3D reconstruction falls into three main directions:

1. Introducing deep learning to improve conventional 3D reconstruction algorithms;

2. Fusing deep-learning reconstruction with traditional 3D reconstruction algorithms so that their advantages complement each other;

3. Imitating animal vision and applying deep learning directly to 3D reconstruction, whether voxel-based, point-cloud-based, or mesh-based.

 

Origin www.cnblogs.com/wujianming-110117/p/12515268.html