3D reconstruction method: SfM

This is the content of my report for the 3.23 group meeting.

 

After reading some blogs about 3D reconstruction, I have gained a deeper understanding of the differences between point clouds, voxels, and meshes.

Advanced understanding of 3D reconstruction: what are depth maps, meshes, voxels, and point clouds

https://www.cnblogs.com/lainey/p/8547056.html

 

I also gained some preliminary understanding of 3D modeling:

 

The classic algorithm for 3D reconstruction from a set of pictures is SfM (Structure from Motion), which infers 3D structure from a series of 2D images.

Software that uses this method includes Pix4Dmapper, Autodesk 123D Catch, PhotoModeler, and VisualSFM.

 

Most data sources for 3D reconstruction are RGB images, or RGB-D images that additionally carry per-pixel depth (captured with special equipment such as the Kinect).
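To make the RGB-D case concrete, here is a minimal sketch of back-projecting a depth map into a point cloud under a pinhole camera model (the intrinsics fx, fy, cx, cy below are fabricated for illustration; real values come from camera calibration):

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (meters) into an Nx3 point cloud.

    Assumes a pinhole camera: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

# Toy example with fabricated intrinsics and a flat 2 m depth map
depth = np.full((480, 640), 2.0)
cloud = depth_to_point_cloud(depth, fx=525.0, fy=525.0, cx=319.5, cy=239.5)
print(cloud.shape)  # (307200, 3)
```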

 

SfM is the most classic 3D reconstruction pipeline (a minimal two-view sketch of steps 1-3 follows the list):

1. Feature extraction (many methods are available: SIFT, SURF, FAST, etc.)

2. Registration (the mainstream approach is RANSAC and its improved variants)

3. Global optimization: bundle adjustment is used to refine the estimated camera parameters and 3D points

4. Data fusion
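As referenced above, here is a minimal two-view sketch of steps 1-3 with OpenCV (the intrinsic matrix K and the image filenames are assumptions for illustration; a real pipeline would calibrate the camera and run bundle adjustment over many views):

```python
import cv2
import numpy as np

# Hypothetical input pair; any two overlapping views will do
img1 = cv2.imread("view1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("view2.jpg", cv2.IMREAD_GRAYSCALE)

# 1. Feature extraction (SIFT here; SURF/FAST/ORB work similarly)
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# 2. Registration: nearest-neighbour matching + Lowe's ratio test
matcher = cv2.BFMatcher()
matches = [m for m, n in matcher.knnMatch(des1, des2, k=2)
           if m.distance < 0.75 * n.distance]
pts1 = np.float32([kp1[m.queryIdx].pt for m in matches])
pts2 = np.float32([kp2[m.trainIdx].pt for m in matches])

# Fabricated pinhole intrinsics, for illustration only
K = np.array([[800.0, 0, 320], [0, 800.0, 240], [0, 0, 1]])

# RANSAC rejects outlier matches while estimating the essential matrix
E, mask = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC, threshold=1.0)

# 3. Recover the relative camera pose (rotation R, translation t up to
#    scale); a full pipeline would then refine everything with bundle
#    adjustment
_, R, t, mask = cv2.recoverPose(E, pts1, pts2, K, mask=mask)
print(R, t)
```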

 

SfM is an offline algorithm that reconstructs 3D structure from a collection of unordered pictures. Before the core structure-from-motion stage runs, some preparatory work is required to select suitable images.

 

 

Which 3D reconstruction algorithm to use depends on the sensor. For a binocular (stereo) camera, the usual approach is epipolar geometry with visual feature registration, optimized by bundle adjustment. For a monocular camera there are PTAM and DTAM, and in recent years SfM has become more popular. For an RGB-D camera like the Kinect, the better-known systems are Microsoft's KinectFusion, PCL's open-source KinFu, and MIT's enhanced Kintinuous. For laser sensors, reconstruction is generally done with SLAM.

 

RGB-based monocular reconstruction mainly relies on multi-view geometry (SfM); the more classic systems are DTAM and Microsoft's MonoFusion. The drawbacks are that it cannot reconstruct even slightly larger scenes and its accuracy is not high.

 

Binocular reconstruction can be grouped with RGB-D camera reconstruction: the underlying principles by which a depth map is obtained are structured light, binocular stereo, laser, and ToF. Structured light is suitable for indoor high-precision reconstruction and has many commercial products. The SfM approach is more convenient than structured light because the camera does not need to be calibrated in advance, but its accuracy is worse; many drones use SfM to model large buildings.

 

 

The 3D reconstruction task can be described as follows: given a set of photos of an object or scene, and under certain assumptions (object material, viewing angles, lighting environment, etc.), estimate the most plausible 3D shape that explains that set of photos. A complete 3D reconstruction pipeline usually includes the following steps:

1. Collect scene pictures

2. Calculate the camera parameters for each image

3. Reconstruct the 3D shape of the scene from the image set and the corresponding camera parameters

4. Optionally reconstruct scene materials, etc.

 

 

The core step is the third one: the 3D shape reconstruction algorithm.
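Continuing the two-view OpenCV sketch above (reusing its K, R, t, pts1, pts2), step 3 can be illustrated by triangulating the matched points into 3D:

```python
import cv2
import numpy as np

# Projection matrices P = K [R | t]; camera 1 is taken as the
# world origin, camera 2 uses the pose recovered earlier.
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = K @ np.hstack([R, t])

# Triangulate the matched pixel correspondences (homogeneous 4xN output)
pts4d = cv2.triangulatePoints(P1, P2, pts1.T, pts2.T)
pts3d = (pts4d[:3] / pts4d[3]).T  # convert to Euclidean Nx3 points
print(pts3d.shape)
```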

 

There are four main conventional 3D shape representations: depth map, point cloud, voxel, and mesh.

 

In recent years, many methods based on deep learning have also appeared:

 

David Eigen, NIPS 2014: Depth Map Prediction from a Single Image using a Multi-Scale Deep Network

 

Fayao Liu, CVPR 2015: Deep Convolutional Neural Fields for Depth Estimation from a Single Image

 

Both papers use CNN architectures to learn the mapping from a single image to its corresponding depth map.

 

However, a depth map alone does not carry enough information to reconstruct the original input; it can only serve as auxiliary information for 3D scene understanding. Research therefore turned to reconstructing 3D point clouds, voxels, or meshes from a set of 2D images.

 

Deep-learning-based reconstruction of 3D point clouds and meshes is difficult to implement: learning the complete structure of an object requires a large amount of data, and traditional 3D models are composed of vertices and triangulated meshes, so the varying data sizes make training difficult. A common workaround is therefore voxelization: convert all CAD models into a binary voxel grid (occupied cells are 1, empty cells are 0), which guarantees that every model has the same size. A recent paper in this line is Choy ECCV 2016, 3D-R2N2: A Unified Approach for Single and Multi-view 3D Object Reconstruction.
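A minimal sketch of this voxelization step (assuming the model is given as a point cloud sampled from its surface; the 32-cell grid matches the resolutions discussed below):

```python
import numpy as np

def voxelize(points, grid=32):
    """Convert an Nx3 point cloud into a binary occupancy grid.

    Occupied cells are 1, empty cells are 0, so every model maps to
    the same fixed-size grid regardless of its vertex count.
    """
    mins = points.min(axis=0)
    maxs = points.max(axis=0)
    # Normalize coordinates into [0, grid) and clamp boundary points
    idx = ((points - mins) / (maxs - mins + 1e-9) * grid).astype(int)
    idx = np.clip(idx, 0, grid - 1)
    vox = np.zeros((grid, grid, grid), dtype=np.uint8)
    vox[idx[:, 0], idx[:, 1], idx[:, 2]] = 1
    return vox

# Toy example: 10k random points on a unit sphere's surface
pts = np.random.randn(10000, 3)
pts /= np.linalg.norm(pts, axis=1, keepdims=True)
print(voxelize(pts).sum(), "occupied cells out of", 32 ** 3)
```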

 

3D-R2N2 uses deep learning to map 2D images to their corresponding 3D voxel models: each input image is first encoded with a standard CNN, the per-view encodings are fused by the units of a 3D convolutional LSTM, and a deconvolutional decoder finally reconstructs the output voxel grid.
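A toy PyTorch sketch of this encode-recur-decode idea (this is not the 3D-R2N2 architecture: the layer sizes are invented, and a plain LSTM stands in for the paper's 3D convolutional LSTM):

```python
import torch
import torch.nn as nn

class ToyVoxelNet(nn.Module):
    """Encode each view with a CNN, fuse views with an LSTM, decode to
    a 32^3 occupancy grid. A simplified stand-in for the 3D-R2N2
    pipeline, not the published architecture."""

    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(            # 3x64x64 -> 256-d code
            nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, 2, 1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, 2, 1), nn.ReLU(),
            nn.Flatten(), nn.Linear(64 * 8 * 8, 256),
        )
        self.fuse = nn.LSTM(256, 256, batch_first=True)
        self.decoder = nn.Sequential(            # 256-d code -> 1x32^3
            nn.Linear(256, 32 * 4 * 4 * 4), nn.ReLU(),
            nn.Unflatten(1, (32, 4, 4, 4)),
            nn.ConvTranspose3d(32, 16, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose3d(16, 8, 4, 2, 1), nn.ReLU(),
            nn.ConvTranspose3d(8, 1, 4, 2, 1), nn.Sigmoid(),
        )

    def forward(self, views):                    # (B, V, 3, 64, 64)
        b, v = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).view(b, v, -1)
        _, (h, _) = self.fuse(feats)             # last hidden state
        return self.decoder(h[-1])               # (B, 1, 32, 32, 32)

vox = ToyVoxelNet()(torch.randn(2, 5, 3, 64, 64))
print(vox.shape)  # torch.Size([2, 1, 32, 32, 32])
```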

 

Voxel grids are three-dimensional, so the number of cells grows cubically with resolution and computation quickly becomes expensive. Current work mostly uses resolutions of 32×32×32 or below to prevent excessive memory usage, which means the final reconstructed 3D model has low resolution. The road of scientific research is long and difficult.
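The cubic growth is easy to quantify with a quick back-of-the-envelope check, assuming a dense float32 grid (4 bytes per cell):

```python
# Memory for a dense float32 occupancy grid at several resolutions
for n in (32, 64, 128, 256):
    cells = n ** 3
    print(f"{n}^3 = {cells:>10,} cells = {cells * 4 / 2**20:6.1f} MiB")
# 32^3  ->     32,768 cells ->  0.1 MiB
# 64^3  ->    262,144 cells ->  1.0 MiB
# 128^3 ->  2,097,152 cells ->  8.0 MiB
# 256^3 -> 16,777,216 cells -> 64.0 MiB
```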

 

Meshes and point clouds are irregular geometric data, so applying CNNs directly is not feasible. One can, however, treat the 3D mesh as a graph and convolve in a 2D parameter domain on the 3D surface. Concretely, there are spatial constructions (Geodesic CNN) and spectral constructions (Spectral CNN).
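A toy sketch of the spectral construction on a tiny hand-built graph (a real Spectral CNN would learn the per-frequency filter coefficients rather than fixing them as below):

```python
import numpy as np

# A tiny 4-vertex mesh graph given by its adjacency matrix
A = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 1],
              [0, 1, 1, 0]], dtype=float)
D = np.diag(A.sum(axis=1))
L = D - A                                  # combinatorial graph Laplacian

# Spectral construction: filter a per-vertex signal in the Laplacian
# eigenbasis (here a fixed low-pass filter for illustration)
evals, evecs = np.linalg.eigh(L)
signal = np.array([1.0, 0.0, 0.0, 0.0])    # feature on each vertex
filt = np.exp(-evals)                      # damp high graph frequencies
out = evecs @ (filt * (evecs.T @ signal))  # transform, filter, invert
print(out)
```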

 

For point-cloud-based methods, see Hao Su's CVPR 2017 papers PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation and A Point Set Generation Network for 3D Object Reconstruction from a Single Image.
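The order-invariance trick at the heart of PointNet is easy to sketch: a shared per-point MLP followed by a symmetric max-pool. A minimal PyTorch illustration (simplified; the published network adds input and feature transform sub-networks):

```python
import torch
import torch.nn as nn

class MiniPointNet(nn.Module):
    """Shared per-point MLP + max-pool: the order-invariant core of
    PointNet (the real network adds input/feature transforms)."""

    def __init__(self, num_classes=10):
        super().__init__()
        # Conv1d with kernel size 1 == the same MLP applied to every point
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, 128, 1), nn.ReLU(),
            nn.Conv1d(128, 1024, 1), nn.ReLU(),
        )
        self.head = nn.Linear(1024, num_classes)

    def forward(self, pts):                    # (B, 3, N) point coords
        feats = self.point_mlp(pts)            # (B, 1024, N)
        global_feat = feats.max(dim=2).values  # symmetric pooling over N
        return self.head(global_feat)          # (B, num_classes)

logits = MiniPointNet()(torch.randn(2, 3, 500))
print(logits.shape)  # torch.Size([2, 10])
```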

 

Mesh- and point-cloud-based methods are generally heavier on mathematics, and their reconstruction of fine details is not very effective; voxel-based methods can still be considered for improving reconstruction accuracy.
