ICLR 2023 Spotlight | Voxel-Based Neural Surface Reconstruction with a 20x Training Speedup

This article was first published on CVHub.

Title: Voxurf: Voxel-based Efficient and Accurate Neural Surface Reconstruction

Paper: https://arxiv.org/abs/2208.12697

Code: https://github.com/wutong16/Voxurf

Introduction

Neural surface reconstruction aims to recover accurate 3D surfaces from multi-view images. Previous methods based on neural volume rendering mostly train fully implicit models with a multi-layer perceptron (MLP), which typically takes hours per scene. Recent work accelerates optimization by storing important information in learnable voxel grids. However, existing voxel-based methods often struggle to reconstruct fine-grained geometry, even when combined with SDF-based volume rendering schemes. The paper attributes this to two causes: 1) voxel grids tend to break the color-geometry dependency that helps in learning fine geometric structures; 2) under-constrained voxel grids lack spatial coherence and are prone to local minima. To address this, the paper proposes Voxurf, an efficient and accurate voxel-based surface reconstruction method. Its key designs are: 1) a two-stage training procedure that first obtains a coherent coarse shape and then progressively recovers details; 2) a dual color network that maintains the color-geometry dependency; 3) hierarchical geometry features that encourage information propagation across voxels. Extensive experiments show that Voxurf achieves high efficiency and high quality at the same time. On the DTU benchmark, Voxurf trains about 20x faster than previous fully implicit methods while achieving higher reconstruction quality.

Contributions

  1. Compared with the state-of-the-art method, the proposed method trains about 20 times faster, reducing the training time on a single Nvidia A100 GPU from more than 5 hours to 15 minutes.
  2. The method excels in both surface reconstruction accuracy and novel view synthesis quality, and represents fine details better than previous methods in both surface recovery and image rendering.
  3. The paper provides in-depth observation and analysis of the architectural design of explicit voxel representation frameworks.

Method

The paper experiments with different variants of a baseline model to identify the key factors in architecture design. It adopts a shallow MLP as the color network and considers local features and normal vectors as inputs. The results show that local features improve the coherence and representational capacity of surface reconstruction, while maintaining the color-geometry dependency improves the accuracy of geometric details. The paper therefore proposes several key designs:

  1. Two-stage training, which first obtains a coherent coarse shape and then progressively recovers details;
  2. A dual color network, which maintains the color-geometry dependency and recovers accurate surfaces and novel-view images;
  3. Hierarchical geometry features, which promote information propagation between voxels and stabilize optimization;
  4. Smoothness priors, including a gradient smoothness loss, which improve visual quality.

Coarse Shape Initialization

First, the SDF voxel grid $V^{(sdf)}$ is initialized with an ellipsoid-like zero level set placed inside the predefined region to be reconstructed. A shallow MLP is then trained with the normal vector $n$ and the local feature $f$ as inputs, together with the embedded position $p$ and viewing direction $v$. To keep training stable and the surface smooth, the paper interpolates on a smoothed voxel grid rather than on the raw $V^{(sdf)}$: the grid is smoothed by a 3D convolution with a Gaussian kernel, and rendering and loss computation query the smoothed SDF values.
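The initialization and smoothing steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the grid extent $[-1, 1]^3$, the ellipsoid radii, the kernel radius and sigma, and the clamped border handling are all hypothetical choices. The grid value used here has an ellipsoidal zero level set but is only SDF-like, not an exact distance.

```python
import math

def init_ellipsoid_grid(n, radii=(0.8, 0.6, 0.5)):
    """Build an n^3 grid over [-1, 1]^3 whose zero level set is an ellipsoid.

    sqrt((x/a)^2 + (y/b)^2 + (z/c)^2) - 1 is negative inside and positive
    outside; it is SDF-like rather than an exact signed distance.
    """
    coords = [-1.0 + 2.0 * i / (n - 1) for i in range(n)]
    a, b, c = radii
    return [[[math.sqrt((x / a) ** 2 + (y / b) ** 2 + (z / c) ** 2) - 1.0
              for z in coords] for y in coords] for x in coords]

def gaussian_kernel(radius=1, sigma=0.8):
    """Normalized 1D Gaussian weights of length 2 * radius + 1."""
    w = [math.exp(-(k * k) / (2.0 * sigma * sigma))
         for k in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def smooth_grid(grid, kernel):
    """Smooth a cubic grid with a 3D Gaussian applied as three 1D passes.

    A Gaussian kernel is separable, so one pass per axis matches a full 3D
    convolution away from the borders. Out-of-range indices are clamped.
    """
    n = len(grid)
    r = len(kernel) // 2
    for axis in range(3):
        out = [[[0.0] * n for _ in range(n)] for _ in range(n)]
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    acc = 0.0
                    for t, w in enumerate(kernel):
                        idx = [i, j, k]
                        idx[axis] = min(max(idx[axis] + t - r, 0), n - 1)
                        acc += w * grid[idx[0]][idx[1]][idx[2]]
                    out[i][j][k] = acc
        grid = out
    return grid

raw = init_ellipsoid_grid(9)
smoothed = smooth_grid(raw, gaussian_kernel())
print(raw[4][4][4], smoothed[4][4][4])  # center value before and after smoothing
```

In a training loop, the renderer would query `smoothed` rather than `raw`, so that sharp per-voxel noise in the raw grid does not reach the loss directly.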

Fine Geometry Optimization

Dual color network

As shown in Figure 3, the paper designs a dual color network that uses local features interpolated from a learnable feature voxel grid while maintaining the dependency between color and geometry. Two shallow MLPs are trained: one takes the hierarchical geometry features as input, and the other takes simple geometric features (such as surface normals) together with local features. The two networks are combined in a residual fashion, and a reconstruction loss supervises the consistency of their outputs with the real images.
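To make "features interpolated from a voxel grid" concrete, here is a minimal sketch of trilinear interpolation on a scalar grid. The text only says the features are interpolated; trilinear interpolation is assumed here because it is the usual choice for voxel grids, and the function name and index-space coordinate convention are hypothetical. A feature grid with several channels would apply the same weights to each channel.

```python
import math

def trilinear(grid, p):
    """Interpolate grid[i][j][k] at continuous index coordinates p = (x, y, z).

    Coordinates are in voxel-index units and are clamped to the grid bounds.
    Every grid dimension must have at least two entries.
    """
    dims = (len(grid), len(grid[0]), len(grid[0][0]))
    base, frac = [], []
    for x, n in zip(p, dims):
        x = min(max(x, 0.0), n - 1.0)
        i = min(int(math.floor(x)), n - 2)  # keep i + 1 inside the grid
        base.append(i)
        frac.append(x - i)
    (i, j, k), (u, v, w) = base, frac
    value = 0.0
    # Blend the eight corners of the enclosing cell.
    for di, wu in ((0, 1.0 - u), (1, u)):
        for dj, wv in ((0, 1.0 - v), (1, v)):
            for dk, ww in ((0, 1.0 - w), (1, w)):
                value += wu * wv * ww * grid[i + di][j + dj][k + dk]
    return value

# A grid that is linear in its indices is reproduced exactly.
linear = [[[i + 2.0 * j + 3.0 * k for k in range(3)]
           for j in range(3)] for i in range(3)]
print(trilinear(linear, (0.5, 1.25, 1.0)))
```

Because the interpolation weights are continuous in `p`, gradients with respect to the grid values flow to all eight surrounding voxels, which is what makes the grid learnable from rendered pixels.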

Hierarchical geometry feature

To enlarge the receptive field and encourage information propagation, the method looks at a larger region of the SDF field and uses the corresponding SDF values and gradients as an auxiliary condition for the color network. For each location, neighbors are defined with half of the voxel size as the step size, and the neighbors at different levels are concatenated to form a hierarchy. This brings more local information into the color network and eases the transfer of information between voxels.

Here, $d_x^{l-}$ and $d_x^{l+}$ denote the SDF values queried from $V^{(sdf)}$ at the neighbor positions $p_x^{l-}$ and $p_x^{l+}$. The paper also folds gradient information into the geometry feature: $\delta_x^l = \left(d_x^{l+} - d_x^{l-}\right) / \left(2 \cdot l \cdot v_s\right)$, and the vector $\left[\delta_x^l, \delta_y^l, \delta_z^l\right]$ is normalized to unit $\ell_2$ norm and denoted $n^l \in \mathbb{R}^3$, which serves as a hierarchical version of the normal.

Finally, for predefined levels $l \in [0.5, 1.0, 1.5, \ldots]$, the hierarchical geometry feature of a point $p$ is formed by combining the SDF values and normals described above across levels.

As shown in Figure 3, $f_p^{geo}(l)$ is fed to the MLP $g_{geo}$ to assist geometry learning.
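The construction of the hierarchical geometry feature can be sketched as follows. This is an illustration under assumptions rather than the paper's code: the function name, the ordering of the concatenated values, and the use of an arbitrary `sdf` callable (standing in for an interpolated query of the SDF grid) are hypothetical. The neighbor offset is taken as `level * voxel_size`, which is what the central-difference denominator $2 \cdot l \cdot v_s$ implies, with `voxel_size` playing the role of $v_s$.

```python
import math

def hierarchical_geo_feature(sdf, p, voxel_size, levels=(0.5, 1.0, 1.5)):
    """Concatenate neighbor SDF values and unit gradients over all levels.

    `sdf` is any callable mapping an (x, y, z) tuple to a signed distance.
    Returns [all neighbor SDF values..., all per-level unit normals...].
    """
    sdf_values, normals = [], []
    for level in levels:
        step = level * voxel_size
        grad = []
        for axis in range(3):
            lo, hi = list(p), list(p)
            lo[axis] -= step
            hi[axis] += step
            d_lo, d_hi = sdf(tuple(lo)), sdf(tuple(hi))
            sdf_values += [d_lo, d_hi]
            # Central difference: (d^{l+} - d^{l-}) / (2 * l * v_s)
            grad.append((d_hi - d_lo) / (2.0 * step))
        norm = math.sqrt(sum(g * g for g in grad)) or 1.0  # guard a zero gradient
        normals += [g / norm for g in grad]
    return sdf_values + normals

# Example with an analytic unit-sphere SDF instead of a voxel grid.
def sphere_sdf(q):
    return math.sqrt(sum(c * c for c in q)) - 1.0

feature = hierarchical_geo_feature(sphere_sdf, (2.0, 0.0, 0.0), voxel_size=0.1)
print(len(feature))  # 6 SDF values + 3 normal components per level
```

Larger levels sample the SDF farther from the query point, so the color network sees a coarser estimate of the local normal alongside the fine one.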

Smoothness priors

The paper employs two effective regularization terms to promote surface smoothing during training.

  1. First, the paper uses a total variation (TV) regularization term.

  2. Second, the paper assumes that the surface is locally smooth and introduces a smoothness regularization term.

The total loss combines the reconstruction loss with these two regularization terms.
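As an illustration of the first regularizer, the sketch below computes a total variation penalty on a 3D scalar grid. It uses the common isotropic form with forward differences, averaged over voxels; whether the paper uses exactly this form, normalization, or weighting is not stated here, so treat the function name and these choices as assumptions.

```python
import math

def tv_loss(grid):
    """Mean isotropic total variation of grid[i][j][k] via forward differences.

    Only voxels that have a forward neighbor along every axis are counted.
    """
    nx, ny, nz = len(grid), len(grid[0]), len(grid[0][0])
    total, count = 0.0, 0
    for i in range(nx - 1):
        for j in range(ny - 1):
            for k in range(nz - 1):
                dx = grid[i + 1][j][k] - grid[i][j][k]
                dy = grid[i][j + 1][k] - grid[i][j][k]
                dz = grid[i][j][k + 1] - grid[i][j][k]
                total += math.sqrt(dx * dx + dy * dy + dz * dz)
                count += 1
    return total / count if count else 0.0

flat = [[[0.5] * 4 for _ in range(4)] for _ in range(4)]
bumpy = [[[0.5 + 0.3 * ((i + j + k) % 2) for k in range(4)]
          for j in range(4)] for i in range(4)]
print(tv_loss(flat), tv_loss(bumpy))  # the flat grid is penalized less
```

When such a term is minimized by gradient descent, a small epsilon is usually added under the square root, because the square root is not differentiable at zero.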

Experiments

As shown in Table 1, in the surface reconstruction experiments on the DTU dataset, the proposed method achieves a lower Chamfer distance under the same settings. In the qualitative comparisons of Figure 4 and Figure 5, the method recovers surfaces accurately and continuously compared with NeuS, and shows an advantage in recovering fine geometric details. NeuS, as a fully implicit model, has inherent continuity and local smoothness, but it sometimes over-smooths the surface and fails to recover details.
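For readers unfamiliar with the metric, the following is a minimal sketch of a symmetric Chamfer distance between two point sets. It is a brute-force illustration only: the function name and the averaging convention are assumptions, and the DTU benchmark uses its own evaluation protocol that this sketch does not reproduce.

```python
import math

def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance between two non-empty lists of 3D points.

    For each point, take the distance to its nearest neighbor in the other
    set, average per direction, then average the two directions.
    """
    def mean_nearest(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)

    return 0.5 * (mean_nearest(points_a, points_b)
                  + mean_nearest(points_b, points_a))

reconstructed = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(reconstructed, reference))
```

A lower value means the reconstructed surface samples lie closer to the reference samples and vice versa, which is why it is reported as "lower is better".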

Table 2 gives a broader evaluation covering surface reconstruction, novel view synthesis, and training time. On all of these metrics, the proposed method is significantly better than DVGO and NeuS. Compared with NeuS, it is also about 20 times faster at producing high-quality surface reconstructions.

Summary

This paper presents Voxurf, an efficient and accurate voxel-based method for neural surface reconstruction. It includes several key designs: a two-stage framework first acquires a coherent coarse shape and then progressively recovers details; a dual color network helps preserve the color-geometry dependency, while hierarchical geometry features facilitate information propagation between voxels; and smoothness priors, including a gradient smoothness loss, further improve visual quality. Extensive experiments show that Voxurf achieves strong performance in both efficiency and quality.

Origin blog.csdn.net/CVHub/article/details/131625377