[Intensive reading of the paper] Deep Marching Cubes: Learning Explicit Surface Representations

0. Summary

Existing learning-based solutions for 3D surface prediction cannot be trained end-to-end because they operate on an intermediate representation (e.g., a TSDF grid) from which the 3D surface must be extracted in a post-processing step (e.g., by the Marching Cubes algorithm). In this paper, we study the end-to-end 3D surface prediction problem. We first show that the Marching Cubes algorithm is non-differentiable and propose an alternative differentiable formulation that is inserted into a 3D convolutional neural network as a final layer. We further propose a set of loss functions that allow supervised training of our model with sparse point supervision. Our experiments show that the model can predict subvoxel-accurate 3D shapes of arbitrary topology. Furthermore, it learns to complete shapes and to separate the interior and exterior of an object even in the presence of sparse and incomplete ground truth. Our model is flexible and can be combined with a variety of shape encoders and shape inference techniques.

Main contributions

  • We show that the Marching Cubes algorithm is non-differentiable with respect to topological changes and propose a modified representation that is differentiable.
  • We propose an end-to-end surface prediction model and derive a suitable geometric loss function. Our model can be trained from unstructured point clouds and does not require explicit surface ground truth.
  • We propose an additional loss function that allows separating the interior of an object from its exterior when learning from sparse unstructured 3D data.

1. Marching Cubes

Mainly two steps:

  1. Estimate the topology (i.e., the number and connectivity of triangles in each cell of the voxel grid).
  2. Predict the vertex positions of the triangles, which determines the geometry.

1.1 Detailed algorithm process

  1. Assume a signed distance field (SDF) on an N³ grid. Each grid point stores its signed distance d to the surface (in this article's convention, d > 0 if the grid point is inside the object and d < 0 if outside). Each cell of the grid has 8 corners. The algorithm iterates over all cells ("marching") and inserts triangular faces wherever a sign change is detected between adjacent grid points.
  2. Compute the position of a vertex w on an edge (v, v') by linear interpolation. Parameterize the edge by x ∈ [0, 1]:
     x = 0, if w = v
     x = 1, if w = v'
     Let d and d' denote the signed distances at grid points v and v'. The signed distance at position x along the edge is then f(x) = d + x(d' − d); setting f(x) = 0 yields the vertex position x = d/(d − d').
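The two steps above can be sketched for a single cell (a minimal illustration; the full 256-entry triangle table of the real algorithm is omitted, and the sign convention follows this article: inside d > 0):

```python
import numpy as np

def cell_topology_index(d):
    """Step 1: topology. `d` holds the signed distances at the 8 corners
    of one cell. The inside/outside sign pattern gives an index into the
    256-entry triangle table of the full algorithm (table omitted here)."""
    index = 0
    for i, di in enumerate(d):
        if di > 0:          # corner is inside the object
            index |= 1 << i
    return index

def edge_vertex(v, v_prime, d, d_prime):
    """Step 2: geometry. Linear interpolation along the edge (v, v'):
    f(x) = d + x (d' - d), and f(x) = 0 gives x = d / (d - d')."""
    x = d / (d - d_prime)
    return v + x * (v_prime - v)

# A sign change between two corners places a vertex on their edge:
p = edge_vertex(np.array([0.0, 0.0, 0.0]),
                np.array([1.0, 0.0, 0.0]), d=0.25, d_prime=-0.75)
# x = 0.25 / (0.25 - (-0.75)) = 0.25, so p = [0.25, 0, 0]
```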

1.2 Why a neural network cannot be built directly on Marching Cubes

Given the MC algorithm, can we construct a deep neural network for end-to-end surface prediction? A natural attempt would be a network that predicts a signed distance field, converts it into a triangular mesh via an MC layer, compares the resulting surface to a ground-truth surface or point cloud, and back-propagates the error through the MC layer into the network. Unfortunately, the answer is no.

Reason: the Marching Cubes algorithm is not differentiable

  1. In x = d/(d − d'), the denominator becomes 0 when d = d', so the vertex position and its gradient are undefined there.
  2. The observed points only affect the grid cells in their vicinity, i.e., only the cells that the surface passes through. Gradients therefore do not propagate to cells farther away from the predicted surface.
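The first point can be checked numerically: the partial derivatives of x = d/(d − d') have (d − d')² in the denominator, so they diverge as d' approaches d (the sample values below are illustrative):

```python
def x_of(d, d_prime):
    """Vertex position from linear interpolation: x = d / (d - d')."""
    return d / (d - d_prime)

def grad_x(d, d_prime):
    """Analytic partial derivatives of x:
    dx/dd = -d' / (d - d')**2,  dx/dd' = d / (d - d')**2."""
    denom = (d - d_prime) ** 2
    return -d_prime / denom, d / denom

# As d' approaches d, (d - d')**2 vanishes and the gradients blow up:
for eps in (1e-1, 1e-3, 1e-6):
    print(eps, grad_x(0.5, 0.5 - eps))
```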

Solution:

Instead of letting the network predict signed distance values, let it predict the occupancy probability of each grid point (e.g., o = 1 if the grid point is inside the object, o = 0 if outside), together with a displacement that places each surface vertex directly on its edge. The vertex position is then a direct network output rather than the interpolation result x = d/(d − d'), so the problematic division by d − d' disappears.

2. Differentiable Marching Cubes

2.1 Differentiable Marching Cubes Layer (DMCL) mathematical definition

The occupancy o_n predicted by the network is the parameter of a Bernoulli distribution, and p_n(t) denotes the probability that grid point n is in occupancy state t. Note that t takes the discrete values 0 or 1, while o_n is a real number in [0, 1]. Concretely:
p_n(t) = o_n, if t = 1
p_n(t) = 1 − o_n, if t = 0

2.1.1 The probability that cell n has topology T, determined by the occupancy probabilities of its 8 corners

Reference article: https://hideoninternet.github.io/2020/01/06/a18afe7a/

A nice aspect of this formulation is that modeling each grid point's occupancy as a Bernoulli variable yields a probability for every possible topology of a cell. When computing the loss, all topologies share the same vertex displacements, and their errors are weighted by these probabilities. This idea of predicting a probability and using it as a weight over all possible cases is fairly common (e.g., Pixel2Mesh++, CMR).
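Under this Bernoulli assumption, the probability of one topology of a cell is simply the product of the eight corner likelihoods. A minimal sketch (the occupancy values below are made up for illustration):

```python
import numpy as np

def corner_prob(o, t):
    """Bernoulli likelihood p_n(t): o_n if t = 1 (occupied), 1 - o_n if t = 0."""
    return o if t == 1 else 1.0 - o

def topology_prob(o_corners, t_corners):
    """Probability of one topology of a cell, i.e. of one occupancy
    pattern (t_1, ..., t_8) of its 8 corners, assuming the corners are
    independent: p(T) = prod_n p_n(t_n)."""
    return float(np.prod([corner_prob(o, t)
                          for o, t in zip(o_corners, t_corners)]))

o = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1]   # predicted occupancies
t = [1, 1, 0, 0, 1, 1, 0, 0]                   # one topology's corner states
p_T = topology_prob(o, t)                      # 0.9**8 for this pattern
```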

2.1.2 Determine the displacement of the vertex

From 2.1.1 we obtain the probabilities of all 256 possible topologies of a cell; for example, the most probable one can be selected as the topology of the cell, as in the configuration shown in the figure below. It then only remains to determine the positions of the vertices of that face to fix the shape of the surface. To this end, we let the network predict a tensor X ∈ [0, 1]^(N×N×N×3), and let x_n ∈ [0, 1]³ denote its n-th element, a 3D vector, because a vertex displacement must be specified for each dimension of 3D space.
(figure omitted)
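A rough sketch of how such a displacement field places a vertex on a grid edge (the exact edge-to-displacement mapping here is my assumption for illustration, not taken from the article, and the network outputs are random stand-ins):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 8
# Hypothetical network outputs (random stand-ins for illustration):
O = rng.random((N, N, N))            # occupancy probability per grid point
X = rng.random((N, N, N, 3))         # displacement field, values in [0, 1]

def vertex_on_x_edge(i, j, k, X):
    """Place a vertex on the x-aligned edge leaving grid point (i, j, k)
    at the grid point plus the predicted displacement along that axis.
    No division by a difference of signed distances is involved, so the
    operation is differentiable in X."""
    return np.array([i + X[i, j, k, 0], j, k], dtype=float)

v = vertex_on_x_edge(2, 3, 4, X)     # lies between (2, 3, 4) and (3, 3, 4)
```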

2.2 Network structure

  • Point feature extraction: a variant of PointNet++; fully connected layers extract local features for each point.
  • Grid pooling: group all points falling into the same voxel and pool their features (e.g., max pooling?).
  • This yields an N×N×N×16 voxel feature grid, which is processed by a 3D CNN.
  • Skip connections are used to preserve detail.
  • The decoder splits into two branches: one predicts the occupancy probabilities O, the other predicts the displacement field X of the vertices.
  • A cell has 8 corners, giving 2^8 = 256 possible topologies; the authors only consider the 140 simply connected ones.
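The grid-pooling step can be sketched in NumPy (the voxel resolution, feature size, and the choice of max pooling are illustrative assumptions):

```python
import numpy as np

def grid_pool(points, feats, N):
    """Grid pooling sketch: max-pool the features of all points that fall
    into the same voxel of an N^3 grid. `points` lie in [0, 1)^3 and
    `feats` has shape (num_points, C); returns an (N, N, N, C) grid."""
    C = feats.shape[1]
    grid = np.full((N, N, N, C), -np.inf)
    idx = np.minimum((points * N).astype(int), N - 1)   # voxel of each point
    for (i, j, k), f in zip(idx, feats):
        grid[i, j, k] = np.maximum(grid[i, j, k], f)    # per-voxel max
    grid[np.isinf(grid)] = 0.0                          # empty voxels -> 0
    return grid

pts = np.array([[0.10, 0.10, 0.10],
                [0.12, 0.10, 0.10],
                [0.90, 0.90, 0.90]])
f = np.array([[1.0, 2.0],
              [3.0, 1.0],
              [5.0, 5.0]])
g = grid_pool(pts, f, N=4)
# the first two points share voxel (0, 0, 0): pooled feature is [3, 2]
```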

2.3 Loss function


1. Point to Mesh Loss


2. Occupancy Loss


3. Smoothness Loss

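The article does not reproduce the formula, so this is only a plausible sketch of one common form of a smoothness penalty: the sum of squared differences of occupancy between neighboring grid points along each axis.

```python
import numpy as np

def smoothness_loss(O):
    """Sum of squared occupancy differences between axis-aligned
    neighbors of an (N, N, N) occupancy grid -- a common smoothness
    penalty; the article's exact formula may differ."""
    dx = O[1:, :, :] - O[:-1, :, :]
    dy = O[:, 1:, :] - O[:, :-1, :]
    dz = O[:, :, 1:] - O[:, :, :-1]
    return float((dx ** 2).sum() + (dy ** 2).sum() + (dz ** 2).sum())

O = np.zeros((4, 4, 4))
assert smoothness_loss(O) == 0.0      # constant field: perfectly smooth
O[0, 0, 0] = 1.0
# now three neighbor pairs differ by 1, so the loss is 3
```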

4. Curvature Loss


3. Experimental results

(result figures omitted)

4. Conclusion

We propose a flexible framework for learning 3D mesh predictions. We demonstrate that training the surface prediction task end-to-end leads to more accurate and complete reconstructions. Furthermore, we show that surface-based supervision results in better predictions when the ground-truth 3D model is incomplete. In future work, we plan to adapt our method to higher-resolution outputs using octree techniques and to integrate it with other input modalities.

Origin blog.csdn.net/weixin_43693967/article/details/127434329