0. Summary
Existing learning-based 3D surface prediction methods cannot be trained end-to-end because they operate on an intermediate representation (e.g., a TSDF grid) from which the 3D surface mesh must be extracted in a post-processing step (e.g., by the Marching Cubes algorithm). In this paper, we study the end-to-end 3D surface prediction problem. We first show that the Marching Cubes algorithm is non-differentiable and propose an alternative differentiable formulation that is inserted into a 3D convolutional neural network as a final layer. We further propose a set of loss functions that allow supervised training of our model with sparse point supervision. Our experiments show that this model can predict subvoxel-accurate 3D shapes of arbitrary topology. Furthermore, it learns to complete shapes and to separate an object's interior from its exterior even in the presence of sparse and incomplete ground truth. Our model is flexible and can be combined with a variety of shape encoders and shape inference techniques.
Main contributions
- We prove that Marching Cubes is non-differentiable with respect to topological changes and propose a modified representation that is differentiable.
- We propose an end-to-end surface prediction model and derive a suitable geometric loss function. Our model can be trained from unstructured point clouds and does not require explicit surface ground truth.
- We propose a new loss function that allows separating the interior of an object from its exterior when learning from sparse, unstructured 3D data.
1. Marching Cubes
The algorithm has two main steps:
- Estimate the topology (i.e., the number and connectivity of the triangles in each cell of the volume grid).
- Predict the vertex positions of the triangles, which determines the geometry.
1.1 Detailed algorithm process
- Assume a signed distance field (SDF) on an N^3 grid. Each grid point stores its signed distance d to the surface (in the convention used here, d > 0 if the grid point is inside the object and d < 0 if outside). Each cell of the grid has 8 corners. The algorithm iterates over all cells ('marching') and inserts triangular faces wherever a sign change is detected between adjacent grid points.
- The position x of a vertex w on an edge between grid points v and v' is computed by linear interpolation. Parameterize the edge so that
x = 0, if w = v
x = 1, if w = v'
Let d and d' denote the signed distances at grid points v and v'. The signed distance at position x along the edge is then f(x) = d + x(d' - d). Setting f(x) = 0 gives the position of the surface vertex w: x = d / (d - d').
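As a sanity check on the two steps above, here is a minimal 1D analogue (a sketch for intuition only, not the full 3D algorithm): scan adjacent grid points for a sign change ('marching'), then place a vertex on each crossing edge via the interpolation x = d / (d - d').

```python
# Minimal 1D analogue of Marching Cubes: detect sign changes between
# adjacent grid points, then place a vertex on each crossing edge.

def march_1d(sdf):
    """sdf: signed distances at grid points; returns (edge index, offset x) pairs."""
    vertices = []
    for i in range(len(sdf) - 1):
        d, d_prime = sdf[i], sdf[i + 1]
        if d * d_prime < 0:            # surface crosses this edge
            x = d / (d - d_prime)      # zero crossing of f(x) = d + x(d' - d)
            vertices.append((i, x))
    return vertices

sdf = [0.75, -0.25, -0.5, 0.5]
print(march_1d(sdf))  # [(0, 0.75), (2, 0.5)]
```

The same sign-change test and interpolation are applied per edge of each cube cell in the 3D algorithm; only the triangle lookup table is missing here.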
1.2 Why Marching Cubes cannot be used to build a neural network
Given the MC algorithm, can we construct a deep neural network for end-to-end surface prediction? A natural attempt is a network that predicts a signed distance field, which an MC layer converts into a triangle mesh; we would then compare this surface to a ground-truth surface or point cloud and back-propagate the error through the MC layer into the network. The answer is no.
Reason: the Marching Cubes algorithm is not differentiable.
- First, the vertex position x = d/(d - d') is undefined when d = d' (the denominator becomes 0), and its gradient grows without bound as d approaches d'.
- Second, the observed points only affect the grid cells in their vicinity, i.e., they only act on the cells that the surface passes through. The gradient therefore does not propagate to cells farther away from the predicted surface.
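The first failure mode can be checked numerically. For x = d/(d - d'), the derivative with respect to d is -d'/(d - d')^2, which diverges as the two distances approach each other (a small illustrative script, not from the paper):

```python
# Derivative of the interpolated vertex position x = d / (d - d')
# with respect to d:  dx/dd = -d' / (d - d')**2.

def grad_x_wrt_d(d: float, d_prime: float) -> float:
    return -d_prime / (d - d_prime) ** 2

# As the sign change becomes marginal (d -> 0, d' -> 0 from the other side),
# the gradient magnitude is 1 / (4 * eps) and explodes.
for eps in [1e-1, 1e-2, 1e-3]:
    print(eps, grad_x_wrt_d(eps, -eps))
```

This is why simply appending an MC layer to an SDF-predicting network yields unstable (or undefined) gradients near degenerate edges.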
Solution:
Instead of having the network predict signed distance values, it predicts the occupancy probability of each grid point (for example, o = 1 if the grid point is inside the object, o = 0 if outside) and, separately, the positions of the surface vertices, which are predicted directly as a displacement field rather than recovered by interpolating signed distances (details in Section 2).
2. Differentiable Marching Cubes
2.1 Differentiable Marching Cubes Layer (DMCL) mathematical definition
The occupancy value o_n predicted by the network is the parameter of a Bernoulli distribution, and p_n(t) denotes the probability that grid point n is in occupancy state t. Note that t takes a discrete value of 0 or 1, while o_n is a real number in [0, 1]. Concretely,
p_n(t) = o_n, if t = 1
p_n(t) = 1 - o_n, if t = 0
2.1.1 The probability that cell n has topology T: determined by the occupancy probabilities of its 8 corners
Reference article: https://hideoninternet.github.io/2020/01/06/a18afe7a/
A nice aspect of the paper is that it models each grid point's occupancy as a Bernoulli random variable, so that a probability can be computed for every possible topology of a cell. When computing the loss, all topologies share the same vertex displacements, and their errors are weighted by the topology probabilities. This idea of using predicted probabilities as weights over all hypotheses is fairly common (e.g., Pixel2Mesh++, CMR).
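Under the Bernoulli assumption above, the probability of a particular cell topology T is simply the product of the corner probabilities p_n(t). A small illustrative sketch (the occupancy values and corner states below are made up, not taken from the paper):

```python
# Probability of a cell topology T given the 8 predicted corner occupancies:
# P(T) = product over corners of p_n(t_n), with p_n(1) = o_n, p_n(0) = 1 - o_n.

def topology_prob(o, t):
    """o: 8 predicted occupancy probabilities; t: 8 binary corner states of T."""
    p = 1.0
    for o_n, t_n in zip(o, t):
        p *= o_n if t_n == 1 else 1.0 - o_n
    return p

o = [0.9, 0.9, 0.1, 0.1, 0.9, 0.9, 0.1, 0.1]  # assumed network outputs
t = [1, 1, 0, 0, 1, 1, 0, 0]                   # corner states of one topology
print(topology_prob(o, t))  # 0.9**8, about 0.43
```

Repeating this for every admissible corner configuration gives the distribution over a cell's topologies that the expected loss is computed against.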
2.1.2 Determine the displacement of the vertex
The probabilities of all 256 topologies are determined as in 2.1.1, and the one with the highest probability is selected as the topology of the cell, as shown in the figure below. Then we only need to determine the positions of the face's vertices (four in the illustrated case) to fully determine the surface. We let the network predict a tensor X ∈ [0, 1]^{N×N×N×3}, where x_n ∈ [0, 1]^3 denotes the n-th element, a 3D vector, since a vertex displacement must be specified for each dimension of 3D space.
2.2 Network structure
- Point feature extraction: a variant of PointNet++ is used; fully connected layers extract local features for each point.
- Grid pooling: all points falling into the same voxel are grouped together and their features are pooled (presumably max pooling).
- This yields an N×N×N×16 voxel feature grid, which is processed by a 3D CNN.
- Skip connections are used to preserve detail.
- The decoder splits into two branches: one predicts the occupancy probabilities O, the other predicts the vertex displacement field X.
- A cube cell has 8 corners, giving 2^8 = 256 possible topologies; the authors consider only the 140 singly connected ones.
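The grid-pooling step can be sketched as follows (an illustrative NumPy version under the assumption of max pooling over non-negative point features; the paper's actual implementation may differ):

```python
import numpy as np

# Illustrative grid pooling: bin each point into an N^3 voxel grid and
# max-pool the features of all points that fall into the same voxel.

def grid_pool(points, feats, N):
    """points: (P, 3) coordinates in [0, 1); feats: (P, C); returns (N, N, N, C)."""
    C = feats.shape[1]
    # Zero-initialized grid: assumes non-negative features (empty voxels stay 0).
    grid = np.zeros((N, N, N, C), dtype=feats.dtype)
    idx = np.clip((points * N).astype(int), 0, N - 1)  # voxel index per point
    for (i, j, k), f in zip(idx, feats):
        grid[i, j, k] = np.maximum(grid[i, j, k], f)   # elementwise max pooling
    return grid

points = np.array([[0.1, 0.1, 0.1], [0.12, 0.1, 0.1], [0.9, 0.9, 0.9]])
feats = np.array([[1.0, 2.0], [3.0, 1.0], [5.0, 5.0]])
g = grid_pool(points, feats, N=4)
print(g[0, 0, 0])  # [3. 2.] -- elementwise max of the two points in that voxel
```

The resulting dense N×N×N×C tensor is what the subsequent 3D CNN encoder-decoder consumes.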
2.3 Loss functions
1. Point to Mesh Loss
2. Occupancy Loss
3. Smoothness Loss
4. Curvature Loss
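The notes list the losses by name only. Below is a hedged sketch of plausible forms for two of them, written from the names alone (binary cross-entropy for the occupancy loss, a total-variation-style neighbor penalty for the smoothness loss); the paper's exact formulations may differ:

```python
import numpy as np

# Hedged sketches of two of the losses, reconstructed from their names rather
# than from the paper's equations.

def occupancy_loss(o, labels, eps=1e-7):
    """Binary cross-entropy between predicted occupancies o and 0/1 labels."""
    o = np.clip(o, eps, 1 - eps)  # avoid log(0)
    return -np.mean(labels * np.log(o) + (1 - labels) * np.log(1 - o))

def smoothness_loss(o):
    """o: (N, N, N) occupancy grid; mean absolute difference between
    neighboring grid points along each of the three axes."""
    return sum(np.abs(np.diff(o, axis=a)).mean() for a in range(3))

o = np.full((4, 4, 4), 0.5)
print(smoothness_loss(o))  # 0.0 for a constant grid
```

Both terms are differentiable in the predicted occupancies, which is exactly what the end-to-end formulation requires.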
3. Experimental results
4. Conclusion
We propose a flexible framework for learning 3D mesh prediction. We demonstrate that training the surface prediction task end-to-end leads to more accurate and complete reconstructions. Furthermore, we show that surface-based supervision results in better predictions when the ground-truth 3D model is incomplete. In future work, we plan to adapt our method to higher-resolution outputs using octree techniques and to integrate our method with other input modalities.