Recursive-NeRF

Recursive-NeRF: An Efficient and Dynamically Growing NeRF

The paper assumes that complex regions of a scene should be represented by large neural networks, while small neural networks suffice to encode simple regions, achieving a balance between efficiency and quality.
Recursive-NeRF embodies this idea, providing an efficient and adaptive rendering and training method for NeRF. **At its core, Recursive-NeRF learns the uncertainty of the query coordinates, representing the quality of the predicted color and volume density at each level.** Only query coordinates with high uncertainty are forwarded to the next stage, a larger neural network with more powerful representation capabilities. The final rendered image is a composite of results from the various stages of the network.

Some defects of NeRF:
First, a fixed-size network cannot adapt to scenes of different complexity. Second, regardless of whether individual query samples are complex or simple, NeRF passes them through the entire neural network, which is overkill for regions that are empty or have simple geometry and texture. These limitations severely impact large-scale scene rendering.
Recursive-NeRF mainly makes the following improvements:
1. A recursive scene rendering method: once the output quality is good enough, early termination prevents further processing, achieving state-of-the-art novel view synthesis results with greatly reduced computation.
2. A novel multi-stage dynamic growth method that splits off uncertain queries in the shallow layers of the network and continues to refine them in differently grown deeper networks, making the method adaptable to scenes with regions of different complexity.

1. Review

The paper first reviews NeRF-related work and introduces the basic principles, then examines the relationship between parameter count and scene complexity: more complex scenes need more parameters to represent, while simple scenes can be represented with a small number of parameters.
NeRF was tested on the Lego dataset, measuring PSNR for different numbers of network layers (2, 4, 6, 8), network widths (64, 128, 256), and image sizes (25, 50, 100, 200, 400, 800) [14]. Network capacity is positively correlated with the number of layers and the width. As scenes become more complex, the gap between the expressive capabilities of different networks grows larger.
[Figure: PSNR on Lego for varying network depth, width, and image size]

2. Recursive Neural Radiance Fields

1. Starting from a small neural network, Recursive-NeRF predicts, in addition to color and volume density, an uncertainty indicating the quality of the current results.
2. Recursive-NeRF then directly outputs the results for query coordinates with low uncertainty at the current level, rather than forwarding them through the rest of the network.
3. Query coordinates with high uncertainty are forwarded, in clusters, to the next level, represented by multiple neural networks with more powerful representation capabilities. The k-means algorithm clusters the high-uncertainty points of the current stage, dividing the scene into multiple parts for finer-grained prediction.
4. When the uncertainty of all query coordinates is less than a user-specified threshold or reaches a certain maximum number of iterations, the training process is terminated.
In this way, Recursive-NeRF adaptively partitions the work, decoupling different parts of the underlying scene according to their complexity and helping to avoid unnecessary growth in network parameters. The figure below shows the pipeline.
[Figure: Recursive-NeRF pipeline]
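The per-level gating in steps 2–3 above can be sketched in a few lines. This is a minimal NumPy illustration, not the paper's implementation; the threshold value `epsilon` and array shapes are assumptions:

```python
import numpy as np

def split_by_uncertainty(delta, epsilon=0.01):
    """Points below the uncertainty threshold exit at this level;
    the rest are forwarded to the next, larger sub-network."""
    exit_idx = np.flatnonzero(delta < epsilon)
    forward_idx = np.flatnonzero(delta >= epsilon)
    return exit_idx, forward_idx

delta = np.array([0.002, 0.30, 0.009, 0.12])
exit_idx, forward_idx = split_by_uncertainty(delta)
# points 0 and 2 are output immediately; points 1 and 3 continue deeper
```

During training this split is repeated at every level, so a point is only ever processed by as much network as its region of the scene requires.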

2.1 Recursive Neural Fields

The recursive neural field takes as input the output $y_{p_i}$ of its parent branch and the viewing direction $d$, and predicts the color $c_i$, the density $\sigma_i$, the uncertainty $\delta_i$, and the latent vector $y_i$:
$$(c_i, \sigma_i, \delta_i, y_i) = F_{\Phi_i}(y_{p_i}, d)$$
$F_{\Phi_i}$ denotes the $i$-th sub-network. $\Phi_1$ is the root of the recursive network, and for it $y_{p_1}$ is simply the query coordinate $(x, y, z)$. The sub-network $F_{\Phi_i}$ consists of three main components: an MLP module, a branch module, and an output module. The MLP module includes two or more linear layers, ensuring sufficiently complex feature processing. It predicts the uncertainty $\delta_i$ of each query point, forwards the points with low uncertainty to the output module, and assigns the points with high uncertainty to different sub-networks according to their distance to the $k_i$ cluster centers of $F_{\Phi_i}$ (obtained by k-means clustering). The output module is responsible for decoding features into $c_i$ and $\sigma_i$.
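A NumPy sketch of one such sub-network follows. The layer widths, weight initialization, and the omission of the viewing direction are simplifying assumptions made for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)

def dense(x, w, b):
    return x @ w + b

class SubNetwork:
    """Sketch of one sub-network F_Phi_i: an MLP module (two linear layers),
    an uncertainty head, and an output module decoding features into
    color c_i and density sigma_i. Dimensions are illustrative."""
    def __init__(self, in_dim=3, hidden=64):
        self.w1, self.b1 = rng.normal(0, 0.1, (in_dim, hidden)), np.zeros(hidden)
        self.w2, self.b2 = rng.normal(0, 0.1, (hidden, hidden)), np.zeros(hidden)
        self.w_delta, self.b_delta = rng.normal(0, 0.1, (hidden, 1)), np.zeros(1)
        self.w_sigma, self.b_sigma = rng.normal(0, 0.1, (hidden, 1)), np.zeros(1)  # "alpha linear"
        self.w_c, self.b_c = rng.normal(0, 0.1, (hidden, 3)), np.zeros(3)

    def forward(self, y_parent):
        h = np.maximum(0.0, dense(y_parent, self.w1, self.b1))
        y = np.maximum(0.0, dense(h, self.w2, self.b2))     # latent passed to children
        delta = dense(y, self.w_delta, self.b_delta)[:, 0]  # uncertainty
        sigma = dense(y, self.w_sigma, self.b_sigma)[:, 0]  # density
        c = dense(y, self.w_c, self.b_c)                    # color (view direction omitted)
        return c, sigma, delta, y

net = SubNetwork()
c, sigma, delta, y = net.forward(np.zeros((5, 3)))
```

The latent vector `y` is what a child sub-network would receive as its `y_parent`, mirroring the recursion in the equation above.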

[Figure: sub-network structure]

2.2 Early Termination

Each branch network predicts the uncertainty of its query coordinates and uses it to decide whether a point exits at that branch. The original NeRF loss is used to help train the uncertainty. NeRF uses the mean squared error (MSE) between the rendered image and the ground-truth image as the loss for training the coarse and fine networks:
$$L_{MSE} = \sum_{r \in R} \left[ \big\| \hat{C}_c(r) - C(r) \big\|_2^2 + \big\| \hat{C}_f(r) - C(r) \big\|_2^2 \right]$$
The paper introduces two regularization losses to train the uncertainty effectively. The uncertainty $\delta_i$ is computed by a linear layer applied to the output features $y_{p_i}$ of $F_{\Phi_i}$. The squared error of a pixel supervises $\delta_i$: if a pixel has a large error, the sample points that produced it should also have large $\delta_i$. Therefore $\delta_i$ is penalized when it is below the squared error, and the penalty is zero once it exceeds $E(r)$:

$$L_{\delta} = \sum_{r \in R} \frac{1}{N} \sum_{i=1}^{N} \max\big(E(r) - \delta_{r,i},\; 0\big)$$

Here $E(r)$ is the squared error of ray $r$, $N$ is the number of sample points along the ray, and $\delta_{r,i}$ is the predicted uncertainty of the $i$-th sample point on ray $r$. To prevent $\delta_i$ from exploding, a second regularization loss encourages each query point's $\delta_i$ to stay as close to zero as possible:

$$L_{reg} = \sum_{r \in R} \frac{1}{N} \sum_{i=1}^{N} \big| \delta_{r,i} \big|$$

The overall uncertainty loss is

$$L_{unc} = \alpha_1 L_{\delta} + \alpha_2 L_{reg}, \qquad \alpha_1 = 1, \ \alpha_2 = 0.01$$

These regularization losses are used instead of directly training the uncertainty with an L1 loss against $E(r)$, since predicting $E(r)$ accurately is about as difficult as directly predicting the color of the query coordinates; shallow networks in particular cannot estimate $E(r)$ accurately, hence the choice of $\alpha_1$, $\alpha_2$.
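A NumPy sketch of this uncertainty loss follows. The mean-based reduction over rays and samples is my assumption based on the description above, not the paper's exact reduction:

```python
import numpy as np

def uncertainty_loss(E, delta, alpha1=1.0, alpha2=0.01):
    """E: squared error per ray, shape (R,).
    delta: predicted uncertainty per sample, shape (R, N).
    Hinge term penalizes delta below E(r); the regularizer keeps delta small."""
    hinge = np.maximum(E[:, None] - delta, 0.0).mean()  # max(E(r) - delta_{r,i}, 0)
    reg = np.abs(delta).mean()                          # pull delta toward zero
    return alpha1 * hinge + alpha2 * reg

E = np.array([0.5])
delta = np.array([[0.2, 0.6]])
loss = uncertainty_loss(E, delta)  # hinge = 0.15, reg = 0.4 -> 0.15 + 0.004 = 0.154
```

Note how the sample with $\delta = 0.6 > E(r)$ contributes nothing to the hinge term but is still pulled down by the small regularizer, exactly the unbalanced behavior described next.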

Using unbalanced weights for the two regularization losses lets the network apply a large penalty to points whose uncertainty is below the error, while points whose uncertainty exceeds the error are penalized only lightly. In this way the network learns the uncertainty as an upper bound on the complex loss function, and only truly certain points are terminated early.

Following the practice of multi-scale dense networks (MSDNet), during training all query coordinates are output through all exits; the early-terminated images are also rendered, and their losses are weighted equally. The total loss function is therefore:
$$L = \sum_{i=1}^{D} \big( \beta_1 L_{MSE}^{i} + \beta_2 L_{unc}^{i} \big)$$
where $D$ is the current number of levels; the MSE loss and the uncertainty loss of the $i$-th level's output image are summed, with $\beta_1 = 1$ and $\beta_2 = 0.1$.
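The total loss is then just a weighted sum over levels, as in this small sketch (the per-level loss values here are placeholders):

```python
def total_loss(mse_losses, unc_losses, beta1=1.0, beta2=0.1):
    """Sum MSE and uncertainty losses over the output images of all D levels."""
    return sum(beta1 * m + beta2 * u for m, u in zip(mse_losses, unc_losses))

loss = total_loss([1.0, 2.0], [0.5, 0.5])  # (1.0 + 0.05) + (2.0 + 0.05) = 3.1
```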

2.3 Dynamic Growth

This strategy clusters uncertain queries at the current stage and grows a deeper network based on the clustering results.
In the initial stage, the network contains only one sub-network $\Phi_1$, which consists of two linear layers. After training this initial network for $I_1$ iterations, some points in space are sampled and their uncertainties computed. The points with uncertainty higher than $\varepsilon$ are then clustered, and the clustering result determines how the network grows in the next stage. To keep the clustering simple and controllable, the k-means algorithm is used with $k \in [2, 4]$; the network is augmented with $k$ branches according to the cluster centers, e.g. $\Phi_2$ and $\Phi_3$. Downstream, query points are assigned to the branch whose cluster center is closest.
When a scene becomes complex, NeRF has to deepen its network, whereas Recursive-NeRF can simply add more branches to achieve the same effect. The growing network is split for two reasons. First, splitting the points reduces the complexity each sub-network must handle; otherwise a deeper network would be needed for all points. Second, each sub-network is independently responsible for only part of the scene, making it more efficient and adaptable.
The grown network is trained for many iterations, then clustered and grown again. This process can repeat until the uncertainty of most points is below $\varepsilon$. To complete training in a reasonable time, Recursive-NeRF grows three times in total. The value of $k$ can differ between growth steps, but by default $k = 2$ for each step.

During training, sample points can exit at multiple stages, while during inference each point exits only once, at a particular stage. Reliable points found early are output immediately and rendered. Which branch a forwarded point takes depends on the clustering result: the uncertain points of the current stage are clustered and passed to different sub-branches with the same structure in the next stage.
The output network structure is shown below.
[Figure: output network structure]
The alpha linear layer is responsible for decoding features into the density of a query point. When the network grows, a newly added sub-network would otherwise have to learn density from scratch, causing instability; to overcome this, the alpha linear weights of the growing sub-network are initialized to the same weights as the parent network. The child's density-generating network thus inherits part of the parent's density information, avoiding this instability.
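The weight inheritance amounts to a plain copy at growth time, as in this sketch (the 64×1 shape mirrors a single density head; names are illustrative):

```python
import numpy as np

def grow_with_inherited_alpha(parent_w, parent_b, k=2):
    """Initialize each child's alpha-linear (density head) as a copy of the
    parent's weights, so children inherit the parent's density information
    rather than predicting density from scratch."""
    return [(parent_w.copy(), parent_b.copy()) for _ in range(k)]

parent_w, parent_b = np.ones((64, 1)), np.zeros(1)
children = grow_with_inherited_alpha(parent_w, parent_b, k=2)
children[0][0][0, 0] = 9.0  # training one child does not disturb the parent
```

Using `.copy()` matters: the children must start from the parent's solution but then diverge independently as each specializes to its part of the scene.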

2.4 Recursive rendering

For the current view, all points whose uncertainty is below the threshold at the current stage can already render a relatively blurred image. Points with uncertainty above the threshold enter the next stage of the network for further refinement, while the remaining low-uncertainty points exit at this stage. Together with the points from all previous stages, they render a sharp image.
The image below shows renderings from points finished at different stages; merged, they form the final image at the upper left. Uncertainty is implicitly visualized in the figure: the low-uncertainty regions of early stages are mainly empty space and simple structures and surfaces.
[Figure: per-stage renderings, merged into the final image at the upper left]
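This merging step can be sketched as a scatter of per-stage colors into one image buffer (a NumPy illustration; the 4-point example and flat colors are toy assumptions):

```python
import numpy as np

def composite_stages(n_points, stage_results):
    """Merge per-stage outputs: each stage supplies the colors of the points
    that exited there, so all stages together cover every query point once."""
    rgb = np.zeros((n_points, 3))
    covered = np.zeros(n_points, dtype=bool)
    for idx, colors in stage_results:
        rgb[idx] = colors
        covered[idx] = True
    assert covered.all(), "every point should exit at exactly one stage"
    return rgb

stage_results = [
    (np.array([0, 2]), np.full((2, 3), 0.1)),  # exited early (simple/empty regions)
    (np.array([1, 3]), np.full((2, 3), 0.9)),  # refined in a deeper stage
]
image = composite_stages(4, stage_results)
```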

Origin blog.csdn.net/qq_44708206/article/details/128478205