Accelerating NeRF training: nerfacc

0 Preface

1. NerfAcc acceleration principles

1.1. Pruning Empty and Occluded Regions


In NeRF training, we typically first define rays_o (the origin of each ray) and rays_d (its direction), and then obtain points along the ray by taking different values of t (predefined, or chosen by some other scheme):

pts = rays_o + rays_d * t  # e.g. we may have 1024 values of t per ray
  • In practice, however, many points on a ray land in empty space, where the density is 0. These points contribute nothing to the rendering of the ray. If we could skip these empty regions during training, we would sample fewer points per ray, which in theory speeds up training. To realize this idea, nerfacc divides the region of interest into a grid (the occupancy grid), where each cell stores whether it is occupied or empty. During training, if a sampling point falls in an empty cell, we can ignore it and thereby speed up the model's training. (For the concrete implementation, see the Occupancy Grid section in the second half of this article.)

  • In addition, suppose a ray hits an object (such as a wall). Then, in theory, we do not need to consider the points behind the wall: the color we expect for the ray is simply the color of the wall, and whatever lies behind it is unknown and unimportant. So we can also omit the points behind the wall. Implementing this idea is simple: nerfacc lets us set a threshold on the transmittance value T, such as T < 1e-4. During ray marching, we compute the density of each point and from it the corresponding T value; once T falls below the threshold, the ray is considered blocked (it has hit something), and all subsequent points on the ray can be omitted.

In summary, nerfacc speeds up model training by reducing the number of points sampled on each ray, using two tricks: skipping empty regions and terminating rays early once they are occluded.
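To make these two tricks concrete, here is a toy sketch (my own illustration, not nerfacc code) of how a per-sample keep/skip mask could be computed; sigmas (densities), dt (step sizes), and occupied (an occupancy-grid lookup) are assumed inputs for the samples along one ray:

import torch

def keep_mask(sigmas, dt, occupied, t_threshold=1e-4):
    # alpha-like term per sample: sigma_i * dt_i
    alpha = sigmas * dt
    # transmittance before sample i: T_i = exp(-sum_{j<i} sigma_j * dt_j)
    T = torch.exp(-torch.cumsum(
        torch.cat([torch.zeros_like(alpha[:1]), alpha[:-1]]), dim=0))
    # keep points that lie in occupied cells and are not yet occluded
    return occupied & (T >= t_threshold)

In nerfacc itself this logic runs inside fused CUDA kernels; the sketch only shows the math behind the two criteria.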

1.2. GPU level

This part is mainly about how to implement the above algorithm efficiently at the hardware level. Interested readers can find the details in the paper; I will not go into them here (the author is a novice when it comes to hardware).

1.3. Scene Contraction for Unbounded Scenes

An obvious problem with the occupancy grid described above is that as the scene grows, the number of grid cells required skyrockets, putting a huge burden on memory. So nerfacc adopts the idea of Mip-NeRF 360: when querying the occupancy grid, it maps the unbounded scene into a bounded grid through a nonlinear function, so that the occupancy grid can accelerate training for large scenes as well.
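For intuition, this is the contraction function proposed in the Mip-NeRF 360 paper (nerfacc's unbounded ContractionTypes implement mappings in the same spirit): points inside the unit ball are left unchanged, while the rest of space is squeezed into a ball of radius 2:

import torch

def contract(x):
    # x: (N, 3) world-space points
    norm = x.norm(dim=-1, keepdim=True).clamp(min=1e-9)
    return torch.where(norm <= 1.0, x, (2.0 - 1.0 / norm) * (x / norm))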

1.4. Differentiable Rendering

A GPU-optimized, faster implementation of volume rendering.

2. NerfAcc common APIs

Let's walk through some of the more important classes and functions in the nerfacc library. (The following code is adapted from instant-nsr-pl.)

2.1. Occupancy Grid

This class is the occupancy grid mentioned above, used to skip empty regions. Let's first look at how it is typically defined:

from nerfacc import ContractionType, OccupancyGrid

# define the bounding box of the region of interest
self.scene_aabb = torch.as_tensor(
    [-self.config.radius, -self.config.radius, -self.config.radius,
     self.config.radius, self.config.radius, self.config.radius],
    dtype=torch.float32)
# define the contraction type for scene contraction
self.contraction_type = ContractionType.AABB  # or ContractionType.UN_BOUNDED_SPHERE, ContractionType.UN_BOUNDED_TANH
# create the occupancy grid
self.occupancy_grid = OccupancyGrid(
    roi_aabb=self.scene_aabb,
    resolution=256,  # use [256, 128, 64] if the resolution differs per axis
    contraction_type=self.contraction_type)

This breaks down into three steps:

  • 1) Define a bounding box as your region of interest.
  • 2) Define the resolution of the occupancy grid along each axis.
  • 3) Define a contraction type, i.e. the function that maps 3D space into grid space. See the official nerfacc documentation for details; the most common choice is ContractionType.AABB, which linearly maps the region of interest into a unit cube (a conceptual sketch follows this list).
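Conceptually, the AABB contraction is just this linear rescaling (a sketch of the idea, not nerfacc's actual implementation):

def contract_aabb(x, aabb):
    # aabb: tensor [min_x, min_y, min_z, max_x, max_y, max_z]
    # maps points inside the box linearly into the unit cube [0, 1]^3
    aabb_min, aabb_max = aabb[:3], aabb[3:]
    return (x - aabb_min) / (aabb_max - aabb_min)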

At this point, we have defined the basic characteristics of our occupancy grid. Next, we need to write a function ourselves to update it (we do not know at the start which cells are empty, so the information in each cell has to be evaluated and updated as the model trains).

One example (multiplying the density by the step size gives an approximate per-step opacity, which is what decides whether a cell counts as occupied):

def occ_eval_fn(x):
    # query the radiance field for the density at positions x
    density, _ = self.nerf_network(x)
    return density * self.render_step_size

self.occupancy_grid.every_n_step(step=global_step, occ_eval_fn=occ_eval_fn)
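For context, here is roughly how this sits in a training loop (a sketch; loader and global_step are assumptions of this example, not nerfacc requirements):

for global_step, batch in enumerate(loader):
    # periodically re-evaluate and update the occupancy of grid cells
    self.occupancy_grid.every_n_step(step=global_step, occ_eval_fn=occ_eval_fn)
    # ... ray marching and rendering for this batch follow ...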

Next, let's look at how the occupancy grid is used during training.

2.2. Ray Marching

This function implements the algorithms described above: skipping empty regions and terminating rays early in occluded regions (note that this function is not differentiable with respect to its inputs).

Example (from the official documentation):

import torch
from nerfacc import OccupancyGrid, ray_marching

device = "cuda:0"
batch_size = 128
rays_o = torch.rand((batch_size, 3), device=device)
rays_d = torch.randn((batch_size, 3), device=device)
rays_d = rays_d / rays_d.norm(dim=-1, keepdim=True)

# Ray marching with near far plane.
ray_indices, t_starts, t_ends = ray_marching(
    rays_o, rays_d, near_plane=0.1, far_plane=1.0, render_step_size=1e-3
)

# Ray marching with aabb.
scene_aabb = torch.tensor([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], device=device)
ray_indices, t_starts, t_ends = ray_marching(
    rays_o, rays_d, scene_aabb=scene_aabb, render_step_size=1e-3
)

# Ray marching with per-ray t_min and t_max.
t_min = torch.zeros((batch_size,), device=device)
t_max = torch.ones((batch_size,), device=device)
ray_indices, t_starts, t_ends = ray_marching(
    rays_o, rays_d, t_min=t_min, t_max=t_max, render_step_size=1e-3
)

# Ray marching with aabb and skip areas based on occupancy grid.
scene_aabb = torch.tensor([0.0, 0.0, 0.0, 1.0, 1.0, 1.0], device=device)
grid = OccupancyGrid(roi_aabb=[0.0, 0.0, 0.0, 0.5, 0.5, 0.5]).to(device)
ray_indices, t_starts, t_ends = ray_marching(
    rays_o, rays_d, scene_aabb=scene_aabb, grid=grid, render_step_size=1e-3
)

As you can see, the function is simple to use; the general workflow is:

  • 1) Compute rays_o and rays_d (the origin and direction of each ray) ourselves.
  • 2) Choose how sample points are taken, which can be any of:
    • a) Define near_plane and far_plane, and sample points within that range.
    • b) Define t_min, t_max, and render_step_size, i.e. sample via the minimum value, maximum value, and spacing of t (here t_min and t_max can differ per ray).
    • c) Define a scene_aabb (a bounding box). The function automatically computes the segment of each ray that passes through the box and then samples points along it at render_step_size intervals.

This function has three return values: ray_indices, t_starts, and t_ends, with shapes (n_samples,), (n_samples, 1), and (n_samples, 1) respectively. Here n_samples is the total number of points taken across all rays during this ray_marching call.

For example, suppose we cast three rays in total, and through the optimizations described at the beginning, one of them gets 10 points while the other two get 5 points each. Then n_samples is 10 + 5 + 5 = 20, and ray_indices records which of the three rays (0, 1, 2) each point belongs to. t_starts and t_ends describe the interval each point covers. We can obtain the concrete coordinates of every point with the following code:

# Convert t_starts and t_ends to sample locations.
t_mid = (t_starts + t_ends) / 2.0
sample_locs = rays_o[ray_indices] + t_mid * rays_d[ray_indices]
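To make the packed layout concrete, here is a toy construction (illustration only, not actual nerfacc output) of what ray_indices would look like in the 3-ray example above:

import torch

# 3 rays with 10, 5, and 5 samples -> n_samples = 20
ray_indices = torch.cat([
    torch.full((10,), 0, dtype=torch.long),
    torch.full((5,), 1, dtype=torch.long),
    torch.full((5,), 2, dtype=torch.long)])
print(ray_indices.shape)  # torch.Size([20])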

2.3. Rendering

With these points in hand, we can proceed to the final rendering step. Nerfacc provides an optimized, GPU-accelerated rendering function.

Example (from the official documentation):

import torch
from nerfacc import ray_marching, rendering

rays_o = torch.rand((128, 3), device="cuda:0")
rays_d = torch.randn((128, 3), device="cuda:0")
rays_d = rays_d / rays_d.norm(dim=-1, keepdim=True)
ray_indices, t_starts, t_ends = ray_marching(
    rays_o, rays_d, near_plane=0.1, far_plane=1.0, render_step_size=1e-3)

def rgb_sigma_fn(t_starts, t_ends, ray_indices):
    # This is a dummy function that returns random values.
    rgbs = torch.rand((t_starts.shape[0], 3), device="cuda:0")
    sigmas = torch.rand((t_starts.shape[0], 1), device="cuda:0")
    return rgbs, sigmas

colors, opacities, depths = rendering(
    t_starts, t_ends, ray_indices, n_rays=128, rgb_sigma_fn=rgb_sigma_fn)
print(colors.shape, opacities.shape, depths.shape)
# torch.Size([128, 3]) torch.Size([128, 1]) torch.Size([128, 1])

As you can see, the rendering function needs the t_starts, t_ends, and ray_indices obtained from ray_marching in the previous step, the number of rays, and an rgb_sigma_fn.

What is this rgb_sigma_fn? It is simply a query function: given t_starts, t_ends, and ray_indices, it returns the rgb and density of each point.

More concretely, the function can be written like this (from instant-nsr-pl):

def rgb_sigma_fn(t_starts, t_ends, ray_indices):
    ray_indices = ray_indices.long()
    t_origins = rays_o[ray_indices]
    t_dirs = rays_d[ray_indices]
    positions = t_origins + t_dirs * (t_starts + t_ends) / 2.  # coordinates of each sample point
    density, feature = self.geometry(positions)
    rgb = self.texture(feature, t_dirs)  # query the networks for rgb and density
    return rgb, density

Moreover, if we look at the source code of this rendering function, we find that it is actually built from a series of Python APIs provided by nerfacc. So if we need to render other outputs (such as a variant of the depth map), we can skip rendering and call these Python APIs directly instead.

Here is another way to write the rendering function (a short version):

from nerfacc import render_weight_from_density, accumulate_along_rays

# compute rgb and sigma (density) for every sample point
rgbs, sigmas = rgb_sigma_fn(t_starts, t_ends, ray_indices)
# call the nerfacc API to compute the weight of each sample point
weights = render_weight_from_density(
    t_starts,
    t_ends,
    sigmas,
    ray_indices=ray_indices,
    n_rays=n_rays,
)
# accumulate different values along the rays to obtain different outputs
colors = accumulate_along_rays(
    weights, ray_indices, values=rgbs, n_rays=n_rays)
opacities = accumulate_along_rays(
    weights, ray_indices, values=None, n_rays=n_rays)
depths = accumulate_along_rays(
    weights,
    ray_indices,
    values=(t_starts + t_ends) / 2.0,
    n_rays=n_rays,)
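As an example of this flexibility, here is a rough sketch of the "variant of the depth map" idea mentioned above: accumulating both t and t^2 gives an approximate per-ray depth variance (approximate, since the weights of a ray do not have to sum exactly to 1):

t_mid = (t_starts + t_ends) / 2.0
depths = accumulate_along_rays(weights, ray_indices, values=t_mid, n_rays=n_rays)
depths_sq = accumulate_along_rays(weights, ray_indices, values=t_mid ** 2, n_rays=n_rays)
depth_var = depths_sq - depths ** 2  # Var[t] ~= E[t^2] - E[t]^2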

The full set of functions can be found in the Python API documentation. They are very flexible and allow us to render different outputs to suit our own network.

Finally, the resulting colors/opacities/depths can be used to compute the loss, which is then used to update the network!
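A minimal sketch of that last step, assuming pixels_gt holds the ground-truth colors of the sampled rays and optimizer is an ordinary PyTorch optimizer:

import torch.nn.functional as F

loss = F.mse_loss(colors, pixels_gt)  # compare rendered colors to ground truth
optimizer.zero_grad()
loss.backward()   # gradients flow back through the differentiable rendering
optimizer.step()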


Original article: blog.csdn.net/fb_941219/article/details/131680149