[NeRF] Detailed analysis of code + logic

0. Preface

In the past two years (2020, 2021), implicit rendering techniques have become very popular (with NeRF and GIRAFFE as representative examples). Because implicit rendering requires a bit of rendering background and is not as easy to grasp as a typical CV task, I will take NeRF (Neural Radiance Fields), published at ECCV 2020 [1], as an example and analyze it in detail at the code level (based on the pytorch3d implementation [3]). I hope it is helpful to those who need it.

  • Updated on 2022.12.23
    This blog only introduces the most basic NeRF. A lot of new work has been released in recent years. If you need a more systematic and complete introductory course, I recommend the specialized course from 深蓝学院 (Shenlan Academy): the Neural Radiance Fields (NeRF) lecture series.

1. What is NeRF

According to the official project [1], NeRF essentially constructs an implicit rendering process. Its input is a ray emitted from a certain viewpoint, described by its origin $\mathbf{o}$, direction $\mathbf{d}$, and the corresponding sampled coordinates $(x, y, z)$. The neural radiance field $F_{\theta}$ maps these to a volume density and a color, and the final image is then rendered through volumetric rendering.

The mapping from position $\mathbf{x}$ and viewing direction $\mathbf{d}$ to volume density $\sigma$ and color $\mathbf{c}$ is formally expressed as follows (from the NeRF paper):

$$F_{\Theta}: (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma)$$
This involves many rendering-related concepts, such as: What is a ray? How is it emitted? What determines its direction? And for a given image from an arbitrary viewpoint, how is it rendered through this process?

The picture below comes from Professor Geiger of the University of Tübingen [2] (he is also the advisor of the CVPR 2021 best paper!). Taking this figure as an example (the grid_sampler it mentions will be discussed later), we start from each pixel position of the image and emit a ray: $\mathbf{o} + t\mathbf{d}$.
[Figure: rays emitted from image pixels, from Prof. Geiger's lecture slides]
As the figure above shows, points (the blue balls here, or the black balls in the figure from [1]) are sampled along the ray direction at different values of $t$ (which can be understood as depth). What is actually fed into the network is the positional encoding $\gamma$ of the coordinates $(x, y, z)$ of these sampled points, together with the encoded direction $\mathbf{d}$.
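As a concrete illustration, here is a minimal sketch of the positional encoding $\gamma$. The function name and the number of frequencies are illustrative and not taken from the pytorch3d code (which provides a similar harmonic-embedding utility):

import math
import torch

def positional_encoding(x: torch.Tensor, n_freqs: int = 10) -> torch.Tensor:
    """Map each coordinate p to (sin(2^k * pi * p), cos(2^k * pi * p)) for k = 0..n_freqs-1.

    x: [..., 3] sampled point coordinates (or ray directions).
    Returns: [..., 3 * 2 * n_freqs] encoded features.
    """
    freqs = (2.0 ** torch.arange(n_freqs, dtype=x.dtype, device=x.device)) * math.pi
    scaled = x[..., None] * freqs                              # [..., 3, n_freqs]
    encoded = torch.cat([scaled.sin(), scaled.cos()], dim=-1)  # [..., 3, 2 * n_freqs]
    return encoded.flatten(start_dim=-2)

# e.g. 64 rays with 32 sampled points each
points = torch.rand(64, 32, 3)
print(positional_encoding(points).shape)  # torch.Size([64, 32, 60])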

Next, after processing by the neural radiance field $F_{\theta}$, we obtain the color $\mathbf{c}$ and density $\sigma$ of each sampled point (note that the colors differ between points), and rendering can then be performed according to the volumetric rendering mechanism.
[Figure: colors and densities predicted for the sampled points along a ray]

The visualization of the rendering process is shown below; I personally find it very intuitive. I will not expand on the theory of volume rendering here, since the main purpose of this blog is to walk through every part of the code involved in NeRF.
[Animation: volume rendering accumulating colors along the rays]

In general, the NeRF pipeline is divided into 3 steps, and the code below will also be presented following this process:

  • (a) Use a raysampler to generate rays (including the ray origins, directions, and sampled positions).

  • (b) For the generated sampled rays, call the volumetric function (i.e. NeuralRadianceField in nerf/nerf/implicit_function.py) to get rays_densities ($\sigma$) and rays_features ($\mathbf{c}$).

  • (c) Finally, integrate the colors along each ray to get the final image (as shown in the GIF above; see the composition sketch right after this list).
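To make the three steps concrete, here is a minimal sketch of how they compose in PyTorch3D, roughly following the library's implicit-rendering tutorial. The class names come from pytorch3d.renderer, but the argument values are illustrative, and the NeRF project itself wraps these components in its own subclasses:

import torch
from pytorch3d.renderer import (
    EmissionAbsorptionRaymarcher,
    ImplicitRenderer,
    MonteCarloRaysampler,
)

# (a) the raysampler generates rays: origins, directions and depths per ray
raysampler = MonteCarloRaysampler(
    min_x=-1.0, max_x=1.0, min_y=-1.0, max_y=1.0,
    n_rays_per_image=1024, n_pts_per_ray=64,
    min_depth=2.0, max_depth=6.0,
)
# (c) the raymarcher integrates per-point densities and colors along each ray
raymarcher = EmissionAbsorptionRaymarcher()

renderer = ImplicitRenderer(raysampler=raysampler, raymarcher=raymarcher)

# (b) happens inside the call: the renderer hands the sampled rays to the
# volumetric function (the NeuralRadianceField module), which returns
# rays_densities and rays_features for the raymarcher to integrate:
#     images, sampled_rays = renderer(cameras=cameras, volumetric_function=neural_radiance_field)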

2. NeRF code framework

Research objectives:

  • Study the inputs & outputs of each stage in training and inference.
  • Take this opportunity to carefully analyze the design of PyTorch3D's ImplicitRenderer and the volumetric function (which can be user-defined).

Research content:
We mainly focus on the data processing flow and the key steps of NeRF. Some visualization and utility scripts/functions will not be covered in this blog.

[Figure: NeRF code framework overview]

2.0 Data

Location: the get_nerf_datasets function in nerf/nerf/datasets.py

Description: returns a list of data entries, each composed of three things: image, camera parameters, and camera index:

[{"image": xxx, "camera": xxx, "camera_idx": int}, ...]

image: the 8-bit raw image normalized to [0, 1] (torch.FloatTensor), shape [H, W, 3] (here H = W = 400).
camera: an instance of pytorch3d.renderer.cameras.PerspectiveCameras; see below for the detailed parameters.
camera_idx: an int identifying the camera index (0, 1, 2, ..., 99).

Taking the Lego car data as an example:
[Figure: example Lego images from the dataset]
The training data consists of images from 100 camera viewpoints (each idx identifies a different viewpoint). Three examples are shown below. The camera model used here is a perspective camera [4] (PerspectiveCameras).
[Figure: three example camera viewpoints]
You can refer to the documentation to analyze the parameters of the perspective projection camera. Here we mainly use the rotation matrix $R$, the translation vector $T$, the focal length (focal_length), and the principal point (principal_point); see the figure below for their specific values.
[Figure: values of R, T, focal_length and principal_point for the example cameras]
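As a quick sanity check of the data structure above, a minimal sketch of loading the dataset and inspecting one entry could look as follows. I am assuming get_nerf_datasets takes a dataset name and image size and returns train/val/test splits, as in the pytorch3d NeRF project; the exact signature may differ between versions:

from nerf.datasets import get_nerf_datasets  # module path as given above

train_data, val_data, test_data = get_nerf_datasets(
    dataset_name="lego", image_size=(400, 400),
)

entry = train_data[0]
print(entry["image"].shape)    # torch.Size([400, 400, 3]), values in [0, 1]
print(entry["camera_idx"])     # e.g. 0
camera = entry["camera"]       # pytorch3d.renderer.cameras.PerspectiveCameras
print(camera.R.shape, camera.T.shape)              # [1, 3, 3] and [1, 3]
print(camera.focal_length, camera.principal_point)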

2.1 Structure

Location: RadianceFieldRenderer in nerf/nerf/nerf_renderer.py, a PyTorch module (inherits from torch.nn.Module).

Description: contains the pytorch3d.renderer.ImplicitRenderer instances and the instances characterizing the NeRF network (NeuralRadianceField).

Rendering process: coarse-to-fine (divided into 3 large steps, 7 small steps)
[Figure: the coarse-to-fine rendering pipeline, 7 steps]
where:
  • Coarse: steps 1, 2, 3
  • Fine: steps 4, 5, 6
  • Optimization: step 7
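The fine pass re-samples points along each ray according to the weights produced by the coarse pass, so that more samples land where the coarse pass found high density. Below is a minimal sketch of that importance re-sampling idea (names are illustrative; the pytorch3d NeRF project implements this logic in its fine-pass raysampler):

import torch

def resample_fine_depths(bin_edges, coarse_weights, n_fine):
    """Importance-sample new depths from the piecewise-constant PDF given by the coarse weights.

    bin_edges:      [n_rays, n_bins + 1] depth bin edges along each ray
    coarse_weights: [n_rays, n_bins] rendering weights from the coarse pass
    Returns:        [n_rays, n_fine] re-sampled depths
    """
    probs = coarse_weights + 1e-5                                  # avoid all-zero rays
    probs = probs / probs.sum(dim=-1, keepdim=True)
    bin_idx = torch.multinomial(probs, n_fine, replacement=True)   # pick bins by weight
    lower = torch.gather(bin_edges[..., :-1], -1, bin_idx)
    upper = torch.gather(bin_edges[..., 1:], -1, bin_idx)
    return lower + (upper - lower) * torch.rand_like(lower)        # uniform within each bin

# e.g. 1024 rays, 64 coarse bins, 128 fine samples
edges = torch.linspace(2.0, 6.0, 65).expand(1024, 65)
weights = torch.rand(1024, 64)
print(resample_fine_depths(edges, weights, 128).shape)  # torch.Size([1024, 128])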

Since the structure is far more complex than the data processing part, a separate section is devoted to its analysis.

3. Structure

Here is the overall schematic diagram of the structure. To facilitate understanding, I have put the relevant codes and diagrams together. Please enlarge to view.
[Figure: overall structure schematic with the relevant code excerpts]

3.1 Step 1: Use raysampler to generate rays

[Figure: Step 1, generating rays with the raysampler (code and diagram)]
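A minimal sketch of what the raysampler produces, using the plain pytorch3d classes (the NeRF project wraps them in its own raysampler, so the argument values here are purely illustrative):

import torch
from pytorch3d.renderer import (
    FoVPerspectiveCameras,
    MonteCarloRaysampler,
    ray_bundle_to_ray_points,
)

cameras = FoVPerspectiveCameras(R=torch.eye(3)[None], T=torch.tensor([[0.0, 0.0, 3.0]]))

raysampler = MonteCarloRaysampler(
    min_x=-1.0, max_x=1.0, min_y=-1.0, max_y=1.0,
    n_rays_per_image=1024, n_pts_per_ray=64, min_depth=2.0, max_depth=6.0,
)
ray_bundle = raysampler(cameras)

# the ray origins o, directions d, and the depths t at which points are sampled
print(ray_bundle.origins.shape)     # torch.Size([1, 1024, 3])
print(ray_bundle.directions.shape)  # torch.Size([1, 1024, 3])
print(ray_bundle.lengths.shape)     # torch.Size([1, 1024, 64])

# the 3D points fed to the network are o + t * d
points = ray_bundle_to_ray_points(ray_bundle)
print(points.shape)                 # torch.Size([1, 1024, 64, 3])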

3.2 Step 2: Call the volumetric function (i.e. NeuralRadianceField in nerf/nerf/implicit_function.py) on the generated sampled rays to get rays_densities and rays_features.

[Figure: Step 2, querying the NeuralRadianceField (code and diagram)]
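The key point here is the interface: the volumetric function receives a ray bundle and must return per-point densities and features with a trailing channel dimension. Below is a minimal toy module satisfying that contract; it is not the NeuralRadianceField architecture (no positional encoding, no view dependence), just the smallest example of the expected inputs and outputs:

import torch
from pytorch3d.renderer import RayBundle, ray_bundle_to_ray_points

class ToyVolumetricFunction(torch.nn.Module):
    """Smallest possible volumetric function: an MLP from raw xyz to (density, color)."""

    def __init__(self, hidden: int = 64):
        super().__init__()
        self.mlp = torch.nn.Sequential(
            torch.nn.Linear(3, hidden), torch.nn.ReLU(),
            torch.nn.Linear(hidden, 4),        # 1 density channel + 3 color channels
        )

    def forward(self, ray_bundle: RayBundle, **kwargs):
        # world coordinates of the sampled points: [batch, n_rays, n_pts, 3]
        points = ray_bundle_to_ray_points(ray_bundle)
        out = self.mlp(points)
        # map raw sigma to an opacity in [0, 1], keep colors in [0, 1]
        rays_densities = 1.0 - torch.exp(-torch.relu(out[..., :1]))  # [batch, n_rays, n_pts, 1]
        rays_features = torch.sigmoid(out[..., 1:])                  # [batch, n_rays, n_pts, 3]
        return rays_densities, rays_features

An instance of such a module is what gets passed as volumetric_function to the ImplicitRenderer call shown earlier.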

3.3 Step 3: Finally, integrate the color along the ray to get the final image

[Figure: Step 3, integrating colors along each ray to form the image (code and diagram)]
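The integration itself is the standard NeRF quadrature: turn each density into an alpha value using the spacing between samples, accumulate the transmittance along the ray, and take the weighted sum of the colors. Written out in plain PyTorch as a sketch of what the raymarcher computes (not the pytorch3d implementation itself):

import torch

def integrate_along_rays(sigmas, colors, depths):
    """NeRF volume-rendering quadrature.

    sigmas: [n_rays, n_pts]     per-point densities
    colors: [n_rays, n_pts, 3]  per-point colors
    depths: [n_rays, n_pts]     depths t of the sampled points
    Returns rendered colors [n_rays, 3] and per-point weights [n_rays, n_pts].
    """
    deltas = depths[..., 1:] - depths[..., :-1]
    deltas = torch.cat([deltas, 1e10 * torch.ones_like(deltas[..., :1])], dim=-1)
    alphas = 1.0 - torch.exp(-sigmas * deltas)           # opacity of each segment
    transmittance = torch.cumprod(
        torch.cat([torch.ones_like(alphas[..., :1]), 1.0 - alphas + 1e-10], dim=-1),
        dim=-1,
    )[..., :-1]                                          # light surviving up to each point
    weights = transmittance * alphas
    return (weights[..., None] * colors).sum(dim=-2), weights

rgb, w = integrate_along_rays(
    torch.rand(1024, 64), torch.rand(1024, 64, 3), torch.linspace(2.0, 6.0, 64).expand(1024, 64)
)
print(rgb.shape, w.shape)  # torch.Size([1024, 3]) torch.Size([1024, 64])

The weights computed here are also what the coarse pass hands to the fine-pass re-sampling sketched in section 2.1.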

4. Question

Q1: Why do the generated rgb_coarse, rgb_fine, rgb_gt all have shape [bs, 1024 (n_rays_per_image), 3] instead of [bs, H, W, 3]?

A1: Because volume rendering is computationally expensive, only a subset of the rays is rendered at a time (i.e. corresponding to part of the image), and the results are then combined to obtain the full rendered image.

Taking H = W = 400 as an example: during training, tensors of shape [bs, 1024, 3] are used to compute the loss and update the network.
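Because the loss is defined directly on those sampled rays, the training objective boils down to an MSE between the rendered and ground-truth colors of the same 1024 rays. A minimal sketch (the tensors are random placeholders; the actual project computes a similar coarse + fine MSE inside its training loop):

import torch
import torch.nn.functional as F

# rgb_coarse, rgb_fine, rgb_gt: colors of the same 1024 sampled rays, [bs, 1024, 3]
rgb_gt = torch.rand(1, 1024, 3)
rgb_coarse = torch.rand(1, 1024, 3)
rgb_fine = torch.rand(1, 1024, 3)

# both passes are supervised against the ground-truth pixel colors of the sampled rays
loss = F.mse_loss(rgb_coarse, rgb_gt) + F.mse_loss(rgb_fine, rgb_gt)
print(loss)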

At test time, the total number of ray chunks to render is first computed from the chunk_size that the GPU can handle:

Step 1: nerf/nerf/nerf_renderer.py, around line 340

if not self.training:
    # Full evaluation pass.
    # self._renderer['coarse'].raysampler.get_n_chunks computes how many chunks are
    # needed from the xy_grid. For example, with 400 * 400 = 160000 rays and
    # chunk_size_test = 6000: n_chunks = math.ceil(160000 / 6000) = 27.
    n_chunks = self._renderer["coarse"].raysampler.get_n_chunks(
        self._chunk_size_test,
        camera.R.shape[0],
    )
    # At test time, n_chunks == 27 for this example.
else:
    # MonteCarlo ray sampling.
    n_chunks = 1

Step 2: render chunk by chunk (6000 rays per chunk, 27 chunks in total) and finally concatenate: 26 × 6000 + 1 × 4000 = 160000 rays, which are then reshaped into a 400 × 400 image.

        # Process the chunks of rays.
        # chunk_outputs[0] is the training output, since n_chunks = 1 during training.
        # At test time (Lego example), n_chunks = 27 and chunk_outputs is a list
        # whose items are dicts with keys [rgb_coarse, rgb_fine, rgb_gt],
        # each of shape [bs(1), 6000, 3].
        chunk_outputs = [
            self._process_ray_chunk(
                camera_hash,
                camera,
                image,
                chunk_idx,
            )
            for chunk_idx in range(n_chunks)
        ]
        import pdb; pdb.set_trace()
#         (Pdb) len(chunk_outputs)
#         27
#         (Pdb) for item in chunk_outputs: print(item['rgb_fine'].shape)
#         As can be seen, 26 * 6000 + 1 * 4000 = 160000 = 400 * 400, which is exactly H * W!
#         torch.Size([1, 6000, 3])
#         torch.Size([1, 6000, 3])
#         torch.Size([1, 6000, 3])
#         ...
#         torch.Size([1, 6000, 3])
#         torch.Size([1, 4000, 3])

        if not self.training:
            # For a full render pass concatenate the output chunks,
            # and reshape to image size.
            # Simply concatenate along the ray dimension.
            out = {
                k: torch.cat(
                    [ch_o[k] for ch_o in chunk_outputs],
                    dim=1,
                ).view(-1, *self._image_size, 3)
                if chunk_outputs[0][k] is not None
                else None
                for k in ("rgb_fine", "rgb_coarse", "rgb_gt")
            }
        else:
            out = chunk_outputs[0]

[1] NeRF
[2] Introduction to GRAF
[3] pytorch3d/nerf
[4] pytorch3d/PerspectiveCamera

Origin: blog.csdn.net/g11d111/article/details/118959540