0. Preface
Over the past two years (2020-2021), implicit rendering techniques (represented by NeRF and GIRAFFE) have become very popular. Because implicit rendering requires a bit of rendering background and is less approachable than ordinary CV tasks, this post takes NeRF (Neural Radiance Fields) from ECCV 2020 [1] as an example and analyzes it in detail at the code level (based on the PyTorch3D implementation [3]). I hope it helps those who need it.
- Updated on 2022.12.23
This post only introduces the most basic NeRF. A lot of new work has been released in recent years; if you need a more systematic and complete introductory course, the recommended specialized course is Shenlan Academy's (深蓝学院) Neural Radiance Field (NeRF) series.
1. What is NeRF
According to the official project [1], NeRF essentially constructs an implicit rendering pipeline: its input is a ray emitted from a given viewpoint, described by its origin $\mathbf{o}$, direction $\mathbf{d}$, and the corresponding 3D coordinates $\mathbf{x} = (x, y, z)$. Passing these through the neural radiance field $F_{\theta}$ yields a volume density and color, and the final image is then rendered via volumetric rendering.
The mapping from position $\mathbf{x}$ and direction $\mathbf{d}$ to volume density $\sigma$ and color $\mathbf{c}$ is formally written as (from the NeRF paper):
$$F_{\theta} : (\mathbf{x}, \mathbf{d}) \rightarrow (\mathbf{c}, \sigma)$$
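To make the mapping concrete, here is a minimal sketch of an $F_{\theta}$-style MLP in PyTorch, mapping sampled positions and view directions to colors and densities. This is an illustration only, not the NeuralRadianceField used in the PyTorch3D implementation; the class name, layer sizes, and activations are all made up.

```python
import torch
import torch.nn as nn

class TinyRadianceField(nn.Module):
    # Toy F_theta: (x, d) -> (c, sigma). Layer sizes are illustrative.
    def __init__(self, hidden=64):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.sigma_head = nn.Linear(hidden, 1)      # volume density head
        self.color_head = nn.Sequential(            # view-dependent color head
            nn.Linear(hidden + 3, hidden), nn.ReLU(),
            nn.Linear(hidden, 3), nn.Sigmoid(),
        )

    def forward(self, x, d):
        h = self.trunk(x)
        sigma = torch.relu(self.sigma_head(h))          # density >= 0
        c = self.color_head(torch.cat([h, d], dim=-1))  # rgb in [0, 1]
        return c, sigma

x = torch.randn(1024, 3)   # sampled 3D points (positional encoding omitted here)
d = torch.randn(1024, 3)   # viewing directions
c, sigma = TinyRadianceField()(x, d)
print(c.shape, sigma.shape)  # torch.Size([1024, 3]) torch.Size([1024, 1])
```

Note the split: density depends only on position, while color also sees the viewing direction, which is what lets NeRF model view-dependent effects.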
This involves many rendering-related concepts, such as: What is a ray? How is it emitted? What is its direction? Given a picture from an arbitrary viewpoint, how should we render it according to this process?
The picture below comes from Professor Geiger of the University of Tübingen [2] (he also advised the best paper of CVPR 2021!). Taking it as an example (the grid_sampler involved here is discussed later), we emit a ray from each pixel position of the image: $\mathbf{r}(t) = \mathbf{o} + t\mathbf{d}$.
As you can see from the picture above, the blue balls (the black balls in the figure from [1]) are points sampled along the ray direction at different values of $t$ (which can be understood as depth). What is actually fed into the network is the positional encoding $\gamma$ of these blue points' coordinates $(x, y, z)$ together with the direction $\mathbf{d}$.
Next, after processing by the neural radiance field $F_{\theta}$, we obtain the color $\mathbf{c}$ and density $\sigma$ of each ball (note that the colors differ); rendering can then be performed according to the volumetric rendering mechanism.
The rendering effect is visualized below, which I personally find very intuitive. I will not expand on the theory of volume rendering here, since the main purpose of this post is to walk through every part of the code involved in NeRF.
In general, the NeRF process is divided into 3 steps, and the following code walkthrough is organized accordingly:
- (a) Use the raysampler to generate rays (including each ray's origin, direction, and sampled positions).
- (b) For the generated sample points, call the volumetric function (i.e. NeuralRadianceField in nerf/nerf/implicit_function.py) to get rays_densities $\sigma$ and rays_features $\mathbf{c}$.
- (c) Finally, integrate the color along each ray to get the final image (as shown in the gif above).
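Step (c), the integration along each ray, can be sketched with the discrete quadrature from the NeRF paper: alpha compositing of the sampled densities and colors. This is a simplified stand-in for what PyTorch3D's renderer does internally; shapes and values are illustrative.

```python
import torch

def composite(rgb, sigma, t):
    # Discrete volume rendering: alpha_i = 1 - exp(-sigma_i * delta_i),
    # T_i = prod_{j<i} (1 - alpha_j), pixel color C = sum_i T_i * alpha_i * c_i.
    delta = t[..., 1:] - t[..., :-1]                  # spacing between samples
    delta = torch.cat([delta, torch.full_like(delta[..., :1], 1e10)], dim=-1)
    alpha = 1.0 - torch.exp(-sigma * delta)           # opacity of each sample
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], dim=-1)
    weights = trans * alpha                           # contribution of each sample
    return (weights[..., None] * rgb).sum(dim=-2)     # [n_rays, 3]

rgb = torch.rand(1024, 64, 3)                  # per-point colors c
sigma = torch.rand(1024, 64)                   # per-point densities
t = torch.linspace(2.0, 6.0, 64).expand(1024, 64)
out = composite(rgb, sigma, t)
print(out.shape)  # torch.Size([1024, 3])
```

Each weight $T_i \alpha_i$ says how much sample $i$ contributes to the pixel; occluded or empty samples get near-zero weight.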
2. NeRF code framework
Research objectives :
- Study the input & output of each link in training and inference.
- Take this opportunity to carefully analyze the design of PyTorch3D's ImplicitRenderer and volumetric function (which can be user-defined).
Research content :
We mainly focus on the data processing process and the key steps of NeRF . Some visualization and tool-based scripts and functions will not be introduced in this blog.
2.0 Data
Location: the get_nerf_datasets function in nerf/nerf/datasets.py
Description: returns a list of records, each composed of three things: image, camera parameters, and camera index:
[{
    "image": xxx, "camera": xxx, "camera_idx": int}]
image: the 8-bit raw image normalized to [0, 1], as a torch.FloatTensor of shape [H, W, 3] (here H = W = 400).
camera: an instance of pytorch3d.renderer.cameras.PerspectiveCameras; its parameters are introduced below.
camera_idx: an int identifying the camera (0, 1, 2, ..., 99).
Taking the Lego car data as an example, the training data consists of pictures from 100 camera viewpoints (camera_idx identifies each viewpoint); 3 examples are shown below. The camera model here is a perspective camera [4] (PerspectiveCamera).
You can refer to the documentation for the parameters of the perspective projection camera. Here we mainly use the rotation matrix $R$, the translation $T$, the focal length focal_length, and the principal point principal_point (see the figure above for their specific values).
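As an illustration of how these four parameters are used, here is a toy perspective projection in plain PyTorch. I am following PyTorch3D's row-vector convention ($X_{cam} = X_{world} R + T$) as an assumption; the actual PerspectiveCameras class additionally handles batching, NDC/screen conventions, and more.

```python
import torch

# Toy projection with the four camera parameters discussed above.
R = torch.eye(3)                         # rotation matrix
T = torch.tensor([0.0, 0.0, 4.0])        # translation
focal_length = torch.tensor([2.0, 2.0])  # (fx, fy)
principal_point = torch.tensor([0.0, 0.0])

X_world = torch.tensor([[0.5, -0.5, 0.0]])   # one 3D point
X_cam = X_world @ R + T                      # world -> camera coordinates
xy = focal_length * X_cam[:, :2] / X_cam[:, 2:3] + principal_point
print(xy)  # tensor([[ 0.2500, -0.2500]])
```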
2.1 Structure
Location: the RadianceFieldRenderer module (a subclass of torch.nn.Module) in nerf/nerf/nerf_renderer.py
Description: contains a pytorch3d.renderer.ImplicitRenderer instance and an instance characterizing the NeRF network.
Rendering process: coarse-to-fine (divided into 3 large steps, 7 small steps), where
Coarse: steps 1, 2, 3
Fine: steps 4, 5, 6
Optimization: step 7
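The defining trick of the fine stage is importance sampling: new depths are drawn where the coarse pass assigned high weights. A rough inverse-CDF sketch of that idea (not the library's implementation; shapes and the helper name are illustrative):

```python
import torch

def sample_pdf(bins, weights, n_fine=128):
    # Treat the coarse weights as an (unnormalized) PDF over depth bins and
    # draw fine samples by inverting the CDF.
    pdf = weights / weights.sum(dim=-1, keepdim=True)
    cdf = torch.cumsum(pdf, dim=-1)                          # [n_rays, n_bins]
    u = torch.rand(*cdf.shape[:-1], n_fine)                  # uniform samples
    idx = torch.searchsorted(cdf, u).clamp(max=cdf.shape[-1] - 1)
    return torch.gather(bins, -1, idx)                       # depths of chosen bins

bins = torch.linspace(2.0, 6.0, 64).expand(8, 64).contiguous()  # coarse depths
weights = torch.rand(8, 64)                                     # coarse weights
t_fine = sample_pdf(bins, weights)
print(t_fine.shape)  # torch.Size([8, 128])
```

High-weight depth bins are picked more often, so the fine network spends its samples near surfaces rather than in empty space.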
Since the structure is far more complex than the data processing part, a new section is opened for analysis.
3. Structure
Here is the overall schematic diagram of the structure. To facilitate understanding, I have put the relevant codes and diagrams together. Please enlarge to view.
3.1 Step 1: Use raysampler to generate rays
3.2 Step 2: Call the volumetric function (i.e. NeuralRadianceField in nerf/nerf/implicit_function.py) on the generated sampled rays to get rays_densities and rays_features.
3.3 Step 3: Finally, integrate the color along the ray to get the final image
4. Question
Q1: Why are the generated rgb_coarse, rgb_fine, and rgb_gt all [bs, 1024 (n_rays_per_image), 3] instead of [bs, H, W, 3]?
A1: Because volume rendering is very computationally expensive, only part of the rays are rendered at a time (corresponding to a local patch of the image), and the patches are then combined to obtain the final rendered image.
Taking H = W = 400 as an example: during training, tensors of shape [bs, 1024, 3] are used to compute the loss and update the network.
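For intuition, training-time Monte Carlo ray sampling amounts to picking 1024 random pixels per image and regressing their colors. This is a schematic only, with a random tensor standing in for the rendered output:

```python
import torch

H = W = 400
n_rays = 1024
image = torch.rand(H, W, 3)              # ground-truth image
ys = torch.randint(0, H, (n_rays,))      # random pixel rows
xs = torch.randint(0, W, (n_rays,))      # random pixel columns
rgb_gt = image[ys, xs]                   # [1024, 3] sampled ground-truth colors
rgb_pred = torch.rand(n_rays, 3)         # stand-in for the rendered rgb_fine
loss = torch.nn.functional.mse_loss(rgb_pred, rgb_gt)
print(rgb_gt.shape, loss.shape)          # torch.Size([1024, 3]) torch.Size([])
```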
At test time, the rays are first split into chunks according to the chunk_size allowed by the GPU, which determines the total number of chunks to compute:
step1 : nerf/nerf/nerf_renderer.py
around line 340
if not self.training:
    # Full evaluation pass.
    # self._renderer['coarse'].raysampler.get_n_chunks computes how many chunks
    # are needed from the xy_grid: e.g. for 400*400 rays with
    # chunk_size_test=6000, n_chunks = math.ceil(160000 / 6000) = 27.
    n_chunks = self._renderer["coarse"].raysampler.get_n_chunks(
        self._chunk_size_test,
        camera.R.shape[0],
    )
    # At test time, n_chunks == 27.
else:
    # MonteCarlo ray sampling.
    n_chunks = 1
step2: Render chunk by chunk (6000 rays per chunk, 27 chunks in total) and finally concatenate: 26 × 6000 + 1 × 4000 = 160000 rays, which are then reshaped into a 400×400 image:
# Process the chunks of rays.
# chunk_outputs[0] is the training output, since n_chunks = 1 during training.
# At test time (taking Lego as an example), n_chunks = 27 and chunk_outputs is
# a list whose items are dicts with keys [rgb_coarse, rgb_fine, rgb_gt],
# each of shape [bs(1), 6000, 3].
chunk_outputs = [
    self._process_ray_chunk(
        camera_hash,
        camera,
        image,
        chunk_idx,
    )
    for chunk_idx in range(n_chunks)
]
# (Pdb) len(chunk_outputs)
# 27
# (Pdb) for item in chunk_outputs: print(item['rgb_fine'].shape)
# torch.Size([1, 6000, 3])
# torch.Size([1, 6000, 3])
# torch.Size([1, 6000, 3])
# ...
# torch.Size([1, 6000, 3])
# torch.Size([1, 4000, 3])
# Note that 26*6000 + 1*4000 = 160000 = 400*400, i.e. exactly H*W!
if not self.training:
    # For a full render pass concatenate the output chunks,
    # and reshape to image size.
    # Simply concatenate.
    out = {
        k: torch.cat(
            [ch_o[k] for ch_o in chunk_outputs],
            dim=1,
        ).view(-1, *self._image_size, 3)
        if chunk_outputs[0][k] is not None
        else None
        for k in ("rgb_fine", "rgb_coarse", "rgb_gt")
    }
else:
    out = chunk_outputs[0]
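The chunk bookkeeping above can be verified in isolation with dummy tensors (shapes only, no actual rendering):

```python
import math
import torch

H = W = 400
chunk_size = 6000
n_rays = H * W                                     # 160000 rays in total
n_chunks = math.ceil(n_rays / chunk_size)          # 27 chunks
# Dummy per-chunk outputs: 26 chunks of 6000 rays plus one final chunk of 4000.
chunks = [torch.rand(1, min(chunk_size, n_rays - i * chunk_size), 3)
          for i in range(n_chunks)]
image = torch.cat(chunks, dim=1).view(-1, H, W, 3)
print(n_chunks, image.shape)  # 27 torch.Size([1, 400, 400, 3])
```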
[1]: NeRF
[2]: Introduction to GRAF
[3]: pytorch3d/nerf
[4]: pytorch3d/PerspectiveCamera