NeRF code learning

Notes on learning the nerf_pytorch project code and the pytorch-lightning (nerf_pl) version of the code.

The overall flow: read the data, feed it into the neural network for training (including generating encodings, generating rays, computing density and color, and volume rendering), and output the results.

1. Data set reading

The example in the code reads the Blender Lego dataset; all three supported data formats produce the same kind of output after loading, so load_llff.py is introduced here.
For coordinate conversions and related issues, refer to this link.

COLMAP to LLFF data format
imgs2poses.py does two things:
1. Call the COLMAP software to estimate the camera parameters, generating several binary files under the sparse/0/ folder: cameras.bin, images.bin, points3D.bin, project.ini.
2. Read the binary files from the previous step and save the result as a poses_bounds.npy file.

The poses_bounds.npy file stores the camera pose information as an N×17 matrix. The first 15 parameters of each row can be rearranged into a 3×5 matrix: the left 3×3 block is the rotation matrix R of the c2w transform and the fourth column is its translation vector t, so the first four columns are the camera extrinsics; the fifth column holds the image height H, width W and the focal length f (the camera intrinsics).
In the corresponding two lines of code, the camera extrinsics and intrinsics are extracted: poses_arr[:, :-2] takes the first 15 columns as an (N, 15) array, reshape([-1, 3, 5]) turns it into an (N, 3, 5) array (N images, each a 3×5 matrix), and transpose([1, 2, 0]) permutes the axes to give a (3, 5, N) array containing the camera poses of the N images.
The last two of the 17 parameters are transposed into a (2, N) array: bds. bds is the depth bounds, i.e. the near and far values (the sampling interval) passed to the network.
The code also downsamples the images, updating W, H and the focal length accordingly, and then reads the images; images has shape (N, H, W, channels), i.e. (number of images, height, width, channels).
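A minimal sketch of this unpacking (shapes as described above; the file name and its location in the scene directory are assumed):

import numpy as np

poses_arr = np.load('poses_bounds.npy')                             # (N, 17)
poses = poses_arr[:, :-2].reshape([-1, 3, 5]).transpose([1, 2, 0])  # (3, 5, N): [R | t | (H, W, f)]
bds = poses_arr[:, -2:].transpose([1, 0])                           # (2, N): near/far depth bounds
print(poses.shape, bds.shape)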
Some other operations are also provided: converting the camera coordinate system, building the camera matrix, rescaling the images, computing the average pose, re-centering the camera poses, and generating camera trajectories for novel-view synthesis. These are covered by the link above, so they are not described in detail here.
What the dataset loader returns is the downsampled/rescaled depth range and focal length, the re-centered camera poses, render_poses (the poses used for rendering novel views), and i_test, an index.
Shapes: images (number of images, H, W, 3), poses (number of images, 3, 5), bds (number of images, 2), render_poses (N_views, 3, 5); i_test is an index: [[0:train], [train:val], [val:test]].

2. Build NeRF network

run_nerf.py

Start from the train() function:
1. Obtain the hyperparameters with config_parser()
2. Read the data (llff or blender): images, poses, depth range, render poses, test split
3. Build the network with create_nerf(), using the hyperparameters returned in step 1
4. Generate batches of rays with get_rays_np()
5. Render to get the pixel colors with render()
6. Compute the loss and backpropagate (a small sketch of this step follows the list)
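A minimal, self-contained sketch of step 6 (the placeholder tensors stand in for the rendered and ground-truth colors; run_nerf.py computes the same MSE and also reports it as PSNR):

import torch

rgb = torch.rand(1024, 3, requires_grad=True)   # rendered colors for a batch of 1024 rays (placeholder)
target_s = torch.rand(1024, 3)                  # ground-truth pixel colors (placeholder)
img_loss = torch.mean((rgb - target_s) ** 2)    # MSE between rendered and true colors
psnr = -10.0 * torch.log10(img_loss)            # PSNR in dB
img_loss.backward()                             # backpropagate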

The flow chart of this process (adapted from the link referenced above) breaks down into the following steps.

Step 1: Call get_rays() to compute the ray directions; the unit-normalized ray_d is used as viewdirs
Step 2: Generate the near and far ends of each ray to bound the sampling interval, and pack everything into rays (ray_o, ray_d, near, far, viewdirs)
Step 3: Compute the ray properties in parallel by calling batchify_rays()
Step 4: batchify_rays() then calls render_rays() for the subsequent rendering
Step 5: In render_rays(), pts holds the positions of the sample points along each ray
Step 6: Feed the points into the network to get RGB and σ
Step 7: render_rays() calls raw2outputs() to integrate over the discrete samples (volume rendering); a sketch of this integration follows the list
Step 8: The results {'rgb_map': rgb_map, 'disp_map': disp_map, 'acc_map': acc_map} are returned to train()
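A minimal sketch of the integration in Step 7, mirroring what raw2outputs() does (simplified: no noise on the density, no white-background handling):

import torch

def composite(raw_rgb, raw_sigma, z_vals, rays_d):
    # raw_rgb: (N_rays, N_samples, 3), raw_sigma: (N_rays, N_samples), z_vals: (N_rays, N_samples)
    dists = z_vals[..., 1:] - z_vals[..., :-1]
    dists = torch.cat([dists, 1e10 * torch.ones_like(dists[..., :1])], dim=-1)
    dists = dists * torch.norm(rays_d[..., None, :], dim=-1)          # convert to real-world distances
    alpha = 1.0 - torch.exp(-torch.relu(raw_sigma) * dists)           # opacity of each sample
    trans = torch.cumprod(torch.cat([torch.ones_like(alpha[..., :1]),
                                     1.0 - alpha + 1e-10], dim=-1), dim=-1)[..., :-1]
    weights = alpha * trans                                           # contribution of each sample
    rgb_map = torch.sum(weights[..., None] * torch.sigmoid(raw_rgb), dim=-2)
    acc_map = torch.sum(weights, dim=-1)                              # accumulated opacity along the ray
    return rgb_map, acc_map, weights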

'network_query_fn' : network_query_fn,  # as explained above: an anonymous function that, given position coordinates, view directions and a network, uses the network to return the color and density at those points
'perturb' : args.perturb,  # perturbation of the sample positions; not essential to understanding the overall algorithm
'N_importance' : args.N_importance,  # number of fine sample points per ray
'network_fine' : model_fine,  # the fine network from the paper
'N_samples' : args.N_samples,  # number of coarse sample points per ray
'network_fn' : model,  # the coarse network from the paper
'use_viewdirs' : args.use_viewdirs,  # whether to use the viewing direction; affects how the network outputs color
'white_bkgd' : args.white_bkgd,  # if True, convert the transparent parts of the input png images to white
'raw_noise_std' : args.raw_noise_std,  # std of the noise added to the raw density
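To illustrate the calling convention of network_query_fn described above, here is a self-contained toy stand-in (the helper name and the plain Linear "network" are placeholders; the real function also applies positional encoding and chunking):

import torch

def toy_network_query_fn(pts, viewdirs, network_fn):
    # pts: (N_rays, N_samples, 3), viewdirs: (N_rays, 3) -> raw output (N_rays, N_samples, 4)
    dirs = viewdirs[:, None, :].expand_as(pts)                       # repeat the direction per sample
    inputs = torch.cat([pts.reshape(-1, 3), dirs.reshape(-1, 3)], dim=-1)
    return network_fn(inputs).reshape(*pts.shape[:-1], 4)            # (..., RGB + sigma)

net = torch.nn.Linear(6, 4)                                          # placeholder network, no encoding
raw = toy_network_query_fn(torch.rand(2, 8, 3), torch.rand(2, 3), net)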

Code explanation reference link

run_nerf_helpers.py

class Embedder: takes the frequency parameters and builds the positional encoder embed
class NeRF: builds the model; the alpha output is the density and rgb is the color; a batch is 1024 rays, and each ray samples 64 points
get_rays_np(): generates the rays, one per pixel (see the sketch after this list)
ndc_rays(): moves the ray origins to the near plane (NDC space)
sample_pdf(): hierarchical sampling to obtain the sample points for the fine network
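A minimal NumPy sketch of what get_rays_np() computes, assuming the same pinhole convention as the helper (camera looks along -z, image y mapped to -y); the fixed focal length stands in for the full intrinsics:

import numpy as np

def get_rays_np_sketch(H, W, focal, c2w):
    # One ray per pixel: build directions in the camera frame, rotate them into the
    # world frame with the c2w rotation; all origins are the camera center.
    i, j = np.meshgrid(np.arange(W, dtype=np.float32),
                       np.arange(H, dtype=np.float32), indexing='xy')
    dirs = np.stack([(i - W * 0.5) / focal, -(j - H * 0.5) / focal, -np.ones_like(i)], axis=-1)
    rays_d = np.sum(dirs[..., None, :] * c2w[:3, :3], axis=-1)   # rotate directions into world space
    rays_o = np.broadcast_to(c2w[:3, -1], rays_d.shape)          # every ray starts at the camera origin
    return rays_o, rays_d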

Generating a color mesh in pytorch-lightning (nerf_pl)

https://github.com/kwea123/nerf_pl
An unofficial implementation of NeRF in PyTorch (pytorch-lightning) that provides a simpler and faster training procedure (the code is also simpler, with detailed comments that help in understanding the work).
Video link
extract_color_mesh.py: converts the implicit representation (the color and density of points in 3D space) into a visualizable mesh or point cloud.

First divide the space containing the object into a regular grid of small cubes (which can be regarded as voxels).
Feeding the coordinates of each small cube into the NeRF network returns a density that indicates whether that cube is occupied by the object. The steps are as follows (a marching-cubes sketch follows the list):
1. Predict the occupancy value to determine which positions are occupied by the object.
2. Use the Marching Cubes algorithm to obtain the vertices of the triangular mesh (the mesh is composed of triangles and their vertices). Introduction to the Marching Cubes algorithm: https://blog.csdn.net/weixin_38060850/article/details/109143025
3. Remove noise: discard the scattered triangles in the mesh and keep only the largest connected cluster.
4. Add color.
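A sketch of steps 1-2, assuming sigma is an (N, N, N) grid of densities already queried from the trained NeRF on a regular grid of 3D points (the random grid and the threshold value here are placeholders; nerf_pl uses a marching-cubes library of this kind with a tunable sigma threshold):

import numpy as np
import mcubes   # PyMCubes

N = 128
sigma = np.random.rand(N, N, N)        # placeholder grid; replace with real NeRF densities
sigma_threshold = 0.5                  # occupancy threshold, a tunable hyperparameter
vertices, triangles = mcubes.marching_cubes(sigma, sigma_threshold)
print(vertices.shape, triangles.shape) # mesh vertices and triangle indices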

For the color, the color of each vertex is computed rather than the color of each triangle: the vertex is projected onto the training images to obtain its RGB values, and the average of these values is used as its final color.
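A hedged sketch of projecting one vertex into one training image to read its color, assuming an OpenCV-style pinhole convention with known intrinsics K and world-to-camera matrix w2c (the actual extract_color_mesh.py does this for all vertices at once and handles the NeRF camera convention):

import numpy as np

def project_vertex_color(vertex, image, K, w2c):
    # vertex: (3,) world point; K: 3x3 intrinsics; w2c: 3x4 world-to-camera matrix
    p_cam = w2c[:, :3] @ vertex + w2c[:, 3]      # world -> camera coordinates
    uv = K @ p_cam
    u, v = uv[0] / uv[2], uv[1] / uv[2]          # perspective divide -> pixel coordinates
    return image[int(round(v)), int(round(u))]   # RGB at that pixel

# The final vertex color is the average of these samples over the training images
# in which the vertex is visible.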

However, occluded parts will also be assigned the color of whatever is in front of them, as in the doll example described below.
The reason: when the doll is photographed, part of the cloak is occluded, but when points on the cloak are projected back onto that image, the RGB obtained is the RGB of the face (the cloak is occluded in the image, and only the face is captured), so the cloak gets the face's color, which is wrong. The problem then becomes how to judge that a vertex is occluded, and therefore invisible, in a given image; if it is not visible from that viewpoint, the point is not given a color from that image.
The solution uses NeRF's volume density again: form a ray starting at the camera origin and ending at the vertex, and compute the accumulated density (the integral of σ) along this ray. If the vertex is not occluded, the integrated opacity will be small; otherwise it will be large, meaning something lies between the vertex and the camera (the vertex is occluded), and that image is not used to color the vertex.
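A sketch of this visibility test, where query_sigma is an assumed callable returning the NeRF density at given 3D points, and the sample count and opacity threshold are illustrative values:

import numpy as np

def vertex_visible(query_sigma, cam_origin, vertex, n_samples=64, opacity_threshold=0.2):
    # Sample points along the segment from the camera origin to the vertex and
    # accumulate opacity; a small value means nothing blocks the view (not occluded).
    t = np.linspace(0.0, 1.0, n_samples)[:, None]
    pts = cam_origin[None, :] * (1.0 - t) + vertex[None, :] * t   # points on the segment
    sigmas = query_sigma(pts)                                     # (n_samples,) densities
    dists = np.full(n_samples, np.linalg.norm(vertex - cam_origin) / n_samples)
    alpha = 1.0 - np.exp(-np.maximum(sigmas, 0.0) * dists)
    acc_opacity = 1.0 - np.prod(1.0 - alpha)                      # total opacity along the segment
    return acc_opacity < opacity_threshold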

Origin blog.csdn.net/qq_44708206/article/details/130051758