[3D Reconstruction] [Deep Learning] NeRF Code PyTorch Implementation – Data Loading (Part 2)

The paper proposes a 5D neural radiance field as an implicit representation of complex scenes, called NeRF, which takes sparse multi-view posed images as input and is trained to obtain a neural radiance field model. Simply put, it builds an implicit 3D model of a scene from 2D images of that scene taken from different viewpoints, together with their camera poses, and synthesizes images of the scene from arbitrary new viewpoints through the volume rendering equation. This blog post analyzes the specific function-module code in the data loading process, following the order of code execution.
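Concretely, the radiance field in the paper is a function F_Θ : (x, y, z, θ, φ) → (c, σ) that maps a 3D position and a 2D viewing direction to an emitted color c and a volume density σ; an image from a new viewpoint is then formed by accumulating c, weighted by σ, along each camera ray.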



Preface

Before analyzing the NeRF network in detail, the first task is to build the operating environment required by NeRF [Reference Tutorial under Win10] and complete the training and testing of the model; only then does the follow-up work make sense.
This blog post analyzes some of the function code modules involved in the NeRF data loading process. Other code modules will be explained in subsequent blog posts.

The blogger has analyzed the code of each functional module in detail in separate blog posts. Click [Reference Tutorial under Win10]; a directory of those posts is given in its preface.


load_llff_data

load_llff_data is in the load_llff.py file. Because the function is long, the blogger explains its code in segments; this post continues with the code that follows in the load_llff_data function.

# Constrain the camera distribution to lie within a fixed sphere and return a surrounding camera trajectory (poses) for novel view synthesis.
if spherify:
    poses, render_poses, bds = spherify_poses(poses, bds)
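For context, the tensor shapes around this call, as the blogger reads them from this repository's load_llff.py (treat the annotations as assumptions rather than authoritative documentation):

# poses: [N, 3, 5]  per-image camera-to-world [R | t], plus a fifth column storing [H, W, focal]
# bds:   [N, 2]     per-image near/far depth bounds
# after the call:
#   poses        [N, 3, 5]    the input poses, recentred and rescaled
#   render_poses [120, 3, 5]  poses on a surrounding circle, used for novel-view rendering
#   bds          [N, 2]       depth bounds rescaled by the same factor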

spherify_poses

The spherify_poses function in the load_llff.py file is fairly concise, but its content is dense and (in the blogger's personal opinion) difficult to understand. It is hard to pin down the meaning and purpose of every line of code, and even of every variable, so the blogger divides the function into several segments and explains them separately so that you can work through them quickly.

  • min_line_dist finds the point with the minimum sum of (squared) distances to the rays through all the camera centers. The blogger has searched the publicly available material online and has not found a concrete justification of the principle; friends who know it are welcome to leave a message in the comment area. One possible reading, with a numerical check, is sketched after the code below.
# Turn each [3,4] pose into a [4,4] homogeneous matrix by appending the row [0,0,0,1]
p34_to_44 = lambda p : np.concatenate([p, np.tile(np.reshape(np.eye(4)[-1, :], [1, 1, 4]), [p.shape[0], 1, 1])], 1)
# Third column of each pose's rotation matrix R (the z axis): the viewing direction vector
rays_d = poses[:, :3, 2:3]  # [N,3,1]
# Translation column t of each pose: the camera optical center
rays_o = poses[:, :3, 3:4]  # [N,3,1]


# Find the point with the minimum sum of squared distances to the rays through all camera centers
def min_line_dist(rays_o, rays_d):
    A_i = np.eye(3) - rays_d * np.transpose(rays_d, [0,2,1])    # [N,3,3]
    b_i = -A_i @ rays_o         # [N,3,1]
    pt_mindist = np.squeeze(-np.linalg.inv((np.transpose(A_i, [0,2,1]) @ A_i).mean(0)) @ (b_i).mean(0))  # [3]
    return pt_mindist

# Can be understood simply as the center position of the scene
pt_mindist = min_line_dist(rays_o, rays_d)      # [3]
center = pt_mindist
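One way to read min_line_dist (the blogger's interpretation, not an official derivation): for a unit direction d, the matrix A = I − d·dᵀ projects onto the plane perpendicular to d, so the squared distance from a point p to the ray (o, d) is ‖A(p − o)‖². Setting the gradient of the sum over all rays to zero gives Σ AᵢᵀAᵢ·p = Σ AᵢᵀAᵢ·oᵢ, and since each Aᵢ is a symmetric projection (AᵢᵀAᵢ = Aᵢ) the right-hand side equals −Σ bᵢ, which is exactly the linear system the code solves (with means in place of sums). A minimal numerical check, using the min_line_dist defined above (a hypothetical example, not from the original repository):

import numpy as np

np.random.seed(0)
N = 8
target = np.array([0.5, -1.0, 2.0])                       # known common point of all rays
rays_d = np.random.randn(N, 3, 1)
rays_d /= np.linalg.norm(rays_d, axis=1, keepdims=True)   # unit direction vectors
rays_o = target.reshape(1, 3, 1) - 3.0 * rays_d           # origins 3 units back along each ray

print(min_line_dist(rays_o, rays_d))                      # ≈ [ 0.5 -1.0  2.0]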

[Schematic diagram of the code above; figure not reproduced here.]

# Mean of the vectors from the scene center to all camera optical centers (per xyz axis)
up = (poses[:, :3, 3] - center).mean(0)     # [3]
# Normalize to a unit vector
vec0 = normalize(up)    # [3]
# Find unit vectors that are pairwise perpendicular to vec0
# ([.1,.2,.3] is just an arbitrary vector that is very unlikely to be parallel to vec0)
vec1 = normalize(np.cross([.1,.2,.3], vec0))    # [3]
vec2 = normalize(np.cross(vec0, vec1))             # [3]
pos = center
# Build the coordinate frame (columns vec1, vec2, vec0, pos)
c2w = np.stack([vec1, vec2, vec0, pos], 1)  # [3,4]
# Invert c2w and multiply it with poses, which normalizes all camera poses into this frame
poses_reset = np.linalg.inv(p34_to_44(c2w[None])) @ p34_to_44(poses[:, :3, :4])      # [N,4,4]
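What the normalization achieves: the new world frame has its origin at center and its z axis along the mean up vector, so after the reset the average camera center sits on the +z axis. A hypothetical check (not in the original code):

up_reset = poses_reset[:, :3, 3].mean(0)
print(up_reset / np.linalg.norm(up_reset))   # ≈ [0, 0, 1]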

[Schematic diagram of the code above; figure not reproduced here.]

  • Scale all camera positions so that they lie (on average) on the unit sphere.
# Understood as the (root-mean-square) average distance of all optical centers from the origin
rad = np.sqrt(np.mean(np.sum(np.square(poses_reset[:, :3, 3]), -1)))
# Scaling factor
sc = 1./rad
# Scale the optical centers
poses_reset[:, :3, 3] *= sc
# Scale the depth bounds
bds *= sc
# Normalize: rad becomes 1
rad *= sc
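After this scaling, the root-mean-square distance of the camera centers from the origin is exactly 1, which is what "on the unit sphere on average" means here. A hypothetical one-line check (not in the original code):

print(np.sqrt(np.mean(np.sum(np.square(poses_reset[:, :3, 3]), -1))))   # ≈ 1.0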

[Schematic diagram of the code above; figure not reproduced here.]

  • Generate camera poses for new viewpoints (a sanity check is sketched after the code below).
# Mean optical center position
centroid = np.mean(poses_reset[:, :3, 3], 0)        # [3]
zh = centroid[2]            # z coordinate of the mean optical center
radcircle = np.sqrt(rad**2-zh**2)
new_poses = []
for th in np.linspace(0., 2.*np.pi, 120):
    camorigin = np.array([radcircle * np.cos(th), radcircle * np.sin(th), zh])
    up = np.array([0, 0, -1.])
    vec2 = normalize(camorigin)
    # Build the camera coordinate frame
    vec0 = normalize(np.cross(vec2, up))
    vec1 = normalize(np.cross(vec2, vec0))
    pos = camorigin
    p = np.stack([vec0, vec1, vec2, pos], 1)       # [3,4]
    new_poses.append(p)

# New viewpoints: stack them together
new_poses = np.stack(new_poses, 0)      # [num,3,4]
# [num,3,5]: append the last column of the first original pose (which stores [H, W, focal]) to every new pose
new_poses = np.concatenate([new_poses, np.broadcast_to(poses[0, :3, -1:], new_poses[:, :3, -1:].shape)], -1)
# [num,3,5]: likewise append that column to every rotated/translated reset pose
poses_reset = np.concatenate([poses_reset[:, :3, :4], np.broadcast_to(poses[0, :3, -1:], poses_reset[:, :3, -1:].shape)], -1)   
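Every generated optical center lies at distance √(radcircle² + zh²) = √(rad² − zh² + zh²) = rad from the origin, i.e. on the same sphere (radius 1 after the scaling above) whose radius is the average distance of the real cameras. A hypothetical check (not in the original code):

print(np.allclose(np.linalg.norm(new_poses[:, :3, 3], axis=-1), rad))   # True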

[Schematic diagram of the code above; figure not reproduced here.]

The left diagram is the view in the xy plane: for the optical center camorigin of each newly generated camera, z is fixed at zh, while x and y lie on a circle of radius radcircle. The right diagram is the view in the world coordinate system; it marks four points, corresponding to the four points on the coordinate axes in the left diagram, and shows the poses computed for these four points.

Drawing the remaining points would be too cluttered; readers only need to grasp the general idea.
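As a rough substitute for the schematic, here is a minimal matplotlib sketch (hypothetical, assuming zh and radcircle from the snippet above) that plots the ring of generated camera centers around the scene center:

import numpy as np
import matplotlib.pyplot as plt   # requires matplotlib >= 3.2 for projection='3d'

th = np.linspace(0., 2. * np.pi, 120)
# ring of new camera origins: fixed z = zh, x/y on a circle of radius radcircle
origins = np.stack([radcircle * np.cos(th), radcircle * np.sin(th), np.full_like(th, zh)], -1)

fig = plt.figure()
ax = fig.add_subplot(projection='3d')
ax.scatter(origins[:, 0], origins[:, 1], origins[:, 2], s=4)   # camera centers
ax.scatter([0.], [0.], [0.], c='r')                            # scene center
plt.show()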


Summary

This post introduced, as simply and thoroughly as possible, part of the code in the data loading process: spherify_poses, which constrains the camera distribution to a sphere and obtains a surrounding camera trajectory. The code of the other functional modules will be explained in later posts.

Origin: blog.csdn.net/yangyu0515/article/details/132488727