Python camera pose transformation


Project scene

Normalize the custom image data into a cube with side length 2 for training. The camera positions of the images lie approximately on a plane, and the scene is mostly below the cameras, so the camera poses should be transformed onto the upper face of the cube. That way the object being trained on falls inside the training scene and sits roughly at its center, which benefits training. In effect this is the computation of seven parameters (three translations, three rotation angles, and one scale factor) that transform the spatial Cartesian coordinate system. Since camera poses recovered by matching software such as Metashape or COLMAP are usually expressed relative to the camera coordinate system of the first photo, they need to be transformed in further processing (also check whether the camera coordinate system used by the model agrees with the z-axis direction of the original camera coordinate system).
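In symbols, each camera position p then undergoes a similarity transform (a plain restatement of the seven parameters above):

p' = s · R · p + T

where R is the 3×3 rotation matrix built from the three rotation angles, T is the translation vector, and s is the scale factor.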


Camera pose

The camera pose (position and attitude) is determined by the camera's extrinsic matrix, while its projection properties are determined by the intrinsic matrix.
The camera extrinsics can be represented by a 4×4 matrix that transforms points from the world coordinate system into the camera coordinate system, so the extrinsic matrix is often called the world-to-camera (w2c) matrix. Its inverse is the camera-to-world (c2w) matrix, which transforms points from the camera coordinate system into the world coordinate system and is what is usually meant by the pose matrix (poses).

The c2w matrix is a 4×4 matrix: c2w[:3, :3] is the rotation matrix R (it also encodes the camera attitude; its three column vectors are the directions of the x, y, and z axes of the camera coordinate system), c2w[:3, 3] is the translation vector T, and c2w[3, 3] is the scale factor S.

(figure: illustration of the camera parameters)
The figure and description above are from this article, which gives a good interpretation of the camera parameters and the NeRF code and is recommended reading.
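As a minimal NumPy sketch (the variable names here are illustrative, not from any particular codebase), reading the blocks of a 4×4 c2w matrix looks like this:

import numpy as np

c2w = np.eye(4)          # placeholder pose
R = c2w[:3, :3]          # rotation: columns are the camera x/y/z axes in world space
T = c2w[:3, 3]           # translation: camera position in world coordinates
x_axis, y_axis, z_axis = R[:, 0], R[:, 1], R[:, 2]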


Rotation transformation

Since the camera positions of the images I need to process lie roughly on a plane, the next step is to fit that plane, take its normal vector, and compute the rotation matrix that performs the rotation transformation.

Find the equation of the plane

import numpy as np


def fit_a_plane(x2, y2, z2):
    """Least-squares fit of a plane z = a*x + b*y + c to the given points."""
    x2, y2, z2 = np.asarray(x2), np.asarray(y2), np.asarray(z2)

    # Build the normal-equation coefficient matrix A
    A = np.array([
        [np.sum(x2 ** 2), np.sum(x2 * y2), np.sum(x2)],
        [np.sum(x2 * y2), np.sum(y2 ** 2), np.sum(y2)],
        [np.sum(x2),      np.sum(y2),      len(x2)],
    ])

    # Build the right-hand side b
    b = np.array([[np.sum(x2 * z2)],
                  [np.sum(y2 * z2)],
                  [np.sum(z2)]])

    # Solve A @ X = b for the plane parameters [a, b, c]
    X = np.linalg.solve(A, b)
    # print('Fitted plane: z = %.3f * x + %.3f * y + %.3f' % (X[0, 0], X[1, 0], X[2, 0]))

    # Sum of squared residuals, as a goodness-of-fit check
    R = np.sum((X[0, 0] * x2 + X[1, 0] * y2 + X[2, 0] - z2) ** 2)
    # print('Residual sum of squares: %.3f' % R)

    return [X[0, 0], X[1, 0], X[2, 0]]
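A quick sanity check, using points drawn (noise-free) from the hypothetical plane z = 2x + 3y + 4, so the recovered coefficients should match exactly:

x = np.array([0., 1., 0., 1.])
y = np.array([0., 0., 1., 1.])
z = 2 * x + 3 * y + 4
print(fit_a_plane(x, y, z))  # -> [2.0, 3.0, 4.0]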

Find the normal vector

def get_normal_vector(point1, point2, point3):
    """Compute the normal vector of the plane through three points."""
    vect1 = np.array(point2) - np.array(point1)
    vect2 = np.array(point3) - np.array(point1)
    norm_vect = np.cross(vect1, vect2)
    return norm_vect
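A quick check on three points of the z = 0 plane; the result should be parallel to (0, 0, 1):

n = get_normal_vector([0., 0., 0.], [1., 0., 0.], [0., 1., 0.])
print(n)  # -> [0. 0. 1.]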

Find the rotation matrix

def get_R_matrix(vector_src, vector_tgt):
    """Rotation matrix between two planes, given their normal vectors.

    Uses the Rodrigues-style formula R = I + [v]x + [v]x^2 / (1 + c),
    where v = src x tgt and c = src . tgt; note it is undefined when
    the two vectors are antiparallel (c == -1).
    """
    vector_src = vector_src / np.linalg.norm(vector_src)
    vector_tgt = vector_tgt / np.linalg.norm(vector_tgt)

    c = np.dot(vector_src, vector_tgt)           # cosine of the angle between them
    n_vector = np.cross(vector_src, vector_tgt)  # rotation axis (unnormalized)

    # Skew-symmetric cross-product matrix of n_vector
    n_vector_invert = np.array((
        [0, -n_vector[2], n_vector[1]],
        [n_vector[2], 0, -n_vector[0]],
        [-n_vector[1], n_vector[0], 0]))

    I = np.eye(3)
    R_w2c = I + n_vector_invert + np.dot(n_vector_invert, n_vector_invert) / (1 + c)
    return R_w2c
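A quick check that the returned matrix really rotates vector_src onto vector_tgt:

src = np.array([0., 0., 1.])
tgt = np.array([0., 1., 0.])
R = get_R_matrix(src, tgt)
print(np.round(R @ src, 6))  # -> [0. 1. 0.]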

Translation transformation

This part is simple: the translation vector is the difference between the target center coordinates and the mean of the camera position coordinates.

T_move = center_tgt - p_mean
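Spelled out as a minimal sketch, assuming p_src is an (N, 3) array of camera positions (as extracted from the pose matrices in the full code below) and center_tgt is the target center:

p_mean = np.mean(p_src, axis=0)   # mean camera position
T_move = center_tgt - p_mean      # shift that moves the mean onto the target center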

Scale transformation

This is also simple: the scale factor is the ratio of the target size to the size of the bounding box of all current camera positions.

scene_scale = scale_tgt / (p_max - p_min)
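Continuing the sketch above, with a target size of 2 (the cube's side length):

p_max = np.max(p_src, axis=0)
p_min = np.min(p_src, axis=0)
scene_scale = 2 / np.max(p_max - p_min)   # ratio of target size to bounding-box size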

Order of transformations

  1. Rotate first
  2. Then rescale
  3. Then translate

Note that the camera attitude must be taken into account as well: when rotating the positions, rotate the attitudes in sync.

The following code takes z = 1 as the target plane and center_tgt = [0, 0, 1] as the target center point; poses holds the c2w pose matrices of all cameras.

# File      : transform_poses.py
# Author    : WooChi
# Time      : 2023/03/15
# Version   : 1.0
# Function  : transform coordinates into a fixed scene

import numpy as np


def poses_transform(poses, center_tgt=np.array([0, 0, 1])):

    # 1. Rotation transformation
    # 1.1 Fit the plane equation
    f_p = fit_a_plane(poses[:, 0, 3], poses[:, 1, 3], poses[:, 2, 3])

    # Take three points on the fitted plane
    points = np.array([[-1., -1., 0.],
                       [-1., 1., 0.],
                       [1., 1., 0.]])

    points[0, 2] = points[0, 0] * f_p[0] + points[0, 1] * f_p[1] + f_p[2]
    points[1, 2] = points[1, 0] * f_p[0] + points[1, 1] * f_p[1] + f_p[2]
    points[2, 2] = points[2, 0] * f_p[0] + points[2, 1] * f_p[1] + f_p[2]

    # 1.2 Compute the plane's normal vector
    normal_p = get_normal_vector(points[0, :], points[1, :], points[2, :])

    # Normal vector of the target plane
    normal_cube = np.array([0., 0., -1.])

    # 1.3 Compute the rotation matrix
    R_w2c = get_R_matrix(normal_p, normal_cube)

    # 1.4 Rotate the camera positions
    # First take the position coordinates out of the pose matrices
    p_src = np.zeros(shape=(len(poses[:, 0, 3]), 3))
    p_src[:, 0] = poses[:, 0, 3]
    p_src[:, 1] = poses[:, 1, 3]
    p_src[:, 2] = poses[:, 2, 3]
    # Then apply the rotation to the position coordinates
    p_new = np.dot(R_w2c, np.transpose(p_src))

    # Put the rotated camera positions back into the pose matrices
    poses_new = poses.copy()
    poses_new[:, 0, 3] = p_new[0, :]
    poses_new[:, 1, 3] = p_new[1, :]
    poses_new[:, 2, 3] = p_new[2, :]

    # 1.5 Rotate the attitude as well, i.e. rotate the three column vectors
    # (the camera coordinate axes) of each pose
    poses_new[:, :3, 0] = np.dot(R_w2c, np.transpose(poses_new[:, :3, 0])).transpose()
    poses_new[:, :3, 1] = np.dot(R_w2c, np.transpose(poses_new[:, :3, 1])).transpose()
    poses_new[:, :3, 2] = np.dot(R_w2c, np.transpose(poses_new[:, :3, 2])).transpose()

    # 2. Scale transformation
    # 2.1 Extent of the space occupied by the camera positions
    max_vertices = np.max(p_new, axis=1)
    min_vertices = np.min(p_new, axis=1)

    # 2.2 Scale factor; the target size is 2
    scene_scale = 2 / (np.max(max_vertices - min_vertices))

    # 2.3 Scale the positions
    poses_new[:, :3, 3] *= scene_scale
    p_new[0, :] = poses_new[:, 0, 3]
    p_new[1, :] = poses_new[:, 1, 3]
    p_new[2, :] = poses_new[:, 2, 3]

    # 3. Translate the positions
    T_move = np.array([center_tgt - np.mean(p_new, axis=1)])
    p_new = p_new + T_move.transpose()

    poses_new[:, 0, 3] = p_new[0, :]
    poses_new[:, 1, 3] = p_new[1, :]
    poses_new[:, 2, 3] = p_new[2, :]

    return poses_new
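A hypothetical end-to-end usage, with randomly scattered placeholder poses of shape (N, 4, 4); after the transform, the mean camera position should land on center_tgt:

poses = np.tile(np.eye(4), (5, 1, 1))          # placeholder c2w matrices
poses[:, :3, 3] = np.random.rand(5, 3) * 10    # scatter the camera positions

poses_new = poses_transform(poses, center_tgt=np.array([0, 0, 1]))
print(np.mean(poses_new[:, :3, 3], axis=0))    # -> approximately [0. 0. 1.]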

Plane fitting
(figure: plane fitting result)

Pose transformation
(figures: camera poses before and after the transformation)


Original post: blog.csdn.net/m0_50910915/article/details/129695132