AutoCV Lesson Ten: 3D Basics

3D basics

Foreword

Handwriting AI has launched a new beginner-friendly, from-scratch self-driving CV course (link). These are my personal study notes, recorded for my own reference only.

In this lesson, we learn how to visualize the nuScenes dataset.

The course outline can be seen in the mind map below.

[Image: course outline mind map]

1. nuScenes dataset

The goal of this lesson is clear: project the Lidar point cloud of a sample in the nuScenes dataset onto the Camera images.

Before the actual implementation, there are a few basic concepts we need to understand. Each sensor is mounted at a different position on the vehicle, so in order to describe the relationships between them we need to define several coordinate systems. The sensors also need to be calibrated to obtain their relative poses, which makes the subsequent multi-sensor fusion easier.

Let's first briefly go over several common coordinate systems:

  • Global coordinate system (global coordinate)

    • It can simply be thought of as taking the vehicle's position at time t0 as the origin of the global coordinate system
  • Car body coordinate system (ego_pose/ego coordinate)

    • A coordinate system whose origin is fixed to the car body
  • Sensor coordinate system

    • lidar coordinate system
    • camera coordinate system
    • radar coordinate system

Next, let's briefly understand the calibration of several sensors.

  • The results obtained by lidar calibration are:
    • The position (translation) and rotation (rotation) of lidar relative to ego
    • translation is represented by 3 float numbers
      • position relative to ego
    • rotation uses 4 float numbers to represent the quaternion
  • The results obtained by camera calibration are:
    • The position (translation) and rotation (rotation) of the camera relative to the ego
    • Camera intrinsics (camera_intrinsic), which map 3D camera coordinates onto the 2D image plane
    • Camera distortion parameters (not considered for the nuScenes dataset)
    • translation is represented by 3 float numbers
      • position relative to ego
    • rotation uses 4 float numbers to represent the quaternion

How do we convert the Lidar point cloud data to the Camera? You might think it is enough to first convert from the Lidar coordinate system to the ego coordinate system and then from the ego coordinate system to the camera coordinate system, but that conversion is actually problematic.

So what is the problem? The Lidar and Camera run at different frequencies, that is, they capture data at different times. Suppose the lidar frame is captured at timestamp t0, with corresponding car body pose ego_pose0, while the camera image is captured at timestamp t1, with corresponding car body pose ego_pose1. Taking this time difference into account, the final conversion chain is:

lidar_points -> ego_pose0 -> global -> ego_pose1 -> camera -> intrinsic -> image
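Before diving into the real nuScenes code in Section 2, here is a minimal sketch of how this chain can be composed: each step is a 4x4 homogeneous matrix built from a translation and a rotation quaternion, and the whole chain is just a matrix product. The pose records below are placeholders, not real nuScenes values.

import numpy as np
from pyquaternion import Quaternion

def pose_to_matrix(rotation_wxyz, translation_xyz):
    # Build a 4x4 homogeneous transform from a quaternion (w, x, y, z) and a translation
    T = np.eye(4)
    T[:3, :3] = Quaternion(rotation_wxyz).rotation_matrix
    T[:3, 3] = translation_xyz
    return T

# Placeholder records just to make the sketch runnable; the real values come from
# the nuScenes calibrated_sensor and ego_pose tables (see Section 2)
lidar_calib = {"rotation": [1, 0, 0, 0], "translation": [0.94, 0.0, 1.84]}
cam_calib   = {"rotation": [1, 0, 0, 0], "translation": [1.52, 0.49, 1.51]}
ego_pose0   = {"rotation": [1, 0, 0, 0], "translation": [411.3, 1180.9, 0.0]}  # ego pose at lidar time t0
ego_pose1   = {"rotation": [1, 0, 0, 0], "translation": [411.4, 1180.8, 0.0]}  # ego pose at camera time t1

lidar_to_ego0  = pose_to_matrix(lidar_calib["rotation"], lidar_calib["translation"])
ego0_to_global = pose_to_matrix(ego_pose0["rotation"], ego_pose0["translation"])
global_to_ego1 = np.linalg.inv(pose_to_matrix(ego_pose1["rotation"], ego_pose1["translation"]))
ego1_to_camera = np.linalg.inv(pose_to_matrix(cam_calib["rotation"], cam_calib["translation"]))

# lidar -> ego_pose0 -> global -> ego_pose1 -> camera; the camera intrinsics are applied afterwards
lidar_to_camera = ego1_to_camera @ global_to_ego1 @ ego0_to_global @ lidar_to_ego0
print(lidar_to_camera)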

2. Code implementation

We need to be familiar with the nuScenes data format, and we will refer to Figure 2-1 frequently while implementing the code.


Figure 2-1 nuScenes Data Format

2.1 Preliminary work

Before getting started, you need to install the nuScenes devkit:

pip install nuscenes-devkit

During the installation process, the following problems were encountered:


Figure 2-2 nuscenes-devkit installation problem

The fix is described in this blog post; it is fairly simple, just follow the steps there.

2.2 Initialize nuScenes class

Use the nuscenes library to initialize the NuScenes class and fetch a sample.

The sample code is as follows:

# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes
from pyquaternion import Quaternion
import numpy as np
import cv2
import os

# Construct the NuScenes class
version = "v1.0-mini"
dataroot = "D:\\Data\\nuScenes"
nuscenes = NuScenes(version, dataroot, verbose=False)

# Take the first sample and print it
sample = nuscenes.sample[0]
print(sample)

The sample code above imports the required packages, initializes the NuScenes class, and prints the first sample.

The output is shown below. The first sample of nuScenes is a dictionary; we then use the token stored under each key to look up the corresponding record.

{'token': 'ca9a282c9e77460f8360f564131a8af5', 'timestamp': 1532402927647951, 'prev': '', 'next': '39586f9d59004284a7114a68825e8eec', 'scene_token': 'cc8c0bf57f984915a77078b10eb33198',
 'data': {'RADAR_FRONT': '37091c75b9704e0daa829ba56dfa0906', 'RADAR_FRONT_LEFT': '11946c1461d14016a322916157da3c7d', 'RADAR_FRONT_RIGHT': '491209956ee3435a9ec173dad3aaf58b', 'RADAR_BACK_LEFT': '312aa38d0e3e4f01b3124c523e6f9776', 'RADAR_BACK_RIGHT': '07b30d5eb6104e79be58eadf94382bc1', 'LIDAR_TOP': '9d9bf11fb0e144c8b446d54a8a00184f', 'CAM_FRONT': 'e3d495d4ac534d54b321f50006683844', 'CAM_FRONT_RIGHT': 'aac7867ebf4f446395d29fbd60b63b3b', 'CAM_BACK_RIGHT': '79dbb4460a6b40f49f9c150cb118247e', 'CAM_BACK': '03bea5763f0f4722933508d5999c5fd8', 'CAM_BACK_LEFT': '43893a033f9c46d4a51b5e08a67a1eb7', 'CAM_FRONT_LEFT': 'fe5422747a7d4268a4b07fc396707b23'},
 'anns': ['ef63a697930c4b20a6b9791f423351da', '6b89da9bf1f84fd6a5fbe1c3b236f809', '924ee6ac1fed440a9d9e3720aac635a0', ...]}

2.3 Get lidar data

From the printed sample above we can see that the sensor data tokens are stored under 'data', and the lidar token is under the key LIDAR_TOP, so the lidar data can be obtained with the following code:

# Get the lidar sample data
lidar_token = sample["data"]["LIDAR_TOP"]
lidar_sample_data = nuscenes.get('sample_data', lidar_token)
print(lidar_sample_data)
lidar_file = os.path.join(dataroot, lidar_sample_data['filename'])

# Load the point cloud data
lidar_points = np.fromfile(lidar_file, dtype=np.float32).reshape(-1, 5)

We look up the token via the dictionary key and then fetch the lidar record through that token. Note that after reading the point cloud from the lidar file we reshape it to (-1, 5): each point is described by 5 values, namely x, y, z, intensity and ring_index.

The output is as follows:

# lidar_sample_data
{'token': '9d9bf11fb0e144c8b446d54a8a00184f', 'sample_token': 'ca9a282c9e77460f8360f564131a8af5', 'ego_pose_token': '9d9bf11fb0e144c8b446d54a8a00184f', 'calibrated_sensor_token': 'a183049901c24361a6b0b11b8013137c', 'timestamp': 1532402927647951, 'fileformat': 'pcd', 'is_key_frame': True, 'height': 0, 'width': 0, 'filename': 'samples/LIDAR_TOP/n015-2018-07-24-11-22-45+0800__LIDAR_TOP__1532402927647951.pcd.bin', 'prev': '', 'next': '0cedf1d2d652468d92d23491136b5d15', 'sensor_modality': 'lidar', 'channel': 'LIDAR_TOP'}

# lidar_points
[[-3.1243734e+00 -4.3415368e-01 -1.8671920e+00  4.0000000e+00
   0.0000000e+00]
 [-3.2906363e+00 -4.3220678e-01 -1.8631892e+00  1.0000000e+00
   1.0000000e+00]
 [-3.4704101e+00 -4.3068862e-01 -1.8595628e+00  2.0000000e+00
   2.0000000e+00]
 ...
 [-1.4129141e+01  4.9357712e-03  1.9857219e+00  8.0000000e+01
   2.9000000e+01]
 [-1.4120683e+01  9.8654358e-03  2.3199446e+00  7.5000000e+01
   3.0000000e+01]
 [-1.4113669e+01  1.4782516e-02  2.6591547e+00  4.0000000e+01
   3.1000000e+01]]
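As a quick sanity check on the (-1, 5) layout described above, the columns can be split out like this (a small sketch of my own, reusing lidar_points from the code above):

# Split the N x 5 point array into its named columns
xyz        = lidar_points[:, :3]   # x, y, z in the lidar coordinate system
intensity  = lidar_points[:, 3]    # reflection intensity
ring_index = lidar_points[:, 4]    # index of the laser ring that produced the point
print(xyz.shape, intensity.min(), intensity.max())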

2.4 lidar coordinate conversion

After obtaining the lidar data, we need to transform the point cloud from the lidar coordinate system to the global coordinate system. The chain is lidar -> ego -> global: first convert from the lidar frame to the ego frame, then from the ego frame to the global frame. The implementation is as follows:

# Lidar coordinate transform
# Get the lidar calibration data (translation and rotation relative to the ego / car body frame)
lidar_calibrated_data = nuscenes.get("calibrated_sensor", lidar_sample_data["calibrated_sensor_token"])
print(lidar_calibrated_data)

def get_matrix(calibrated_data, inverse=False):
    # Returns the 4x4 transformation matrix (e.g. lidar->ego) built from rotation and translation
    output = np.eye(4)
    output[:3, :3] = Quaternion(calibrated_data["rotation"]).rotation_matrix
    output[:3,  3] = calibrated_data["translation"]
    if inverse:
        output = np.linalg.inv(output)
    return output

# lidar->ego transformation matrix
lidar_to_ego = get_matrix(lidar_calibrated_data)
print(lidar_to_ego)

# Ego (car body) pose at the time the lidar data was captured
ego_pose = nuscenes.get("ego_pose", lidar_sample_data["ego_pose_token"])
print(ego_pose)
# ego->global transformation matrix
ego_to_global = get_matrix(ego_pose)
print(ego_to_global)
# lidar->global transformation matrix
lidar_to_global = ego_to_global @ lidar_to_ego
print(lidar_to_global)

# x, y, z -> x, y, z, 1: convert to homogeneous coordinates to make the matrix multiplication convenient
hom_points = np.concatenate([lidar_points[:, :3], np.ones((len(lidar_points), 1))], axis=1)
# lidar_points -> global
global_points = hom_points @ lidar_to_global.T
print(global_points)

In the code above, we use lidar_sample_data['calibrated_sensor_token'] to obtain the translation and rotation determined during calibration, and the get_matrix function turns them into a transformation matrix, where rotation describes the rotational part and translation the positional part.

You may then ask how 'rotation', which contains only 4 values, can be expressed as a 3x3 rotation matrix. The 4 values of rotation represent a quaternion, which can be converted into a rotation matrix. Assuming the quaternion $\mathbf{q}$ in vector form is $\mathbf{q} = (w, x, y, z)$, the corresponding rotation matrix is computed as:

$$\mathbf{R_q}=\begin{bmatrix}1-2y^2-2z^2 & 2xy-2wz & 2xz+2wy\\ 2xy+2wz & 1-2x^2-2z^2 & 2yz-2wx\\ 2xz-2wy & 2yz+2wx & 1-2x^2-2y^2\end{bmatrix}$$
More detailed description: Quaternion & rotation
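To sanity-check this formula against the library used in the code, here is a small verification sketch of my own, comparing the hand-written matrix with pyquaternion's rotation_matrix for the lidar calibration quaternion shown in the output further below:

import numpy as np
from pyquaternion import Quaternion

def quat_to_matrix(w, x, y, z):
    # Rotation matrix written out from the formula above (assumes a unit quaternion)
    return np.array([
        [1 - 2*y*y - 2*z*z, 2*x*y - 2*w*z,     2*x*z + 2*w*y],
        [2*x*y + 2*w*z,     1 - 2*x*x - 2*z*z, 2*y*z - 2*w*x],
        [2*x*z - 2*w*y,     2*y*z + 2*w*x,     1 - 2*x*x - 2*y*y],
    ])

# The lidar calibration quaternion of this sample, in (w, x, y, z) order
q = np.array([0.7077955119163518, -0.006492242056004365,
              0.010646214713995808, -0.7063073142877817])
q = q / np.linalg.norm(q)  # normalize, since the formula assumes |q| = 1
print(np.allclose(quat_to_matrix(*q), Quaternion(q).rotation_matrix))  # should print True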

In the code above, we first compute the lidar->ego transformation matrix and then the ego->global transformation matrix; multiplying the two gives the lidar->global matrix, and applying it to every point yields global_points. Note that, to make the matrix multiplication convenient, lidar_points is first converted to homogeneous coordinates.

The output is as follows:

# lidar_calibrated_data
{'token': 'a183049901c24361a6b0b11b8013137c', 'sensor_token': 'dc8b396651c05aedbb9cdaae573bb567', 'translation': [0.943713, 0.0, 1.84023], 'rotation': [0.7077955119163518, -0.006492242056004365, 0.010646214713995808, -0.7063073142877817], 'camera_intrinsic': []}

# lidar_to_ego
[[ 0.00203327  0.99970406  0.02424172  0.943713  ]
 [-0.99998053  0.00217566 -0.00584864  0.        ]
 [-0.00589965 -0.02422936  0.99968902  1.84023   ]
 [ 0.          0.          0.          1.        ]]

# ego_pose0
{'token': '9d9bf11fb0e144c8b446d54a8a00184f', 'timestamp': 1532402927647951, 'rotation': [0.5720320396729045, -0.0016977771610471074, 0.011798001930183783, -0.8201446642457809], 'translation': [411.3039349319818, 1180.8903791765097, 0.0]}

# ego_to_global0
[[-3.45552926e-01  9.38257989e-01  1.62825160e-02  4.11303935e+02]
 [-9.38338111e-01 -3.45280305e-01 -1.74097708e-02  1.18089038e+03]
 [-1.07128245e-02 -2.12945025e-02  9.99715849e-01  0.00000000e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

# lidar_to_global
[[-9.39038386e-01 -3.43803850e-01  2.41312207e-03  4.11007796e+02]
 [ 3.43468398e-01 -9.38389802e-01 -3.81318685e-02  1.17997282e+03]
 [ 1.53743323e-02 -3.49784571e-02  9.99269802e-01  1.82959727e+00]
 [ 0.00000000e+00  0.00000000e+00  0.00000000e+00  1.00000000e+00]]

# global_points
[[ 4.14086460e+02  1.17937830e+03 -6.90804743e-02  1.00000000e+00]
 [ 4.14241928e+02  1.17931922e+03 -6.77048676e-02  1.00000000e+00]
 [ 4.14410229e+02  1.17925591e+03 -6.68980550e-02  1.00000000e+00]
 ...
 [ 4.24278696e+02  1.17503955e+03  3.59647049e+00  1.00000000e+00]
 [ 4.24269865e+02  1.17502509e+03  3.93040672e+00  1.00000000e+00]
 [ 4.24262408e+02  1.17500995e+03  4.26930490e+00  1.00000000e+00]]
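As a quick check that the homogeneous multiplication above really applies rotation and translation in a single step, the same result can be computed explicitly as R·p + t (a small sketch of my own, reusing the variables from the code above):

# Equivalent "rotate then translate" form, applied to the raw x, y, z columns
R = lidar_to_global[:3, :3]
t = lidar_to_global[:3, 3]
explicit_global = lidar_points[:, :3] @ R.T + t
print(np.allclose(explicit_global, global_points[:, :3]))  # should print True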

2.5 camera coordinate transformation

In the previous step we converted the lidar point cloud into the global coordinate system. To project it onto the camera, we also need the transformation from global to camera. The implementation is as follows:

# Camera coordinate transform
cameras = ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT']

for cam in cameras:
    # Get the camera sample data
    camera_token = sample["data"][cam]
    camera_data  = nuscenes.get("sample_data", camera_token)
    print(camera_data)
    # Load the image
    image_file = os.path.join(dataroot, camera_data["filename"])
    image = cv2.imread(image_file)
    
    # Ego (car body) pose at the time the camera data was captured
    camera_ego_pose = nuscenes.get("ego_pose", camera_data["ego_pose_token"])
    # global->ego transformation matrix
    global_to_ego = get_matrix(camera_ego_pose, True)

    # Get the camera calibration data (translation, rotation and camera_intrinsic) relative to the ego frame
    camera_calibrated_data = nuscenes.get("calibrated_sensor", camera_data["calibrated_sensor_token"])
    print(camera_calibrated_data)
    # ego->camera transformation matrix
    ego_to_camera = get_matrix(camera_calibrated_data, True)
    camera_intrinsic = np.eye(4)
    camera_intrinsic[:3, :3] = camera_calibrated_data["camera_intrinsic"]

    # global->camera_ego_pose->camera->image
    global_to_image = camera_intrinsic @ ego_to_camera @ global_to_ego

    image_points = global_points @ global_to_image.T
    image_points[:, :2] /= image_points[:, [2]]  # perspective division: normalize x, y by depth z
    
    # Filter out points with z <= 0: they are behind the image plane and cannot be projected
    # z > 0
    for x, y in image_points[image_points[:, 2] > 0, :2].astype(int):
        cv2.circle(image, (x, y), 3, (255, 0, 0), -1, 16)

    cv2.imwrite(f"{cam}.jpg", image)

In the sample code above, a series of coordinate transformations finally gives us the global->camera transformation, and the camera's 3D->2D projection is obtained by applying the camera intrinsic matrix.

The line image_points[:, :2] /= image_points[:, [2]] divides the x and y coordinates by the depth value z. This is the perspective division that turns the projected points into pixel coordinates on the image plane, ready for drawing.
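Written out with the intrinsic parameters, this is just the standard pinhole projection: for a point $(X, Y, Z)$ in the camera coordinate system, multiplying by the intrinsic matrix gives $(f_x X + c_x Z,\; f_y Y + c_y Z,\; Z)$, and dividing by the depth $Z$ yields the pixel coordinates

$$u = f_x\frac{X}{Z} + c_x,\qquad v = f_y\frac{Y}{Z} + c_y$$

where $f_x, f_y, c_x, c_y$ come from the camera_intrinsic matrix printed below.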

The output is as follows:

# camera_data
{'token': 'fe5422747a7d4268a4b07fc396707b23', 'sample_token': 'ca9a282c9e77460f8360f564131a8af5', 'ego_pose_token': 'fe5422747a7d4268a4b07fc396707b23', 'calibrated_sensor_token': '75ad8e2a8a3f4594a13db2398430d097', 'timestamp': 1532402927604844, 'fileformat': 'jpg', 'is_key_frame': True, 'height': 900, 'width': 1600, 'filename': 'samples/CAM_FRONT_LEFT/n015-2018-07-24-11-22-45+0800__CAM_FRONT_LEFT__1532402927604844.jpg', 'prev': '', 'next': '48f7a2e756264647bcf8870b02bf711f', 'sensor_modality': 'camera', 'channel': 'CAM_FRONT_LEFT'}

# camera_calibrated_data
{'token': '75ad8e2a8a3f4594a13db2398430d097', 'sensor_token': 'ec4b5d41840a509984f7ec36419d4c09', 'translation': [1.52387798135, 0.494631336551, 1.50932822144], 'rotation': [0.6757265034669446, -0.6736266522251881, 0.21214015046209478, -0.21122827103904068], 'camera_intrinsic': [[1272.5979470598488, 0.0, 826.6154927353808], [0.0, 1272.5979470598488, 479.75165386361925], [0.0, 0.0, 1.0]]}

The drawn image is as follows:


Figure 2-3 CAM_FRONT_LEFT point cloud projection results

Of course, we can also draw the 3D bounding boxes; the code is as follows:

from nuscenes.utils.data_classes import Box
for token in sample["anns"]:
    annotation = nuscenes.get("sample_annotation", token)
    # print(annotation)
    box = Box(annotation['translation'], annotation['size'], Quaternion(annotation['rotation']))
    # print(box)
    # Coordinate transform
    corners = box.corners().T  # box.corners() is 3x8; transpose to get the 8 corner points of the 3D box
    global_corners = np.concatenate([corners, np.ones((len(corners), 1))], axis=1)
    image_based_corners = global_corners @ global_to_image.T
    image_based_corners[:, :2] /= image_based_corners[:, [2]]  # perspective division: normalize by depth
    image_based_corners = image_based_corners.astype(np.int32)

    # The 12 edges of the cuboid
    ix, iy = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 1, 2, 3, 0, 5, 6, 7, 4]
    for p0, p1 in zip(image_based_corners[ix], image_based_corners[iy]):
        if p0[2] <= 0 or p1[2] <= 0: continue  # skip edges with an endpoint behind the image plane (z <= 0)
        cv2.line(image, (p0[0], p0[1]), (p1[0], p1[1]), (0, 255, 0), 2, 16)


Figure 2-4 CAM_FRONT_LEFT point cloud projection + 3D frame result

2.6 Complete sample code

The complete sample code is as follows:

# pip install nuscenes-devkit
from nuscenes.nuscenes import NuScenes
from pyquaternion import Quaternion
import numpy as np
import cv2
import os

# Construct the NuScenes class
version = "v1.0-mini"
dataroot = "D:\\Data\\nuScenes"
nuscenes = NuScenes(version, dataroot, verbose=False)

sample = nuscenes.sample[0]

# Get the lidar sample data
lidar_token = sample["data"]["LIDAR_TOP"]
lidar_sample_data = nuscenes.get('sample_data', lidar_token)
lidar_file = os.path.join(dataroot, lidar_sample_data['filename'])

# Load the point cloud data
lidar_points = np.fromfile(lidar_file, dtype=np.float32).reshape(-1, 5)

# Lidar coordinate transform
# Get the lidar calibration data (translation and rotation relative to the ego / car body frame)
lidar_calibrated_data = nuscenes.get("calibrated_sensor", lidar_sample_data["calibrated_sensor_token"])
# print(lidar_calibrated_data)

def get_matrix(calibrated_data, inverse=False):
    # Returns the 4x4 transformation matrix (e.g. lidar->ego) built from rotation and translation
    output = np.eye(4)
    output[:3, :3] = Quaternion(calibrated_data["rotation"]).rotation_matrix
    output[:3,  3] = calibrated_data["translation"]
    if inverse:
        output = np.linalg.inv(output)
    return output

# lidar->ego transformation matrix
lidar_to_ego = get_matrix(lidar_calibrated_data)
# print(lidar_to_ego)

# Ego (car body) pose at the time the lidar data was captured
ego_pose = nuscenes.get("ego_pose", lidar_sample_data["ego_pose_token"])
# print(ego_pose)
# ego->global transformation matrix
ego_to_global = get_matrix(ego_pose)
# print(ego_to_global)
# lidar->global transformation matrix
lidar_to_global = ego_to_global @ lidar_to_ego
# print(lidar_to_global)

# x, y, z -> x, y, z, 1: convert to homogeneous coordinates to make the matrix multiplication convenient
hom_points = np.concatenate([lidar_points[:, :3], np.ones((len(lidar_points), 1))], axis=1)
# lidar_points -> global
global_points = hom_points @ lidar_to_global.T
# print(global_points)

# Camera coordinate transform
cameras = ['CAM_FRONT_LEFT', 'CAM_FRONT', 'CAM_FRONT_RIGHT', 'CAM_BACK_LEFT', 'CAM_BACK', 'CAM_BACK_RIGHT']

for cam in cameras:
    # Get the camera sample data
    camera_token = sample["data"][cam]
    camera_data  = nuscenes.get("sample_data", camera_token)
    # print(camera_data)
    # Load the image
    image_file = os.path.join(dataroot, camera_data["filename"])
    image = cv2.imread(image_file)
    
    # Ego (car body) pose at the time the camera data was captured
    camera_ego_pose = nuscenes.get("ego_pose", camera_data["ego_pose_token"])
    # global->ego transformation matrix
    global_to_ego = get_matrix(camera_ego_pose, True)

    # Get the camera calibration data (translation, rotation and camera_intrinsic) relative to the ego frame
    camera_calibrated_data = nuscenes.get("calibrated_sensor", camera_data["calibrated_sensor_token"])
    # print(camera_calibrated_data)
    # ego->camera transformation matrix
    ego_to_camera = get_matrix(camera_calibrated_data, True)
    camera_intrinsic = np.eye(4)
    camera_intrinsic[:3, :3] = camera_calibrated_data["camera_intrinsic"]

    # global->camera_ego_pose->camera->image
    global_to_image = camera_intrinsic @ ego_to_camera @ global_to_ego

    image_points = global_points @ global_to_image.T
    image_points[:, :2] /= image_points[:, [2]]  # perspective division: normalize x, y by depth z
    
    # Filter out points with z <= 0: they are behind the image plane and cannot be projected
    # z > 0
    for x, y in image_points[image_points[:, 2] > 0, :2].astype(int):
        cv2.circle(image, (x, y), 3, (255, 0, 0), -1, 16)

    from nuscenes.utils.data_classes import Box
    for token in sample["anns"]:
        annotation = nuscenes.get("sample_annotation", token)
        # print(annotation)
        box = Box(annotation['translation'], annotation['size'], Quaternion(annotation['rotation']))
        # print(box)
        # Coordinate transform
        corners = box.corners().T  # box.corners() is 3x8; transpose to get the 8 corner points of the 3D box
        global_corners = np.concatenate([corners, np.ones((len(corners), 1))], axis=1)
        image_based_corners = global_corners @ global_to_image.T
        image_based_corners[:, :2] /= image_based_corners[:, [2]]  # perspective division: normalize by depth
        image_based_corners = image_based_corners.astype(np.int32)

        # The 12 edges of the cuboid
        ix, iy = [0, 1, 2, 3, 0, 1, 2, 3, 4, 5, 6, 7], [4, 5, 6, 7, 1, 2, 3, 0, 5, 6, 7, 4]
        for p0, p1 in zip(image_based_corners[ix], image_based_corners[iy]):
            if p0[2] <= 0 or p1[2] <= 0: continue  # skip edges with an endpoint behind the image plane (z <= 0)
            cv2.line(image, (p0[0], p0[1]), (p1[0], p1[1]), (0, 255, 0), 2, 16)

    cv2.imwrite(f"{cam}.jpg", image)
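Running this script writes one projected image per camera channel into the current working directory, named after the channel (CAM_FRONT_LEFT.jpg, CAM_FRONT.jpg, and so on).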

3. Additional knowledge needed

Studying this course requires some additional background knowledge, such as coordinate transformations and quaternions.

Summary

In this lesson we learned how to visualize the nuScenes dataset by projecting the lidar point cloud onto the camera images. Along the way we needed to understand coordinate transformations, quaternions and other related concepts. Overall, the visualization code is easy to follow, but some of the underlying basics have to be studied on our own.


Source: blog.csdn.net/qq_40672115/article/details/131166571