Introduction to the KITTI dataset for 3D object detection (2): Detailed visualization


It is recommended to read the first blog post first: https://juejin.cn/post/7083421350685048846. The usage of the KITTI dataset was not yet very clear when that article was written, so here, based on the visualization code, the main process is summarized again:

1: Projecting the point cloud data (bin file) onto the image

A bin file corresponds to one 360° point cloud sweep (in the figure, the black area in the middle is the position of the recording car). A bin file stores the coordinates of many points as (x, y, z, r), where r is the reflection intensity, which is generally not used here.

import numpy as np

pointcloud = np.fromfile("G:/3Ddet/Point-GNN/000000.bin", dtype=np.float32, count=-1).reshape([-1, 4])

print(pointcloud.shape)
x = pointcloud[:, 0]  # x position of point
y = pointcloud[:, 1]  # y position of point
z = pointcloud[:, 2]  # z position of point
r = pointcloud[:, 3]  # reflectance value of point
d = np.sqrt(x ** 2 + y ** 2)  # distance from the sensor in the x-y plane
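As a quick sanity check, the loaded points can be drawn in a bird's-eye view. This is a minimal sketch of my own (not part of the original visualization code), assuming matplotlib is available; it reuses the x, y, z arrays from the snippet above.

import matplotlib.pyplot as plt

# Bird's-eye view: in the Velodyne frame x points forward, y left and z up,
# so scattering x against y and coloring by height z gives a quick top-down look.
plt.figure(figsize=(8, 8))
plt.scatter(x, y, s=0.5, c=z, cmap='viridis')
plt.colorbar(label='height z (m)')
plt.xlabel('x (m, forward)')
plt.ylabel('y (m, left)')
plt.axis('equal')
plt.show()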

The first code block above gives the x, y, and z values of each point; a single bin file contains roughly 100,000 points. To convert coordinates from the point-cloud (Velodyne) coordinate system to the camera-0 coordinate system, the extrinsic matrix is needed. The process is:

  • 1.1: Assuming there are n points, stack their coordinates into an n×3 matrix.


  • 1.2: Then append a column of 1s to expand it into an n×4 homogeneous matrix.


  • 1.3: Then multiply by the extrinsic matrix: np.dot(pts_3d_velo, np.transpose(self.V2C)), where V2C is a 3×4 matrix of the form [R | t]:

The rotation matrix R is the change of coordinate axes, and the translation vector t moves the coordinate origin. Looking at the provided calibration files, the matrix corresponding to each bin file differs slightly; I think this is because vibration during recording causes small differences, but the roughly 0.27 m translation (along the z direction, see the vehicle setup diagram above) stays roughly constant. The result is an n×3 matrix, i.e., the coordinates of each point in the camera coordinate system.

  • 1.4: Then apply the rectification matrix of camera 0:

R0 is a 3×3 rectification matrix. np.transpose(np.dot(self.R0, np.transpose(pts_3d_ref))) gives an n×3 result. Why rectify? Mainly because camera 0's coordinates may drift over time; once they are calibrated, the subsequent projection can be completed with the intrinsic matrix and the translation.

  • 1.5: Then pad with 1s again to get an n×4 homogeneous matrix for the following computation.


  • 1.6: pts_2d = np.dot(pts_3d_rect, np.transpose(self.P)) multiplies by the intrinsic (projection) matrix P and projects the points onto the image:

The intrinsic matrix depends only on the camera's internal parameters: the focal length f and the principal point c; every camera in the rig has the same focal length. The first thing to note is that the image pixel coordinate system has its origin in the upper-left corner, while the formula above assumes the origin at the image center; to handle this offset, the principal point's pixel coordinates (cu, cv) (i.e., cx, cy) are added in.

In the last column of P, the 45.7 comes from roughly 0.06 × f = 0.06 × 707 ≈ 45; this last column is the offset between the camera coordinate systems, mainly along the x axis, with small error terms on the y and z axes.

Because the intrinsic matrix has the form [[f, 0, cx], [0, f, cy], [0, 0, 1]] (with the translation column appended in P), to get the pixel coordinates x and y you still need to divide by the depth: pts_2d[:, 0] /= pts_2d[:, 2]; pts_2d[:, 1] /= pts_2d[:, 2]. With that, the conversion of every point is complete (a consolidated code sketch of the whole pipeline follows below).

A few things to note: small error terms appear both in the rectification matrix and in the intrinsic-matrix multiplication, probably caused by changes in camera position. The main computational details involve the intrinsic matrix, the extrinsic matrix, and image rectification; these are worth studying in depth, but since all the parameters are already provided, the formulas can simply be applied as given. The projection onto the image is mainly for convenient display; if you only work with the point cloud, you basically do not need to consider it.
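Putting steps 1.1 to 1.6 together, here is a minimal sketch of the whole projection. It is my own summary of the calculation described above, not the repo's exact code; the function name project_velo_to_image is illustrative, and V2C (3×4), R0 (3×3) and P (3×4) are assumed to have already been read from the calibration file as numpy arrays.

import numpy as np

def project_velo_to_image(pts_velo, V2C, R0, P):
    """Project n x 3 Velodyne points into pixel coordinates (n x 2)."""
    n = pts_velo.shape[0]
    # 1.1 / 1.2: pad the n x 3 points with ones -> n x 4 homogeneous coordinates
    pts_velo_hom = np.hstack((pts_velo, np.ones((n, 1))))
    # 1.3: extrinsic matrix V2C maps Velodyne coordinates to camera-0 coordinates
    pts_ref = pts_velo_hom @ V2C.T            # n x 3
    # 1.4: apply the 3 x 3 rectification matrix R0
    pts_rect = pts_ref @ R0.T                 # n x 3
    # 1.5: pad with ones again for the projection matrix
    pts_rect_hom = np.hstack((pts_rect, np.ones((n, 1))))
    # 1.6: projection matrix P, then divide by the depth component
    pts_2d = pts_rect_hom @ P.T               # n x 3
    pts_2d[:, 0] /= pts_2d[:, 2]
    pts_2d[:, 1] /= pts_2d[:, 2]
    return pts_2d[:, :2]

Before drawing the projected points on the image, points with a non-positive depth (behind the camera) should be filtered out.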

2: Converting the camera-0 3D annotation boxes into the point-cloud coordinate system

The annotation in the camera-0 coordinate system gives x, y, z, l, w, h. The main work is to compute the 8 corner vertices of the box and transform each of them into the point-cloud coordinate system, which then serves as the ground truth (gt) for the point cloud. The main steps are:

  • 2.1: R = roty(obj.ry); ry is the 15th parameter in the label file, the rotation angle.

The spatial orientation of a 3D object (rotation_y) lies in the range -pi to pi (rad). It is the object's global orientation angle in the camera coordinate system, i.e., the angle between the object's heading direction and the camera coordinate system's x axis. R is set to [[cos t, 0, sin t], [0, 1, 0], [sin t, 0, -cos t]], while the visualization code writes [[cos t, 0, sin t], [0, 1, 0], [-sin t, 0, cos t]]. After working through the derivation, the code's version looked wrong to me, yet the actual results turn out to be the same, so this still needs to be verified.

  • 2.2: Compute the coordinates of the 8 corners from the angle:
R = roty(obj.ry)

# 3d bounding box dimensions
l = obj.l
w = obj.w
h = obj.h

# 3d bounding box corners
x_corners = [l / 2, l / 2, -l / 2, -l / 2, l / 2, l / 2, -l / 2, -l / 2]
y_corners = [0, 0, 0, 0, -h, -h, -h, -h]
z_corners = [w / 2, -w / 2, -w / 2, w / 2, w / 2, -w / 2, -w / 2, w / 2]

# rotate and translate the 3d bounding box
corners_3d = np.dot(R, np.vstack([x_corners, y_corners, z_corners]))
# 8 corner coordinates (x, y, z): first the bottom face
# (front-right, rear-right, front-left, rear-left), then the top face
corners_3d[0, :] = corners_3d[0, :] + obj.t[0]
corners_3d[1, :] = corners_3d[1, :] + obj.t[1]
corners_3d[2, :] = corners_3d[2, :] + obj.t[2]

The result is the coordinates of the 8 corners (x, y, z): the bottom four (front-right, rear-right, front-left, rear-left) followed by the top four. As for how this is derived, it mainly uses similar triangles; you can sketch it out on paper. The y axis does not need to be considered here, assuming the ground is flat. From this you can also see that the (x, y, z) in the label is the coordinate of the center of the bottom face.

  • 2.3: Undo the rectification. Because (x, y, z) is given in rectified coordinates, the rectification has to be inverted:

np.transpose(np.dot(np.linalg.inv(self.R0), np.transpose(pts_3d_rect))), where R0 is as above.

  • 2.4: Pad with 1s to form homogeneous coordinates.


  • 2.5: Use the C2V matrix to project into the point-cloud coordinate system: np.dot(pts_3d_ref, np.transpose(self.C2V)), where C2V is obtained by inverting V2C.


  • 2.6: After that, it is just plotting (a consolidated code sketch of steps 2.3 to 2.5 follows below).
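Here is a minimal sketch of steps 2.3 to 2.5, i.e., bringing the 8 rectified-camera corners from step 2.2 back into the Velodyne coordinate system. The function names inverse_rigid_trans and project_rect_to_velo are illustrative rather than taken verbatim from the repo; R0 (3×3) and V2C (3×4) are again assumed to come from the calibration file, and C2V is derived here by inverting the rigid transform.

import numpy as np

def inverse_rigid_trans(Tr):
    """Invert a 3 x 4 rigid transform [R | t] -> [R^T | -R^T t]."""
    inv = np.zeros_like(Tr)                    # 3 x 4
    inv[:3, :3] = Tr[:3, :3].T
    inv[:3, 3] = -Tr[:3, :3].T @ Tr[:3, 3]
    return inv

def project_rect_to_velo(corners_rect, R0, V2C):
    """Map n x 3 rectified-camera points (e.g. the 8 box corners) to Velodyne coordinates."""
    # 2.3: undo the rectification with the inverse of R0
    pts_ref = (np.linalg.inv(R0) @ corners_rect.T).T                  # n x 3
    # 2.4: pad with ones -> n x 4 homogeneous coordinates
    pts_ref_hom = np.hstack((pts_ref, np.ones((pts_ref.shape[0], 1))))
    # 2.5: C2V is the inverse of the V2C extrinsic matrix
    C2V = inverse_rigid_trans(V2C)
    return pts_ref_hom @ C2V.T                                        # n x 3 in Velodyne coords

The corners_3d array from step 2.2 (transposed to n×3) can be passed in as corners_rect; the result is the 8 ground-truth corners in the point-cloud coordinate system, ready for drawing the boxes.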

Note: The above explanation is written for the code at github.com/kuixu/kitti...
