Monocular ranging (YOLO target detection + calibration + ranging code)


   Perceiving the distance to objects around the vehicle in real time is of great significance to advanced driver-assistance systems: when an object is judged to be closer than the safe distance, safety-assist functions such as active braking can be triggered, further improving vehicle safety and reducing collisions. The previous chapter completed the target detection task; next, the detected objects must be ranged. This chapter first describes and analyzes the camera imaging model and derives the relationship between the image's pixel coordinate system and the world coordinate system. Second, software calibration is used to obtain the camera's intrinsic and extrinsic parameters, and the selection of the ranging target point is improved. Finally, the ranging model is used to measure distance, and the collected images are used for simulation analysis and method validation.

5.1 Comparison between monocular vision distance measurement and binocular vision distance measurement

Ranging plays an important role in intelligent-driving applications. Ranging methods fall into two main categories, active and passive, and active ranging is one of the current research hotspots. Active ranging methods use on-board equipment such as sensors, cameras, and lidar. Cameras are widely used because they are relatively inexpensive and stable, so this paper uses a camera for distance measurement.

Monocular ranging mainly uses a ranging model combined with the target bounding box, estimating distance from the size and position of the target in the image. Monocular ranging has the advantages of low computational cost and low hardware cost, and its ranging error can be reduced through subsequent adjustment. Many products are being developed around monocular vision sensors; compared with other ranging methods, monocular vision therefore has more mature algorithms, and this paper also adopts monocular vision ranging.
With binocular vision, the pixel offset (disparity) of the same object on the two imaging planes can be obtained, and the distance to the object can then be derived mathematically from the camera focal length, the disparity, and the actual distance between the two cameras (the baseline). Compared with monocular ranging, binocular ranging is more accurate and needs no data set, but it is computationally intensive, relatively slow, and more expensive because two cameras are used.
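
To make the binocular principle concrete, the standard triangulation relation for a rectified stereo pair is Z = f·B/d, where f is the focal length in pixels, B the baseline, and d the disparity. A minimal sketch (the function and variable names are illustrative, not from the original code):

    def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
        """Depth of a point from a rectified stereo pair: Z = f * B / d."""
        return focal_px * baseline_m / disparity_px

    # Example: f = 700 px, baseline B = 0.12 m, disparity d = 14 px -> Z = 6.0 m
    print(stereo_depth(700.0, 0.12, 14.0))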

5.2 Camera Imaging Model

To obtain distance information, points in the three-dimensional real world are needed; since the object being processed is a two-dimensional image captured by a camera, how to map a point on a two-dimensional image to a point in the three-dimensional world deserves careful consideration. Converting points on the image to points in the real world requires conversions among the pixel coordinate system, the image coordinate system, the camera coordinate system, and the world coordinate system. The relationship between the four coordinate systems is shown in Figure 5-1 and described as follows:
Figure 5-1 Coordinate relationship diagram

(1) Pixel coordinate system. A digital image is composed of many pixels arranged in a two-dimensional array. The origin of the pixel coordinate system is O2; the width direction is the u axis and the height direction is the v axis.
(2) Image coordinate system. The origin of the image coordinate system is O1. The pixel coordinate system and the image coordinate system are parallel, with the image width direction as the x axis and the height direction as the y axis; the unit of length is mm.
(3) Camera coordinate system. The origin of the camera coordinate system is Oc; the Xc and Yc axes are parallel to the x and y axes of the image coordinate system, respectively, and the Zc axis coincides with the camera's optical axis.
(4) World coordinate system. Our environment lives in the world coordinate system, the Xw-Yw-Zw frame in Figure 5-1. A point Pw in the real world is transformed into point P on the image, completing the mapping from world coordinates to image coordinates.

5.3 Coordinate system conversion

(1) Convert pixel coordinate system to image coordinate system

The pixel coordinate system expresses the position of each pixel in pixels, but it cannot express the physical size of objects in the image, so conversion between the coordinate systems is required.

Figure 5-2 Image coordinate system

In Figure 5-2, the relationship between the coordinates (x, y) of the image coordinate system and the coordinates (u, v) of the pixel coordinate system can be expressed as:
$$u = \frac{x}{d_x} + u_0,\qquad v = \frac{y}{d_y} + v_0 \tag{5.1}$$

In formula (5.1), (u0, v0) are the pixel coordinates of the image center, and dx and dy are the physical sizes of a single pixel in the horizontal and vertical directions on the sensor, respectively.
Written in homogeneous matrix form:
$$\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} \tag{5.2}$$
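
As a quick illustration of Eqs. (5.1) and (5.2), here is a sketch of both directions of the conversion; the names dx, dy, u0, v0 follow the text, and the numeric values are placeholders that would come from calibration:

    import numpy as np

    def pixel_to_image(u, v, dx, dy, u0, v0):
        """Invert Eq. (5.1): pixel coordinates -> image-plane coordinates (mm)."""
        return (u - u0) * dx, (v - v0) * dy

    def image_to_pixel_matrix(dx, dy, u0, v0):
        """The 3x3 matrix of Eq. (5.2): [u, v, 1]^T = M @ [x, y, 1]^T."""
        return np.array([[1.0 / dx, 0.0,      u0],
                         [0.0,      1.0 / dy, v0],
                         [0.0,      0.0,      1.0]])

    # Example: 4.8 um (0.0048 mm) pixels, principal point at (640, 360)
    M = image_to_pixel_matrix(0.0048, 0.0048, 640, 360)
    print(M @ np.array([1.2, -0.6, 1.0]))  # image point (mm) -> pixel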

(2) Transform the image coordinate system to the camera coordinate system

Figure 5-3 Camera coordinate system

In Figure 5-3, the distance OcO1 is the focal length f. Figure 5-4 shows how an object is imaged into the image coordinate system. Points P and P' are the coordinates in the camera coordinate system and the image coordinate system, respectively.


Figure 5-4 Similar triangle model
From Figure 5-4 it is easy to see that triangle OcO1B is similar to triangle OcCA, and triangle OcBP' is similar to triangle OcAP. By the principle of similar triangles:
$$\frac{O_cO_1}{O_cC} = \frac{O_1B}{CA} = \frac{O_cB}{O_cA} = \frac{BP'}{AP} \tag{5.3}$$

Since the distance OcO1 is the focal length f, and using the coordinates P(Xc, Yc, Zc) and P'(x, y), the above formula can be written as:
$$\frac{f}{Z_c} = \frac{x}{X_c} = \frac{y}{Y_c} \tag{5.4}$$

Rearranging gives:

$$x = f\,\frac{X_c}{Z_c},\qquad y = f\,\frac{Y_c}{Z_c}$$

It can be written in the form of a homogeneous coordinate matrix as:
$$Z_c \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_c \\ Y_c \\ Z_c \\ 1 \end{bmatrix} \tag{5.5}$$
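
A sketch of the projection in Eqs. (5.4)–(5.5); the focal length and point values are illustrative:

    import numpy as np

    def camera_to_image(point_c, f):
        """Pinhole projection of Eq. (5.4): x = f*Xc/Zc, y = f*Yc/Zc."""
        Xc, Yc, Zc = point_c
        return f * Xc / Zc, f * Yc / Zc

    # Homogeneous form of Eq. (5.5): Zc * [x, y, 1]^T = P @ [Xc, Yc, Zc, 1]^T
    f = 0.006  # 6 mm focal length (illustrative)
    P = np.array([[f, 0, 0, 0],
                  [0, f, 0, 0],
                  [0, 0, 1, 0]])
    print(camera_to_image((0.5, 0.2, 10.0), f))  # point 10 m ahead of the camera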

(3) Transform the camera coordinate system to the world coordinate system
Figure 5-5 Transformation from the camera coordinate system to the world coordinate system

The transformation between the camera coordinate system and the world coordinate system can be described as a rotation followed by a translation; composing the rotation and translation components gives the full coordinate transformation. For the rotation component, consider first a rotation about the Z axis (the Z axes of the camera and world coordinate systems coincide); then:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \begin{bmatrix} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} = R_z \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} \tag{5.6}$$

Similarly, rotating around the X axis will result in the following relationship:
$$R_x = \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\varphi & \sin\varphi \\ 0 & -\sin\varphi & \cos\varphi \end{bmatrix} \tag{5.7}$$

Rotating around the Y axis results in the following relationship:
$$R_y = \begin{bmatrix} \cos\omega & 0 & -\sin\omega \\ 0 & 1 & 0 \\ \sin\omega & 0 & \cos\omega \end{bmatrix} \tag{5.8}$$

For the translation component, it can be expressed as:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} \tag{5.9}$$
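
The three elementary rotations of Eqs. (5.6)–(5.8) and the translation of Eq. (5.9) can be sketched as follows (the sign convention matches the matrices above; conventions differ between references):

    import numpy as np

    def rot_x(phi):
        c, s = np.cos(phi), np.sin(phi)
        return np.array([[1, 0, 0], [0, c, s], [0, -s, c]])   # Eq. (5.7)

    def rot_y(omega):
        c, s = np.cos(omega), np.sin(omega)
        return np.array([[c, 0, -s], [0, 1, 0], [s, 0, c]])   # Eq. (5.8)

    def rot_z(theta):
        c, s = np.cos(theta), np.sin(theta)
        return np.array([[c, s, 0], [-s, c, 0], [0, 0, 1]])   # Eq. (5.6)

    # Translation component, Eq. (5.9): simply add the offset vector
    Pw = np.array([2.0, 0.0, 10.0])
    T = np.array([0.0, -1.3, 0.0])
    print(rot_x(np.radians(5)) @ Pw + T)  # rotation followed by translation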

After obtaining the translation vector and rotation matrix, the formula for transforming from the camera coordinate system to the world coordinate system can be completely written as:
$$\begin{bmatrix} X_c \\ Y_c \\ Z_c \end{bmatrix} = R \begin{bmatrix} X_w \\ Y_w \\ Z_w \end{bmatrix} + T \tag{5.10}$$

where the rotation matrix is $R = R_x R_y R_z$ and the translation vector is $T = (t_x, t_y, t_z)^T$. Combining formulas (5.2), (5.5), and (5.10) completes the transformation from the pixel coordinate system to the world coordinate system:
$$Z_c \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{1}{d_x} & 0 & u_0 \\ 0 & \frac{1}{d_y} & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} R & T \\ 0^T & 1 \end{bmatrix} \begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix} \tag{5.11}$$

In this way, for a point on the image, its position in the world coordinate system, and hence its distance, can be obtained from the above formula together with the camera's intrinsic and extrinsic parameters.
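
Putting Eq. (5.11) into code: with the two intrinsic factors collapsed into a single 3x3 matrix K and the extrinsics (R, T) from Eq. (5.10), a world point projects to pixels as below. This is a sketch; the parameter values are placeholders:

    import numpy as np

    def world_to_pixel(Pw, K, R, T):
        """Project a world point to pixel coordinates via Eq. (5.11).

        K absorbs the two intrinsic matrices of Eq. (5.11); R, T are the
        extrinsics of Eq. (5.10). Returns (u, v) and the depth Zc.
        """
        Pc = R @ Pw + T          # world -> camera, Eq. (5.10)
        uvw = K @ Pc             # camera -> pixel, up to the scale Zc
        return uvw[0] / uvw[2], uvw[1] / uvw[2], uvw[2]

    K = np.array([[1000.0, 0.0, 640.0],
                  [0.0, 1000.0, 360.0],
                  [0.0,    0.0,   1.0]])
    u, v, Zc = world_to_pixel(np.array([1.0, 0.5, 20.0]), K, np.eye(3), np.zeros(3))
    print(u, v, Zc)  # -> 690.0 385.0 20.0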

5.4 Camera internal and external parameters and distortion coefficients

Camera intrinsic and extrinsic parameters are of great significance for image rectification and ranging. The intrinsic and extrinsic parameters and distortion coefficients can be obtained through camera calibration. Moreover, every camera exhibits distortion, which is clearly intolerable for measurement, so eliminating distortion is essential.

5.4.1 Camera internal and external parameters

In formula (5.11), the first two matrices on the right-hand side of the equals sign together form the camera's intrinsic parameters, and the third matrix holds the camera's extrinsic parameters; the intrinsic parameters are an inherent property of the camera. The unknown parameter Zc in the formula is the depth of the object along the optical axis, i.e. its distance from the optical center. This also shows that during camera calibration, if the position of the object relative to the camera changes, calibration must be performed for each position.

5.4.2 Camera distortion coefficient

Camera distortion causes the real position of a pixel on the image plane to deviate from its ideal position, so it is necessary to understand camera distortion and correct for it. Camera distortion mainly includes radial distortion and tangential distortion. Figure 5-6 shows the distortion model: for any theoretical point P on the plane, distortion shifts P to P'. In the figure, dr denotes the camera's radial distortion and dt its tangential distortion.
Figure 5-6 Camera distortion model

(1) Radial distortion
Radial distortion is closely related to the shape of the lens; the farther from the center of the lens, the more pronounced the distortion. Radial distortion mainly takes two forms: barrel distortion and pincushion distortion. The polynomial correction formula is:
$$\begin{cases} x_0 = x_i\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) \\ y_0 = y_i\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) \end{cases} \tag{5.12}$$

where (xi, yi) is an ideal point on the image plane, (x0, y0) is the actual (distorted) position of that point, r is the distance between the ideal point and the imaging center, and k1, k2, k3 are the camera's radial distortion coefficients. If all three coefficients are less than zero, barrel distortion results; if all three are greater than zero, pincushion distortion results.
(2) Tangential distortion
Tangential distortion arises mostly from the camera's assembly and manufacturing process. It can be corrected with two tangential distortion coefficients, p1 and p2.
Combining the radial and tangential correction terms gives the full distortion correction model:
$$\begin{cases} x_0 = x_i\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + 2p_1 x_i y_i + p_2\left(r^2 + 2x_i^2\right) \\ y_0 = y_i\left(1 + k_1 r^2 + k_2 r^4 + k_3 r^6\right) + p_1\left(r^2 + 2y_i^2\right) + 2p_2 x_i y_i \end{cases} \tag{5.14}$$

Since the distance measurement in this paper targets autonomous driving, high precision is required, and distortion would significantly degrade the performance of intelligent driving. Therefore, in these applications the system must obtain p1 and p2 as well as k1, k2, and k3 through camera calibration; these parameters are also used in the subsequent ranging model.
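
Once calibration (Section 5.5) has produced the five coefficients, they can be applied with OpenCV; a minimal sketch with placeholder values (note that OpenCV orders the coefficients as k1, k2, p1, p2, k3):

    import cv2
    import numpy as np

    K = np.array([[1000.0, 0.0, 640.0],     # placeholder intrinsic matrix
                  [0.0, 1000.0, 360.0],
                  [0.0,    0.0,   1.0]])
    dist = np.array([-0.30, 0.10, 0.001, 0.001, -0.02])  # k1, k2, p1, p2, k3

    img = cv2.imread('frame.jpg')            # any distorted input frame
    undistorted = cv2.undistort(img, K, dist)
    cv2.imwrite('frame_undistorted.jpg', undistorted)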

5.5 Camera Calibration Process

Camera calibration yields the camera's distortion parameters and its intrinsic and extrinsic parameters. Calibration methods fall into two types: methods relying on a calibration reference object, and camera self-calibration. The former suits applications requiring high precision; the latter produces larger errors after calibration and is therefore unsuitable for high-accuracy scenarios. For calculation accuracy, this paper adopts the first method.

MATLAB is used for calibration: images of the calibration board are collected and fed into MATLAB, which outputs the camera's intrinsic and extrinsic parameters.
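
An equivalent calibration can be sketched in OpenCV; the board size (9x6 inner corners), square size, and the 'calib/*.jpg' path are assumptions:

    import glob
    import cv2
    import numpy as np

    pattern = (9, 6)                       # inner corners of the checkerboard
    square_mm = 25.0                       # physical square size
    objp = np.zeros((pattern[0] * pattern[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2) * square_mm

    obj_points, img_points = [], []
    for fname in glob.glob('calib/*.jpg'):
        gray = cv2.imread(fname, cv2.IMREAD_GRAYSCALE)
        found, corners = cv2.findChessboardCorners(gray, pattern)
        if found:
            obj_points.append(objp)
            img_points.append(corners)

    # Returns reprojection error, intrinsic matrix, distortion coefficients,
    # and per-image extrinsics (rotation / translation vectors).
    err, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        obj_points, img_points, gray.shape[::-1], None, None)
    print('reprojection error:', err, '\nK:\n', K, '\ndist:', dist.ravel())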

5.6 Monocular ranging model

After correcting the camera distortion and computing the camera's intrinsic and extrinsic parameters, the following monocular distance measurement model is established; distance can then be calculated by combining it with the bounding box produced by the target detection in Chapter 4.
Figure 5-10 Ranging model

The ranging model can be regarded as a convex-lens imaging process. In the figure above, Xc-Yc-Zc is the camera coordinate system, xO1y is the image coordinate system, O1O is the focal length f, x1O2y1 is the ground coordinate system, and OO2 is the camera mounting height h. A car stands on the ground, so its contact point Q lies on the ground. In monocular ranging, point Q on the actual object corresponds to point Q' in the captured image, and the projection of Q' onto the y axis is P'. The angle between the horizontal line and the Zc axis is α, the angle between the Zc optical axis and PP' is β, and the angle between the line OP and the ground x1 axis is γ.
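
One closed form consistent with this geometry: with camera height h, pitch angle α, and the ray angle β recovered from the pixel row of the ranging point, the longitudinal ground distance is d = h / tan(α + β). A sketch, with illustrative names and values:

    import math

    def ground_distance(v, v0, fy_px, alpha_rad, h_m):
        """Ground distance from the geometry of Figure 5-10.

        beta is the angle between the optical axis Zc and the ray through
        the ranging point (pixel row v); that ray meets the ground at
        d = h / tan(alpha + beta).
        """
        beta = math.atan((v - v0) / fy_px)
        return h_m / math.tan(alpha_rad + beta)

    # Example: point 80 px below the principal point, f = 1000 px,
    # camera pitched down 5 degrees and mounted 1.3 m above the ground.
    print(ground_distance(v=440, v0=360, fy_px=1000.0,
                          alpha_rad=math.radians(5), h_m=1.3))  # ~7.7 m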

5.6.1 Selection of target points

According to the detection results in Chapter 4, the target detection boxes in Figure 5-11 are obtained, and since the camera's intrinsic and extrinsic parameters are known, the distance can be computed by combining them. Specifically, this paper first selects the reference point (target point), taking the midpoint of the bottom edge of the detection box as the reference point. From observations over a large number of detection boxes, the box is slightly larger than the actual target, so an offset is applied to correct the reference point and preserve ranging accuracy: the reference point is shifted up by d pixels. Given the coordinates of the upper-left and lower-right corners of the box, the reference point can be expressed as:
$$(x_p,\; y_p) = \left(\frac{x_L + x_R}{2},\; y_R - d\right) \tag{5.15}$$

where xL, xR, and yR are the x coordinate of the upper-left corner, the x coordinate of the lower-right corner, and the y coordinate of the lower-right corner of the red box, respectively.
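
Equation (5.15) translates directly into code; d is the empirical pixel offset, and 5 is an illustrative default:

    def reference_point(x_left, y_top, x_right, y_bottom, d=5):
        """Ranging reference point of Eq. (5.15): bottom-center of the
        detection box, shifted up by d pixels (pixel y grows downward)."""
        return (x_left + x_right) / 2.0, y_bottom - d

    # Example: a box from (600, 300) to (760, 420)
    print(reference_point(600, 300, 760, 420))  # -> (680.0, 415)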
Figure 5-11 Selection of the target point

However, the above target point suits scenes where the object is directly ahead of the vehicle. In scenes like Figure 5-12, the target appears to the side of the vehicle; if the bottom-center of the detection box is used as the ranging target point, the point deviates badly from the spot directly beneath the vehicle, landing to the left or right of the car's centerline, which degrades ranging accuracy. As a further improvement, when the slope k between the target point (xp, yp) and the bottom-center point of the image meets the threshold δ, xp is updated; the new xp' can be expressed as:
[Equation (5.16): update rule for xp'; original equation image not reproduced]	(5.16)

where λ is the offset weight coefficient; when k is negative, λ is negative, and when k is positive, λ is positive.
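
Since the original image of Eq. (5.16) is not reproduced here, the sketch below uses a hypothetical linear update just to show the mechanism: compute the slope k to the bottom-center of the image and, when it meets the threshold δ, shift xp by a λ-weighted amount whose sign follows k:

    import math

    def corrected_xp(xp, yp, img_w, img_h, delta, lam):
        """Side-of-vehicle correction in the spirit of Eq. (5.16).

        k is the slope between (xp, yp) and the bottom-center of the image;
        the linear shift below is a hypothetical stand-in for the exact
        update rule, with lam's sign following k as stated in the text.
        """
        dx = xp - img_w / 2.0
        if dx == 0:
            return xp                          # already on the centerline
        k = (yp - img_h) / dx                  # slope to the bottom-center point
        if abs(k) <= delta:                    # threshold met: target off to the side
            return xp + math.copysign(lam, k) * abs(dx)
        return xp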

Measurement results

Measured distance: 6.01 meters.

The code (detection loop with the ranging call):


    for path, img, im0s, vid_cap in dataset:
        img = torch.from_numpy(img).to(device)
        img = img.half() if half else img.float()  # uint8 to fp16/32
        img /= 255.0  # 0 - 255 to 0.0 - 1.0
        if img.ndimension() == 3:
            img = img.unsqueeze(0)  # add batch dimension

        # Warmup: re-run the model a few times whenever the input shape changes
        if device.type != 'cpu' and (old_img_b != img.shape[0] or old_img_h != img.shape[2] or old_img_w != img.shape[3]):
            old_img_b = img.shape[0]
            old_img_h = img.shape[2]
            old_img_w = img.shape[3]
            for i in range(3):
                model(img, augment=opt.augment)[0]

        # Inference
        t1 = time_synchronized()
        with torch.no_grad():   # Calculating gradients would cause a GPU memory leak
            pred = model(img, augment=opt.augment)[0]
        t2 = time_synchronized()

        # Ranging: (u, v) is the reference point from Section 5.6.1, (h, w) the
        # box size, and out_mat / in_mat the calibrated extrinsic / intrinsic
        # parameters from Section 5.5.
        distance = object_point_world_position(u, v, h, w, out_mat, in_mat)
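
The helper object_point_world_position is not shown above; below is a minimal sketch consistent with the ranging model of Section 5.6, assuming in_mat is the 3x3 intrinsic matrix and out_mat carries the camera pitch and mounting height (the author's actual helper may differ):

    import math

    def object_point_world_position(u, v, h, w, out_mat, in_mat):
        """Sketch of the missing helper: longitudinal ground distance of the
        detection whose ranging reference point is (u, v).

        Assumptions (not from the original post): in_mat is the 3x3 intrinsic
        matrix; out_mat = (pitch_rad, cam_height_m); the box size (h, w) is
        unused in this simplified ground-plane version.
        """
        fy, v0 = in_mat[1][1], in_mat[1][2]
        pitch_rad, cam_height_m = out_mat
        beta = math.atan((v - v0) / fy)        # ray angle below the optical axis
        return cam_height_m / math.tan(pitch_rad + beta)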

Code acquisition: please send a private message.

Source: blog.csdn.net/ALiLiLiYa/article/details/128323184