From binocular calibration to stereo matching: a practical guide in Python

foreword

Stereo matching is an important topic in computer vision: by matching images taken from different viewpoints, it recovers depth in a way similar to human binocular vision. A complete pipeline involves several steps, including binocular calibration, stereo rectification, and disparity computation. In this article, I will introduce the basic steps and techniques for implementing stereo matching in Python.

The following code implements the complete pipeline from camera calibration to stereo matching. The parameters and outputs of each function are described below.

calibration

First, the program requires the following libraries:

numpy
cv2 (OpenCV)
os
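
For reference, the corresponding imports might look like this (note that cv2.ximgproc, used later for WLS filtering, ships with the opencv-contrib-python package rather than plain opencv-python):

import os
import numpy as np
import cv2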

At the beginning of the program, a few variables are defined to store the calibration image paths, the checkerboard parameters, the corner coordinates, and so on. The details are as follows:

path_left = "./data/left/"
path_right = "./data/right/"

path_left and path_right are the paths of the left and right camera calibration image folders.

CHESSBOARD_SIZE = (8, 11)
CHESSBOARD_SQUARE_SIZE = 15  # mm

CHESSBOARD_SIZE is the number of inner corners per row and column of the checkerboard (a board with 9 × 12 squares has 8 × 11 inner corners), and CHESSBOARD_SQUARE_SIZE is the side length of each square in millimeters.

objp = np.zeros((CHESSBOARD_SIZE[0] * CHESSBOARD_SIZE[1], 3), np.float32)
objp[:, :2] = np.mgrid[0:CHESSBOARD_SIZE[0], 0:CHESSBOARD_SIZE[1]].T.reshape(-1, 2) * CHESSBOARD_SQUARE_SIZE

objp holds the 3D coordinates of the checkerboard corners in the physical (world) coordinate system, with the board lying in the Z = 0 plane; with a 15 mm square the first corners are (0, 0, 0), (15, 0, 0), (30, 0, 0), and so on. The same objp is reused for every calibration image in the camera calibration and stereo matching steps below.

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)

criteria is the termination criterion for the sub-pixel corner refinement below: stop after 30 iterations or once the corner position moves by less than 0.001 pixels. These default values are generally fine.

img_list_left = sorted(os.listdir(path_left))
img_list_right = sorted(os.listdir(path_right))

img_list_left and img_list_right are the sorted file-name lists of the left and right calibration images, obtained with os.listdir(); sorting keeps the left/right pairs aligned by index.

obj_points = []
img_points_left = []
img_points_right = []

obj_points stores the physical-coordinate corner positions (one copy of objp per image), while img_points_left and img_points_right store the detected pixel coordinates in the left and right images. These lists feed the camera calibration and stereo calibration below.

Next, the program reads the calibration images and detects the corners. For each image pair, the program does the following:

img_l = cv2.imread(path_left + img_list_left[i])
img_r = cv2.imread(path_right + img_list_right[i])
gray_l = cv2.cvtColor(img_l, cv2.COLOR_BGR2GRAY)
gray_r = cv2.cvtColor(img_r, cv2.COLOR_BGR2GRAY)

First read the left and right images, then convert them to grayscale images.

ret_l, corners_l = cv2.findChessboardCorners(gray_l, CHESSBOARD_SIZE, None)
ret_r, corners_r = cv2.findChessboardCorners(gray_r, CHESSBOARD_SIZE, None)

The checkerboard corners on the left and right images are detected by the cv2.findChessboardCorners() function of OpenCV. The parameters of this function include:

image: The grayscale image that needs to detect corners.
patternSize: The number of inner corners per chessboard row and column, i.e. CHESSBOARD_SIZE here.
corners: An optional output array for the detected corner coordinates; in Python the corners are returned instead. If detection fails, the returned corners are None.
flags: Optional flags to use when detecting.
The return value of this function includes:

ret: A boolean value indicating whether the detection was successful or not. True if the detection was successful, otherwise False.
corners: an array used to store the detected corner coordinates.
Next, the corner positions are refined to sub-pixel accuracy.

cv2.cornerSubPix(gray_l, corners_l, (11, 11), (-1, -1), criteria)
cv2.cornerSubPix(gray_r, corners_r, (11, 11), (-1, -1), criteria)

Here, OpenCV's cv2.cornerSubPix() function refines the corner locations to sub-pixel accuracy. The parameters of this function include:

image: The input grayscale image.
corners: an array used to store the detected corner coordinates.
winSize: Half the side length of the search window around each corner; (11, 11) gives a 23 × 23 window.
zeroZone: Half the size of a dead zone in the middle of the search window over which the summation is skipped (to avoid possible singularities); (-1, -1) means no dead zone.
criteria: The termination criteria for the iterative refinement, i.e. the criteria tuple defined above.

img_points_left.append(corners_l)
img_points_right.append(corners_r)

If the corners are found in both the left and right image (ret_l and ret_r are both True), objp is appended to obj_points and the refined corner coordinates are appended to img_points_left and img_points_right.

cv2.drawChessboardCorners(img_l, CHESSBOARD_SIZE, corners_l, ret_l)
cv2.imshow("Chessboard Corners - Left", cv2.resize(img_l,(img_l.shape[1]//2,img_l.shape[0]//2)))
cv2.waitKey(50)

cv2.drawChessboardCorners(img_r, CHESSBOARD_SIZE, corners_r, ret_r)
cv2.imshow("Chessboard Corners - Right", cv2.resize(img_r,(img_r.shape[1]//2,img_r.shape[0]//2)))
cv2.waitKey(50)


Mark the detected corners on the image and display it in a window. The cv2.drawChessboardCorners() function is used here. The parameters of this function include:

img: The image whose corner points need to be calibrated.
patternSize: The number of inner corners per row and column, i.e. CHESSBOARD_SIZE here.
corners: An array storing the coordinates of the detected corners.
patternWasFound: Whether the complete pattern was found, i.e. the ret flag returned by cv2.findChessboardCorners().
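
Putting the pieces together, the per-image loop might look like the following sketch; note the obj_points.append(objp) line, which the snippets above rely on but do not show:

for i in range(len(img_list_left)):
    # read the i-th image pair and convert to grayscale
    img_l = cv2.imread(path_left + img_list_left[i])
    img_r = cv2.imread(path_right + img_list_right[i])
    gray_l = cv2.cvtColor(img_l, cv2.COLOR_BGR2GRAY)
    gray_r = cv2.cvtColor(img_r, cv2.COLOR_BGR2GRAY)
    # detect the chessboard corners in both views
    ret_l, corners_l = cv2.findChessboardCorners(gray_l, CHESSBOARD_SIZE, None)
    ret_r, corners_r = cv2.findChessboardCorners(gray_r, CHESSBOARD_SIZE, None)
    if ret_l and ret_r:
        # refine to sub-pixel accuracy, then store one entry per image pair
        cv2.cornerSubPix(gray_l, corners_l, (11, 11), (-1, -1), criteria)
        cv2.cornerSubPix(gray_r, corners_r, (11, 11), (-1, -1), criteria)
        obj_points.append(objp)
        img_points_left.append(corners_l)
        img_points_right.append(corners_r)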

The program next calibrates the binocular camera.

ret_l, mtx_l, dist_l, rvecs_l, tvecs_l = cv2.calibrateCamera(obj_points, img_points_left, gray_l.shape[::-1],None,None)
ret_r, mtx_r, dist_r, rvecs_r, tvecs_r = cv2.calibrateCamera(obj_points, img_points_right, gray_r.shape[::-1],None,None)

flags = 0
flags |= cv2.CALIB_FIX_INTRINSIC
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
ret, M1, d1, M2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_points, img_points_left, img_points_right,
    mtx_l, dist_l, mtx_r, dist_r,
    gray_l.shape[::-1], criteria=criteria, flags=flags)

This code first calibrates the left and right cameras separately:

ret_l, mtx_l, dist_l, rvecs_l, tvecs_l = cv2.calibrateCamera(obj_points, img_points_left, gray_l.shape[::-1],None,None)
ret_r, mtx_r, dist_r, rvecs_r, tvecs_r = cv2.calibrateCamera(obj_points, img_points_right, gray_r.shape[::-1],None,None)

Here, OpenCV's cv2.calibrateCamera() function is used to calibrate the left and right cameras. The parameters of this function include:

objectPoints: The corner coordinates in the physical coordinate system corresponding to each calibration image.
imagePoints: Pixel coordinates detected on each calibration image.
imageSize: The size of the calibrated image.
cameraMatrix: The camera intrinsic matrix; passed as None here so that it is estimated from scratch.
distCoeffs: The distortion coefficients; likewise passed as None here.
rvecs: The rotation vectors (one per calibration image) describing the board's pose relative to the camera.
tvecs: The corresponding translation vectors (one per calibration image).
The return value of this function includes:

ret: The overall RMS re-projection error in pixels (not a boolean); values well below one pixel usually indicate a good calibration.
cameraMatrix: The estimated intrinsic matrix.
distCoeffs: The estimated distortion coefficients.
rvecs: The rotation vectors of each calibration image.
tvecs: The translation vectors of each calibration image.
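
Since ret is the RMS error rather than a success flag, a quick sanity check (my suggestion, not part of the original code) is to print it:

print("left RMS: %.4f px, right RMS: %.4f px" % (ret_l, ret_r))
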
Then calibrate the binocular camera:

flags = 0
flags |= cv2.CALIB_FIX_INTRINSIC
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
ret, M1, d1, M2, d2, R, T, E, F = cv2.stereoCalibrate(
    obj_points, img_points_left, img_points_right,
    mtx_l, dist_l, mtx_r, dist_r,
    gray_l.shape[::-1], criteria=criteria, flags=flags)

Here, OpenCV's cv2.stereoCalibrate() function is used for binocular camera calibration. The parameters of this function include:

objectPoints: The corner coordinates in the physical coordinate system corresponding to each calibration image.
imagePoints1: Pixel coordinates detected on the left camera of each calibration image.
imagePoints2: Pixel coordinates detected on the right camera of each calibration image.
cameraMatrix1: The internal parameter matrix of the left camera.
distCoeffs1: Distortion coefficients of the left camera.
cameraMatrix2: The internal parameter matrix of the right camera.
distCoeffs2: The distortion coefficient of the right camera.
imageSize: The size of the calibrated image.
criteria: The termination criteria (error tolerance and maximum number of iterations) for the optimization, the same tuple as above.
flags: Optional calibration flags; cv2.CALIB_FIX_INTRINSIC keeps the per-camera intrinsics fixed and estimates only the stereo geometry.
The return value of this function includes:

ret: The overall RMS re-projection error of the stereo calibration (again, not a boolean).
cameraMatrix1: The internal parameter matrix of the left camera.
distCoeffs1: Distortion coefficients of the left camera.
cameraMatrix2: The internal parameter matrix of the right camera.
distCoeffs2: The distortion coefficient of the right camera.
R: The rotation matrix from the left camera's coordinate system to the right camera's.
T: The corresponding translation vector between the two cameras.
E: The essential matrix.
F: The fundamental matrix.
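
Because the stereo-matching stage below reuses M1, d1, M2, d2, R and T, it can be convenient to persist them; a minimal sketch (the file name is arbitrary):

np.savez("./stereo_params.npz", M1=M1, d1=d1, M2=M2, d2=d2, R=R, T=T)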

stereo matching

Using the parameters obtained from calibration, the stereo matching process goes as follows.

First, we need to read the left and right images:

img_left = cv2.imread("./left.png")
img_right = cv2.imread("./right.png")

Here, "./left.png" and "./right.png" are the paths of the left and right input images. These are raw captures that have not yet been undistorted or rectified.

Next, the images are undistorted using the camera parameters obtained during calibration:

img_left_undistort = cv2.undistort(img_left, M1, d1)
img_right_undistort = cv2.undistort(img_right, M2, d2)

In the above code, M1, d1, M2, d2 are the intrinsic matrices and distortion coefficients obtained from the stereo calibration, and img_left_undistort and img_right_undistort are the undistorted images, useful for visual inspection. Note that the rectification maps built below already include the distortion correction, so they are applied to the original images rather than to the undistorted ones; remapping the already-undistorted images would correct the distortion twice.

Then, epipolar correction is performed to achieve geometric consistency between the left and right images:

height, width = img_left.shape[:2]
R1, R2, P1, P2, Q, roi1, roi2 = cv2.stereoRectify(M1, d1, M2, d2, (width, height), R, T, alpha=1)
map1x, map1y = cv2.initUndistortRectifyMap(M1, d1, R1, P1, (width, height), cv2.CV_32FC1)
map2x, map2y = cv2.initUndistortRectifyMap(M2, d2, R2, P2, (width, height), cv2.CV_32FC1)
img_left_rectified = cv2.remap(img_left, map1x, map1y, cv2.INTER_LINEAR)
img_right_rectified = cv2.remap(img_right, map2x, map2y, cv2.INTER_LINEAR)

Here, R and T are the rotation matrix and translation vector obtained from the stereo calibration, and (width, height) is the image size. R1 and R2 are the rectification rotations of the left and right cameras, P1 and P2 are the rectified projection matrices, Q is the disparity-to-depth mapping matrix, and roi1 and roi2 are the valid regions of the rectified images. alpha=1 keeps all source pixels (possibly leaving black borders), while alpha=0 would crop to valid pixels only.
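
Incidentally, roi1 and roi2 can be used to crop away the black borders that alpha=1 leaves, for example when inspecting the left view (cropping the two views independently would break the row alignment needed for matching, so this is for display only):

x, y, w, h = roi1
img_left_valid = img_left_rectified[y:y + h, x:x + w]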

Then, stitch the left and right images together for easy viewing:

img_stereo = cv2.hconcat([img_left_rectified, img_right_rectified])
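
A common sanity check for the rectification is to draw evenly spaced horizontal lines across the stitched image; after a correct rectification, corresponding points in the two halves should lie on the same row:

for y in range(0, img_stereo.shape[0], 40):
    cv2.line(img_stereo, (0, y), (img_stereo.shape[1], y), (0, 255, 0), 1)
cv2.imshow("rectified pair", cv2.resize(img_stereo, (img_stereo.shape[1] // 2, img_stereo.shape[0] // 2)))
cv2.waitKey(0)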

Next, the disparity map needs to be calculated:

minDisparity = 0
numDisparities = 256
blockSize = 9
sgbm_P1 = 1200   # SGBM smoothness penalties; renamed from P1/P2 so they do not
sgbm_P2 = 4800   # shadow the projection matrices returned by cv2.stereoRectify()
disp12MaxDiff = 10
preFilterCap = 63
uniquenessRatio = 5
speckleWindowSize = 100
speckleRange = 32
sgbm = cv2.StereoSGBM_create(minDisparity=minDisparity, numDisparities=numDisparities, blockSize=blockSize,
                             P1=sgbm_P1, P2=sgbm_P2, disp12MaxDiff=disp12MaxDiff, preFilterCap=preFilterCap,
                             uniquenessRatio=uniquenessRatio, speckleWindowSize=speckleWindowSize,
                             speckleRange=speckleRange, mode=cv2.STEREO_SGBM_MODE_SGBM_3WAY)

disparity = sgbm.compute(img_left_rectified, img_right_rectified)
# sgbm.compute() returns 16-bit fixed-point disparities scaled by 16;
# divide by 16 and normalize to 8 bits for display
disparity_nor = cv2.normalize(disparity.astype(np.float32) / 16.0, None, alpha=0, beta=255,
                              norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)

The above block sets the parameters of the SGBM (Semi-Global Block Matching) algorithm and computes the raw disparity map. Note that sgbm.compute() returns 16-bit fixed-point disparities scaled by 16, which is why the result is divided by 16 before being normalized to disparity_nor for display.

Next, WLS filtering can be performed on the disparity map to reduce the disparity hole:

# Define the WLS filter parameters
lambda_val = 4000
sigma_val = 1.5

# Run the WLS filter
wls_filter = cv2.ximgproc.createDisparityWLSFilterGeneric(False)
wls_filter.setLambda(lambda_val)
wls_filter.setSigmaColor(sigma_val)
filtered_disp = wls_filter.filter(disparity, img_left_rectified)
filtered_disp_nor = cv2.normalize(filtered_disp, None, alpha=0, beta=255, norm_type=cv2.NORM_MINMAX, dtype=cv2.CV_8U)

In the above block, WLS (weighted least squares) filtering denoises and smooths the disparity map. cv2.ximgproc.createDisparityWLSFilterGeneric(False) creates a generic WLS filter, where False means no left-right confidence estimation is used; lambda_val controls the smoothing strength and sigma_val the sensitivity to edges in the guide image (the rectified left view). filtered_disp is the filtered disparity map, and filtered_disp_nor is the normalized 8-bit version for display.

Finally, the normalized raw disparity map and the WLS-filtered disparity map can be displayed in windows:

cv2.imshow("disparity", cv2.resize(disparity_nor,(disparity_nor.shape[1]//2,disparity_nor.shape[0]//2)))
cv2.imshow("filtered_disparity", cv2.resize(filtered_disp_nor,(filtered_disp_nor.shape[1]//2,filtered_disp_nor.shape[0]//2)))
cv2.waitKey()
cv2.destroyAllWindows()
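
As an optional extension, the Q matrix returned by cv2.stereoRectify() can reproject the disparity map to 3D points; a minimal sketch:

# divide by 16 to convert the fixed-point SGBM output to pixel disparities
points_3d = cv2.reprojectImageTo3D(disparity.astype(np.float32) / 16.0, Q)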

This article has also been published in the 3D Vision Workshop. The original link is as follows:

From Binocular Calibration to Stereo Matching: A Practical Guide in Python
