OpenCV Quick Start: Camera Calibration - Monocular Vision and Binocular Vision


Preface

In today's era of increasing technological development, computer vision, as an important branch of artificial intelligence, has penetrated into all areas of our lives. In this vast field, camera calibration is a basic and critical step, which directly affects the accuracy and performance of the vision system. Especially in the applications of monocular vision and binocular vision, accurate camera calibration has become a prerequisite for achieving efficient and accurate visual perception.

Monocular Vision and Binocular Vision are the two most basic forms of vision in computer vision. Monocular vision refers to using only one camera for image capture and processing. It is the most common form of vision and is widely used in various smart devices and monitoring systems. The main challenge of the monocular vision system is that it cannot obtain depth information directly from the image, which requires us to use algorithms and models to infer the three-dimensional structure of the scene. Binocular vision simulates the human binocular vision mechanism and uses two cameras to capture images from slightly different angles, thereby directly calculating the depth information of objects in the image. This method makes the binocular vision system more accurate and efficient when processing three-dimensional spatial information, especially showing great potential in fields such as robot navigation and self-driving cars.

This article will briefly introduce the basic principles and practical methods of monocular vision and binocular vision camera calibration.

OpenCV Logo


1. Basic principles of camera calibration

Camera calibration is an extremely important process in the field of computer vision, which involves understanding and correcting the way a camera captures images. This process is crucial to improve the accuracy and effectiveness of image processing, especially in monocular and binocular vision applications.

1.1 Camera model and coordinate system

Before diving into the practice of camera calibration, it is necessary to understand the principles behind it. The camera model and coordinate system form the theoretical basis of camera calibration.

1.1.1 Camera model

The camera model describes how points in the three-dimensional world are mapped to two-dimensional images. Among them, the most common model is the Pinhole Camera Model.

In the pinhole camera model, a point P(x, y, z) in three-dimensional space is mapped, through a projection process, to a point p(u, v) on the two-dimensional image plane. This mapping usually involves some basic concepts from geometry and linear algebra.

The mapping process can be expressed by the following formulas:
u = f_x \frac{x}{z} + c_x
v = f_y \frac{y}{z} + c_y

Here, (f_x, f_y) are the focal lengths of the camera, i.e. the scaling factors along the x-axis and y-axis of the image plane; (c_x, c_y) is the principal point of the image plane, usually near the center of the image.

In this model, points in three-dimensional space are first projected through a pinhole (an idealized small aperture) onto an imaginary image plane, and then converted to actual image coordinates based on the intrinsic parameters of the camera (such as the focal length and the principal point position). Although this process simplifies the complexity of the real world, it provides a powerful basic model for computer vision to understand and process the relationship between the three-dimensional world and two-dimensional images.
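As a quick illustration, the following is a minimal NumPy sketch of this projection; the focal lengths and principal point below are made-up example values, not calibration results.

import numpy as np

# Hypothetical intrinsic parameters (pixel units)
fx, fy = 800.0, 800.0   # focal lengths
cx, cy = 320.0, 240.0   # principal point


def project_point(point_3d):
    """Project a 3D point (x, y, z) in camera coordinates to pixel coordinates (u, v)."""
    x, y, z = point_3d
    u = fx * x / z + cx
    v = fy * y / z + cy
    return u, v


print(project_point((0.1, -0.05, 2.0)))  # -> (360.0, 220.0)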

1.1.2 Coordinate system

In the field of camera calibration and computer vision, it is important to understand and work with different coordinate systems, especially the world coordinate system and the camera coordinate system.

  1. World Coordinate System:
  • This is a fixed coordinate system typically used to describe the location of objects in the real world.
  • The world coordinate system is a reference coordinate system that defines the absolute position of objects in space.
  • It is a three-dimensional coordinate system, usually with some convenient point as the origin, such as a corner of the scene or a specific landmark.
  2. Camera Coordinate System:
  • The camera coordinate system is a camera-centered coordinate system used to describe the position of points under the camera's perspective.
  • In this coordinate system, the origin is usually located at the optical center of the camera, the x-axis and y-axis are parallel to the camera's image plane, and the z-axis is along the camera's line of sight.
  • The camera coordinate system is relative to the camera position and orientation, so when the camera moves, the camera coordinate system also moves with it.
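To make the relationship between the two coordinate systems concrete, here is a small sketch that converts a world point into camera coordinates; the rotation and translation below are arbitrary example values.

import numpy as np

# Hypothetical extrinsics: a 90-degree rotation about the z-axis and a small translation
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
t = np.array([0.5, 0.0, 2.0])

P_world = np.array([1.0, 0.2, 3.0])   # point expressed in the world coordinate system
P_camera = R @ P_world + t            # same point expressed in the camera coordinate system
print(P_camera)                       # -> [0.3 1.  5. ]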

1.2 Camera internal and external parameters

In camera calibration, it is crucial to understand and determine the intrinsic parameters (Intrinsic Parameters) and extrinsic parameters (Extrinsic Parameters) of the camera. These two sets of parameters jointly define the complete mapping process from the three-dimensional world coordinate system to the two-dimensional image coordinate system. The accurate acquisition of internal and external parameters is the key to the success of computer vision applications. For example, in augmented reality, in order to correctly superimpose virtual objects on real-world images, it is necessary to know precisely the position and orientation of the camera (extrinsic parameters), and how to correctly map the three-dimensional space onto the two-dimensional image (intrinsic parameters).

1.2.1 Internal parameters

Internal parameters (Intrinsic Parameters) describe the characteristics of the camera itself. These parameters have nothing to do with the position and orientation of the camera. They mainly include:

  1. Focal length: The focal length of a camera lens, usually expressed in pixel units on the image plane. Focal length affects how much an image is magnified.

  2. Principal point coordinates: The coordinates of the imaging center point (usually close to the image center) on the image sensor. This point is the mapping of the "camera center" in the three-dimensional world on the image plane.

  3. Distortion coefficient: Includes radial distortion and tangential distortion parameters, used to describe and correct image distortion caused by imperfect lens design and assembly.

Internal parameters are obtained through the camera calibration process. Once they are determined, they are fixed for the same camera in different situations.
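In OpenCV these intrinsic parameters are usually collected into a 3x3 camera matrix plus a vector of distortion coefficients. A minimal sketch with made-up example values:

import numpy as np

# Hypothetical intrinsic parameters obtained from calibration
fx, fy = 920.4, 918.7   # focal lengths in pixels
cx, cy = 640.0, 360.0   # principal point (near the image center)

K = np.array([[fx, 0.0, cx],
              [0.0, fy, cy],
              [0.0, 0.0, 1.0]])

# Distortion coefficients in OpenCV order: k1, k2, p1, p2, k3
dist = np.array([-0.12, 0.05, 0.001, -0.002, 0.0])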

1.2.2 External parameters

Extrinsic Parameters describe the position and attitude of the camera relative to the world coordinate system. These parameters relate to how the camera is placed and pointed. They include:

  1. Rotation matrix: Describes the rotation of the camera from the world coordinate system to the camera coordinate system.

  2. Translation vector: Describes the translation from the origin of the world coordinate system to the origin of the camera coordinate system (i.e., the optical center of the camera).

The external parameters are scene-dependent, that is, each time the camera position or orientation changes, the external parameters also need to be re-determined.
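In OpenCV the rotation is often handled as a compact Rodrigues vector (rvec) together with a translation vector (tvec), and cv2.Rodrigues converts between the vector and the 3x3 rotation matrix. A small sketch with arbitrary example values:

import numpy as np
import cv2

# Hypothetical extrinsics as they might be returned by calibration
rvec = np.array([[0.01], [0.35], [-0.02]])   # rotation as a Rodrigues vector
tvec = np.array([[0.1], [-0.05], [1.8]])     # translation vector

R, _ = cv2.Rodrigues(rvec)    # 3x3 rotation matrix
Rt = np.hstack([R, tvec])     # 3x4 extrinsic matrix [R | t]
print(Rt)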

1.3 Lens distortion

Lens distortion is a manifestation of the inability of the camera lens to perfectly map the real world onto the image plane, which results in differences between the geometric shape of the image and the shape of the actual object. In computer vision and camera calibration, dealing with lens distortion is a very important part.

  1. Types of distortion:

    • Radial Distortion: The most common type of distortion, usually characterized by the edge parts of the image being more distorted than the center part. In radial distortion, straight lines in an image may appear as curves.
    • Tangential Distortion: Due to imperfect parallel alignment between the lens and image sensor, some parts of the image may be slightly offset.
  2. Distortion correction:

    • Distortion parameters can be obtained through camera calibration, and then these parameters are used for distortion correction to restore the true geometric structure of the image.
    • Correction algorithms typically adjust each pixel position in the image to compensate for distortion effects.

The following figure shows examples of radial distortion, from left to right: positive radial distortion, the original image, and negative radial distortion.
Radial distortion code implementation:

import numpy as np
import cv2


def apply_radial_distortion(image, k1, k2):
    height, width = image.shape[:2]
    # Compute the image center
    center_x, center_y = width / 2, height / 2

    # Prepare the output (distorted) image
    distorted_image = np.zeros_like(image)

    # Iterate over every pixel
    for i in range(height):
        for j in range(width):
            # Normalized coordinates relative to the image center
            x = (j - center_x) / center_x
            y = (i - center_y) / center_y
            r = np.sqrt(x ** 2 + y ** 2)

            # Apply the radial distortion model
            x_distorted = x * (1 + k1 * r ** 2 + k2 * r ** 4)
            y_distorted = y * (1 + k1 * r ** 2 + k2 * r ** 4)

            # Map back to pixel coordinates and make sure they fall inside the image
            distorted_j = int(center_x * (x_distorted + 1))
            distorted_i = int(center_y * (y_distorted + 1))

            if 0 <= distorted_j < width and 0 <= distorted_i < height:
                distorted_image[i, j] = image[distorted_i, distorted_j]

    return distorted_image


# Chessboard parameters
chessboard_size = (8, 6)
square_size = 50

# Create a white canvas
image_size = (300, 400)
image = np.zeros((image_size[0], image_size[1], 3), dtype=np.uint8) + 255

# Draw the chessboard pattern
for i in range(chessboard_size[1]):
    for j in range(chessboard_size[0]):
        top_left_x = j * square_size
        top_left_y = i * square_size
        bottom_right_x = (j + 1) * square_size
        bottom_right_y = (i + 1) * square_size
        color = (0, 0, 0) if (i + j) % 2 == 0 else (255, 255, 255)
        cv2.rectangle(image, (top_left_x, top_left_y), (bottom_right_x, bottom_right_y), color, -1)

# Apply radial distortion (with opposite coefficient signs)
distorted_image3 = apply_radial_distortion(image, 0.05, -0.05)
distorted_image4 = apply_radial_distortion(image, -0.05, 0.05)

# Show the original and distorted images side by side
cv2.imshow('Distorted Image', cv2.hconcat([distorted_image3,
                                           np.zeros((image.shape[0], 10, 3), dtype=np.uint8) + 127,
                                           image,
                                           np.zeros((image.shape[0], 10, 3), dtype=np.uint8) + 127,
                                           distorted_image4]))
cv2.waitKey(0)
cv2.destroyAllWindows()
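For reference, correcting distortion in OpenCV usually goes in the opposite direction: given a camera matrix and distortion coefficients from calibration, cv2.undistort remaps the image to an undistorted view. A minimal sketch, where the calibration values mtx and dist and the input file name are placeholders:

import numpy as np
import cv2

# Hypothetical calibration results (would normally come from cv2.calibrateCamera)
mtx = np.array([[800.0, 0.0, 320.0],
                [0.0, 800.0, 240.0],
                [0.0, 0.0, 1.0]])
dist = np.array([0.05, -0.05, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

img = cv2.imread('distorted_input.jpg')  # hypothetical input image
h, w = img.shape[:2]

# Refine the camera matrix for this image size, then undistort
new_mtx, roi = cv2.getOptimalNewCameraMatrix(mtx, dist, (w, h), 1, (w, h))
undistorted = cv2.undistort(img, mtx, dist, None, new_mtx)
cv2.imwrite('undistorted_output.jpg', undistorted)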

1.4 Perspective transformation

Perspective Transformation is a mathematical transformation used to describe and realize the projection of objects in three-dimensional space on a two-dimensional image plane. In computer vision, it is used to model how a camera captures the three-dimensional world, and is an important part of understanding images and spatial relationships.

  1. Transformation principle:

    • Perspective transformation takes into account the visual effects caused by the different distances of objects from the camera, that is, distant objects appear smaller than nearby objects.
    • This transformation is usually not linear and needs to be described by specific mathematical formulas.
  2. Applications:

    • In computer graphics and computer vision, perspective transformation is widely used in image correction, three-dimensional reconstruction, augmented reality (AR) and virtual reality (VR) and other fields.
    • Perspective transformation can be used to convert a flat image into an image with a different perspective or viewpoint, or to correct perspective distortion in an image.

Understanding lens distortion and perspective transformation is critical for high-precision image processing and analysis, especially in applications that require recovering or understanding three-dimensional scene information from images.

1.5 Importance and application scenarios of calibration

Camera calibration plays a crucial role in the field of computer vision. It not only affects the accuracy of image analysis and processing, but is also the key to the success or failure of many advanced vision applications. The following are some of the importance and application scenarios of camera calibration:

  1. Improving the accuracy of image analysis: Through calibration, the internal and external parameters of the camera can be determined accurately, so that distortion can be handled and perspective corrected more precisely during image processing, improving the accuracy of the analysis.

  2. Robot Navigation: In robot navigation, accurate camera calibration can help the robot understand its environment more accurately, including the size, location and spatial relationship of objects, thereby achieving more effective and safe navigation.

  3. Self-driving cars: Self-driving cars rely on cameras to sense their surroundings. Accurate calibration is fundamental to ensuring that a car can correctly understand its environment, such as the location of road signs, obstacles, pedestrians and other vehicles.

  4. Augmented Reality (AR): In augmented reality applications, proper camera calibration allows virtual objects to be seamlessly blended with real-world images, providing a more realistic user experience.

  5. Monocular vision application: Although the monocular system cannot directly obtain depth information, the camera parameters obtained through calibration can be used to estimate the geometry of the scene and the approximate depth of the object.

  6. Binocular vision application: In a binocular vision system, the accurate calibration of two cameras is the key to calculating the depth of an object and performing three-dimensional reconstruction. It allows the system to estimate the exact location and depth of an object by comparing images from two different viewpoints.

  7. Advanced image processing: In the field of advanced image processing, such as 3D modeling, scene reconstruction, object tracking, etc., accurate camera calibration is the basis for achieving high-quality output.

Camera calibration is not only the basis for understanding and analyzing images, but also the key to achieving advanced computer vision capabilities. In many high-tech fields, such as automation, robotics, virtual reality, etc., camera calibration plays an indispensable role.

2. Monocular vision

In the field of computer vision, monocular vision is the most basic and widely used form. It refers to the use of a single camera to capture and analyze images.

2.1 Principles of monocular vision

Monocular vision systems rely primarily on a camera to capture images and use computer vision algorithms to interpret and analyze these images. Although monocular vision cannot directly provide depth information, it is very effective in processing two-dimensional images, image recognition and classification, etc.

2.1.1 Principles of monocular vision

The core principle of monocular vision is to extract useful information from two-dimensional images for understanding and analyzing the three-dimensional world. This usually involves the following steps:

  1. Image Capture: The camera captures a two-dimensional representation of the real world.

  2. Feature extraction: The algorithm identifies key feature points or edges in the image. This may include lines, corners, outlines, etc.

  3. Feature matching and tracking: Track these feature points in consecutive image frames to understand the dynamic changes of objects or scenes.

  4. 3D scene recovery: Although monocular vision cannot directly measure depth, depth information can be indirectly inferred through other cues and methods, such as motion parallax and the scale-invariant feature transform (SIFT).

  5. Image understanding: Apply specific algorithms (such as object detection, image classification) to interpret image content.

2.1.2 Formula of monocular vision

In monocular vision, an important concept is the pinhole camera model, the simplest imaging model, which describes how the three-dimensional world is mapped onto a two-dimensional image. Its mathematical expression is:

p = K [R | t] P

  • P represents a point in the three-dimensional world.
  • p is the projection of that point on the image plane.
  • K is the intrinsic parameter matrix, containing the focal lengths and principal point coordinates.
  • R and t are the rotation matrix and translation vector of the camera, i.e. the extrinsic parameters.
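OpenCV exposes exactly this projection through cv2.projectPoints, which also applies the distortion model. A minimal sketch with made-up parameters:

import numpy as np
import cv2

# Hypothetical camera parameters
K = np.array([[800.0, 0.0, 320.0],
              [0.0, 800.0, 240.0],
              [0.0, 0.0, 1.0]])
dist = np.zeros(5)                       # assume no lens distortion here
rvec = np.zeros((3, 1))                  # camera aligned with the world frame
tvec = np.array([[0.0], [0.0], [2.0]])   # world origin 2 units in front of the camera

# A few 3D points in the world coordinate system
object_points = np.array([[0.0, 0.0, 0.0],
                          [0.1, 0.0, 0.0],
                          [0.0, 0.1, 0.0]], dtype=np.float32)

image_points, _ = cv2.projectPoints(object_points, rvec, tvec, K, dist)
print(image_points.reshape(-1, 2))  # -> [[320. 240.] [360. 240.] [320. 280.]]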

2.1.3 Application areas

Monocular vision systems are widely used in many fields, including:

  • Monitoring system: Use monocular cameras for real-time monitoring, identification and tracking of people or objects.
  • Automatic driving assistance: Used for vehicle lane detection, traffic sign recognition, etc.
  • Robot Navigation: Helps robots understand the environment, plan paths and avoid obstacles.
  • Augmented Reality: Superimposing virtual information on images of the real world.

2.2 Steps to achieve monocular visual calibration

Monocular vision calibration is a key process that helps us obtain the internal parameters and distortion parameters of the camera to accurately understand how the camera captures real-world scenes. The following are the basic steps to achieve monocular visual calibration.

2.2.1 Prepare calibration plate

The calibration plate is an indispensable tool in camera calibration. It provides a reference pattern of a known structure for easy identification and positioning in the image. The most commonly used calibration plates are checkerboard and dot grid.

  • Checkerboard: consists of alternating black and white squares, suitable for corner detection.
  • Dot grid: It consists of a series of dots and is suitable for more precise feature point positioning.

When selecting a calibration plate, you need to ensure that the size and pattern of the calibration plate are appropriate for the camera and application scenario being used.

2.2.2 Capture calibration images

Use your camera to photograph the calibration plate from different angles and positions. This step is crucial because it directly affects the accuracy and robustness of the calibration. General advice:

  • Photograph the calibration plate from multiple angles and distances to ensure adequate viewing angle coverage.
  • Make sure the calibration plate is clearly visible in every image.

2.2.3 Extract corner points

For each captured calibration plate image, the corner points or feature points on the calibration plate need to be identified and extracted in the image. This is usually done by using specific functions in OpenCV, such as cv2.findChessboardCorners for checkerboards.
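A short sketch of this step; the image path and pattern size are placeholders, and cv2.cornerSubPix can optionally refine the detected corners to sub-pixel accuracy:

import numpy as np
import cv2

pattern_size = (9, 6)  # inner corners per row and column (example value)
gray = cv2.imread('calib_01.jpg', cv2.IMREAD_GRAYSCALE)  # hypothetical calibration image

found, corners = cv2.findChessboardCorners(gray, pattern_size, None)
if found:
    # Refine corner locations to sub-pixel accuracy
    criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 30, 0.001)
    corners = cv2.cornerSubPix(gray, corners, (11, 11), (-1, -1), criteria)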

2.2.4 Calculate internal parameters and distortion parameters

Once enough images have been collected and feature points extracted from them, the camera's intrinsic and distortion parameters are calculated using the cv2.calibrateCamera function in OpenCV. Internal parameters include focal length and principal point coordinates, while distortion parameters describe the distortion characteristics of the lens.

These parameters are key to understanding how the camera captures images and are essential for subsequent image processing and analysis. For example, through these parameters, the captured image can be corrected for distortion, thereby obtaining more accurate visual data.

2.3 Monocular vision camera calibration practice

The following uses an example to demonstrate how to perform monocular vision camera calibration in OpenCV.
Checkerboard pictures

import numpy as np
import cv2
import matplotlib.pyplot as plt

# Chessboard parameters
cross_points = (9, 6)
square_size = 1.0  # assume each chessboard square is 1.0 unit in size

# Load the image
image_path = 'Chessboard_Photo.jpg'
image = cv2.imread(image_path)
image = cv2.resize(image, (int(image.shape[1] / 2), int(image.shape[0] / 2)))
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Find the chessboard corners
ret, corners = cv2.findChessboardCorners(gray, cross_points, None)

# If enough corners are found, perform calibration
if ret == True:
    # Prepare the object points
    objp = np.zeros((cross_points[0] * cross_points[1], 3), np.float32)
    objp[:, :2] = np.mgrid[0:cross_points[0], 0:cross_points[1]].T.reshape(-1, 2)
    objp *= square_size

    # Put the object points and image points into arrays
    objpoints = []  # points in the real world
    imgpoints = []  # points in the image

    objpoints.append(objp)
    imgpoints.append(corners)

    # Perform camera calibration
    ret, mtx, dist, rvecs, tvecs = cv2.calibrateCamera(objpoints, imgpoints, gray.shape[::-1], None, None)

    # Draw the detected corners (ret now holds the calibration RMS error, so pass True explicitly)
    img = cv2.drawChessboardCorners(image.copy(), cross_points, corners, True)

    # Perspective transform
    # Original image size
    h, w = image.shape[:2]

    # Get the coordinates of the four outer corners
    top_left, top_right, bottom_right, bottom_left = corners[0][0], corners[8][0], corners[-1][0], corners[-9][0]
    pts1 = np.float32([top_left, top_right, bottom_right, bottom_left])

    # Compute the maximum distance from the four corners to the image edges
    maxDistToLeftEdge = max(top_left[0], bottom_left[0])
    maxDistToRightEdge = max(w - top_right[0], w - bottom_right[0])
    maxDistToTopEdge = max(top_left[1], top_right[1])
    maxDistToBottomEdge = max(h - bottom_left[1], h - bottom_right[1])

    # Use the maximum distances to define the size of the target image
    maxWidth = int(w + maxDistToLeftEdge + maxDistToRightEdge)
    maxHeight = int(h + maxDistToTopEdge + maxDistToBottomEdge)

    # Compute the target points so that the whole picture is centered after the perspective transform
    pts2 = np.float32([
        [maxDistToLeftEdge, maxDistToTopEdge],
        [maxWidth - maxDistToRightEdge - 1, maxDistToTopEdge],
        [maxWidth - maxDistToRightEdge - 1, maxHeight - maxDistToBottomEdge - 1],
        [maxDistToLeftEdge, maxHeight - maxDistToBottomEdge - 1]
    ])

    # Get the perspective transform matrix
    M = cv2.getPerspectiveTransform(pts1, pts2)

    # Apply the perspective transform
    dst_perspective = cv2.warpPerspective(image, M, (maxWidth, maxHeight))

    # Show the original image and the perspective-transformed image
    plt.figure(figsize=(8, 6))
    plt.subplot(1, 2, 1)
    plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    plt.title('Original Image with Corners')

    plt.subplot(1, 2, 2)
    plt.imshow(cv2.cvtColor(dst_perspective, cv2.COLOR_BGR2RGB))
    plt.title('Perspective Transform')

    plt.show()

else:
    print("找不到足够的角点,请检查图片是否适合棋盘格标定。")

Perspective Transform

3. Binocular vision

Binocular vision is an important branch of the field of computer vision, which uses two cameras to capture images from slightly different angles to simulate human binocular vision. This method has significant advantages over monocular vision in obtaining depth information and three-dimensional scene reconstruction.

3.1 Principles and applications of binocular vision

The binocular vision system simulates the human binocular vision mechanism and captures the same scene through two spatially separated cameras to obtain depth information of the scene. The following is a detailed introduction to the basic principles of binocular vision and its applications.

3.1.1 Principles of binocular vision

The core of binocular vision is to use the parallax of two cameras to calculate depth information. Parallax refers to the difference in image position of the same object in two camera views. The following is a mathematical representation of binocular vision:

  • Suppose the two cameras are located at C_1 and C_2, and they observe the same object point P. The projections of P on the two cameras' imaging planes are p_1 and p_2.
  • b is the baseline distance between the two cameras (i.e., the distance between C_1 and C_2).
  • f is the focal length of the cameras.

Then the depth Z of the object point P can be calculated by the following formula:

Z = \frac{b \times f}{d}

Here, d is the disparity (parallax) between the corresponding points p_1 and p_2 on the imaging planes of the two cameras.
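A quick numeric sketch of this relationship (all values are made-up examples):

# Hypothetical stereo rig: baseline in meters, focal length and disparity in pixels
b = 0.12       # baseline between the two cameras (m)
f = 800.0      # focal length (px)
d = 24.0       # disparity of a matched point (px)

Z = b * f / d  # depth of the point (m)
print(Z)       # -> 4.0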

3.1.2 Application of binocular vision

Binocular vision systems have a wide range of applications, including:

  1. 3D reconstruction: By calculating the depth information of objects, the three-dimensional structure of the scene can be reconstructed, which can be used in fields such as virtual reality and games.
  2. Robot Navigation: Use depth information to help robots perform spatial perception and path planning, especially suitable for automation and industrial robots.
  3. Augmented Reality: Combining real scenes and computer-generated images, requiring precise spatial and depth information to provide a more realistic experience.
  4. Autonomous vehicles: Provides depth perception capabilities for autonomous driving systems for obstacle detection, lane recognition and environmental understanding.

In these application fields, the depth perception ability of binocular vision plays a vital role, providing intelligent systems with richer and more accurate three-dimensional spatial information than monocular vision.

3.2 Comparison between binocular vision and monocular vision

In computer vision systems, both monocular vision and binocular vision play important roles, but they are essentially different in the acquisition and processing of depth information. Here are the main comparisons of these two vision systems:

Monocular vision

  • Principle: Uses a single camera to capture images and relies on two-dimensional image data.
  • Depth acquisition: The monocular vision system relies on algorithms to infer depth information, such as estimating depth by analyzing object size changes, texture gradients, occlusion relationships, etc.
  • Advantages:
    • Cost-effective: Requires less hardware and lower costs.
    • Simplicity: System setup and processing procedures are relatively simple.
  • Limitations:
    • Inaccurate depth information: Algorithmically inferred depth is less accurate than physical measurement.
    • Highly dependent on the environment: It only works well in scenes with rich or distinctive textures.

Binocular vision

  • Principle: Use two cameras to capture images of the same scene from different angles, simulating human binocular vision.
  • Depth acquisition: The binocular vision system directly calculates depth information by comparing the image difference between two viewing angles, usually represented by a disparity map.
  • Advantages:
    • Accurate depth information: Depth information can be measured and calculated directly, with more accurate results.
    • Three-dimensional perception ability: More suitable for spatial positioning and three-dimensional reconstruction.
  • Limitations:
    • Cost and Complexity: Requires more hardware and complex calibration process.
    • More computationally demanding: Processing two images and calculating depth requires more computing power.

Although monocular vision has advantages in cost and simplicity, it is limited in depth perception and accuracy. In contrast, although binocular vision has higher hardware requirements and processing complexity, it can provide more accurate depth information and three-dimensional space perception. The choice of which vision system to use depends on the specific needs of the application, budget and expected accuracy.

3.3 Steps to achieve binocular vision calibration

Binocular vision calibration not only involves the internal parameters of each camera, but also requires calibration of the relationship between the two cameras (such as relative position and orientation). Basic steps include:

  1. Calibrate each camera individually: First perform monocular visual calibration on the two cameras.
  2. Calibrate the relationship between cameras: Calculate the rotation matrix and translation vector between the two cameras; this step is known as stereo calibration.
  3. Stereo Correction and Reprojection: Correct the images from the two cameras so that they are aligned on the same plane to facilitate subsequent depth calculation.

3.4 Related functions and methods in OpenCV

OpenCV provides a series of functions to support binocular vision calibration and depth calculation:

  • cv2.stereoCalibrate: Used to calculate the relationship between two cameras.
  • cv2.stereoRectify: Perform stereo correction.
  • cv2.StereoBM_create or cv2.StereoSGBM_create: Create a stereo matching object to compute the disparity map used for depth.
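The following is a minimal outline of how these functions fit together, assuming the per-camera intrinsics (mtx_l, dist_l, mtx_r, dist_r) and the matching lists of object points and left/right image points have already been collected; all variable names here are placeholders.

import cv2

# Assumed inputs: objpoints, imgpoints_left, imgpoints_right, mtx_l, dist_l, mtx_r, dist_r, image_size
flags = cv2.CALIB_FIX_INTRINSIC  # keep the per-camera intrinsics fixed
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 1e-5)

# 1. Relationship between the two cameras (rotation R and translation T)
ret, mtx_l, dist_l, mtx_r, dist_r, R, T, E, F = cv2.stereoCalibrate(
    objpoints, imgpoints_left, imgpoints_right,
    mtx_l, dist_l, mtx_r, dist_r, image_size,
    criteria=criteria, flags=flags)

# 2. Stereo rectification: rotations and projection matrices that align the two image planes
R1, R2, P1, P2, Q, roi_l, roi_r = cv2.stereoRectify(
    mtx_l, dist_l, mtx_r, dist_r, image_size, R, T)

# 3. Disparity (and hence depth) from a rectified grayscale image pair
stereo = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=9)
# disparity = stereo.compute(rect_left_gray, rect_right_gray)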

I don't have a dual-camera setup at hand yet, so a full worked example will have to wait until I have the funds to configure one o(╥﹏╥)o.


Summary

This blog briefly introduces the basic principles of camera calibration, the key concepts of monocular vision and binocular vision, and their practical applications. Starting from the basics of camera models and coordinate systems, we explore the importance of internal and external parameters, and how to deal with lens distortion and perspective transformation. This provides a solid theoretical basis for understanding how cameras capture and convert images.

In the monocular vision section, we have an in-depth understanding of how it works and the wide range of applications of monocular vision in various fields. At the same time, the steps to achieve monocular visual calibration are introduced in detail, from preparing the calibration plate to calculating the internal parameters and distortion parameters, providing practical guidance.

For binocular vision, we discussed its principles and applications, with special emphasis on its advantages in depth information acquisition. In addition, the comparison between binocular vision and monocular vision highlights the unique value of binocular vision in three-dimensional space perception.
