Quick Start with OpenCV: Feature Point Detection and Matching

Preface

In the field of computer vision, feature point detection and matching are the core of solving various problems, including image recognition, tracking, three-dimensional reconstruction and motion analysis. OpenCV, as a powerful visual processing library, provides rich functions to handle these tasks. This blog aims to provide a quick start guide on feature point detection and matching methods in OpenCV.

We will start with corner detection, discuss classic algorithms such as Harris, Shi-Tomasi, and FAST, and introduce their principles, formulas, and code implementations. Next, we will delve into advanced topics of feature point detection, covering algorithms such as SIFT, SURF, and ORB. Each method will have a detailed function analysis to help understand the working principle behind it. Finally, we will discuss feature point matching techniques, including BF matcher, FLANN matcher and RANSAC matching method, which are crucial when dealing with feature point correspondences between different images.


1. Corner detection

Corner Detection is a basic concept in computer vision and image processing, which refers to identifying points with obvious corner features in an image. In OpenCV, we usually use Harris corner detection or Shi-Tomasi corner detection to achieve this function.

1.1 Corner Features

1.1.1 Concept of corner features

Corner features are points in an image that show strong local variation in multiple directions. In computer vision, corners are among the most important image features because they are usually relatively invariant to changes in the image (such as viewing angle, lighting, and scale). Corner features are widely used in image processing, pattern recognition, 3D modeling, motion tracking and other fields.

1.1.2 Characteristics of corner points

  • Local Features: Corner points are important representations of local features in the image and can represent key information of the image.
  • Invariance: They are relatively stable and have certain resistance to image changes such as lighting, rotation, scaling, etc.
  • High information content: Corner points contain rich information and are suitable for image matching, target tracking, etc.

1.1.3 Key point drawing code implementation

import cv2
import random

# Read the image
image = cv2.imread('tulips.jpg')

# Generate random keypoints
keypoints = []
num_keypoints = 50  # suppose we want to generate 50 keypoints
for _ in range(num_keypoints):
    x = random.randint(0, image.shape[1] - 1)
    y = random.randint(0, image.shape[0] - 1)
    keypoints.append(cv2.KeyPoint(x, y, 1))

# Draw the keypoints with drawKeypoints
keypoint_image = cv2.drawKeypoints(image, keypoints, None, color=(0, 255, 0), flags=0)

# Display the image
cv2.imshow('Random Keypoints', keypoint_image)
cv2.waitKey(0)
cv2.destroyAllWindows()

Random Keypoints
In this example, 50 randomly positioned keypoints are created. cv2.KeyPoint requires x and y coordinates and a keypoint size (here set to 1). These points are marked in green on the image.

1.1.4 Function analysis

The drawKeypoints function is an OpenCV function for drawing keypoints on an image. Its purpose is to visualize the detected keypoints so that they are easier to identify on the image.

def drawKeypoints(image, keypoints, outImage, color=None, flags=None)

  1. image: This is the source image, the image on which you want to draw the keypoints. This image should be in a standard OpenCV image format, typically read using cv2.imread .

  2. keypoints: This parameter is a list of keypoints. These key points are usually obtained through feature detection algorithms (such as SIFT, SURF, ORB, etc.). Each keypoint usually contains the position (x, y coordinates) and other information (such as direction, size, etc.) of a specific point in the image.

  3. outImage: The output image. The function draws the keypoints onto this image and returns it; in the Python bindings it is common to pass None and use the returned image.

  4. color: This optional parameter defines the color of the keypoint. If not specified, the default color will be used. Colors are usually specified in the format (B, G, R), where B, G, R represent the intensities of blue, green, and red, respectively.

  5. flags: This optional parameter controls how the keypoints are drawn. OpenCV provides several flags: for example, cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS draws each keypoint with its size and orientation, while cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS skips drawing single (unmatched) keypoints.

1.2 Harris corner detection

1.2.1 Harris corner detection principle

The basic principle of Harris corner detection is to observe the degree of grayscale change of pixels in the window when the window moves. Specifically, it calculates the second-order matrix of the grayscale change when the window moves in various directions, and uses this matrix to determine whether a point is a corner point. The characteristic of corner points is that no matter which direction the window moves, the grayscale changes within the window are large.

The basic detection principle is as follows:

  1. Image window and movement:
  • Imagine a small window (or mask) that slides across the image.
  • For each position, we want to know how the pixel intensities inside the window change when the window is shifted a small distance in each direction.
  2. Intensity change function:
  • The intensity change can be described by a function $E(u, v)$, where $u$ and $v$ are the displacements of the window in the x and y directions.

1.2.2 Harris corner detection formula

The core of Harris corner detection is to calculate the Harris response value of each pixel. The mathematical formula is:
$$E(u, v) = \sum_{x, y} w(x, y)\,[I(x + u, y + v) - I(x, y)]^2$$

where $I(x, y)$ is the image intensity at pixel $(x, y)$ and $w(x, y)$ is a window function (usually a Gaussian) that gives higher weight to pixels near the center of the window.
This formula can be expanded and simplified with a Taylor series, and finally expressed as:

$$E(u, v) \approx [u, v]\, M\, [u, v]^T$$

where $M$ is a $2 \times 2$ matrix computed from the image gradients.

The calculation of the Harris response value can be performed by the following steps:

  1. Compute the image gradients:
  • Compute the gradients of the image in the x and y directions, usually with the Sobel operator. Denote them $I_x$ and $I_y$.
  2. Compute the products of the gradients:
  • Compute $I_x^2$, $I_y^2$ and $I_x I_y$ at every pixel.
  3. Apply a Gaussian filter:
  • Smooth these products with a Gaussian filter (the window function) over a local area. The Gaussian window gives more weight to pixels near the center of the local region.
  4. Construct the Harris matrix:
  • Use the results above to construct the Harris matrix (also called the structure tensor), whose general form is:

    $$M = \begin{pmatrix} \sum I_x^2 & \sum I_x I_y \\ \sum I_x I_y & \sum I_y^2 \end{pmatrix}$$

  • Here each sum runs over the corresponding pixel values within the window.

  5. Compute the Harris response value:
  • Compute the Harris response value R for each pixel:

    $$R = \det(M) - k \cdot (\operatorname{trace}(M))^2$$

  • where $\det(M)$ is the determinant of M, $\operatorname{trace}(M)$ is the trace of M (the sum of its diagonal elements), and k is an empirical constant (usually between 0.04 and 0.06).

To see how this formula relates to the eigenvalues of M, consider the eigendecomposition of M. Its two eigenvalues, $\lambda_1$ and $\lambda_2$, describe the gradient strength along the two principal directions of the image at that point. The determinant of M is the product of the two eigenvalues:

$$\det(M) = \lambda_1 \cdot \lambda_2$$

and the trace of M is their sum:

$$\operatorname{trace}(M) = \lambda_1 + \lambda_2$$

Therefore, the Harris response value R can be rewritten as:

$$R = \lambda_1 \lambda_2 - k\,(\lambda_1 + \lambda_2)^2$$

  6. Identify corner points
  • The Harris response value R determines what kind of point each pixel is (a minimal sketch of the whole pipeline appears after this list):
    • If R is large and positive, the point is likely a corner.
    • If |R| is small (close to zero), the point lies in a flat region.
    • If R is negative with a large magnitude, the point is likely on an edge.
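
The following is a minimal illustrative sketch (not the optimized cv2.cornerHarris implementation) of the steps above: it computes Sobel gradients, smooths the gradient products with a Gaussian window, evaluates R = det(M) - k * trace(M)^2, and classifies pixels with the rules just listed. The file name tulips.jpg and the thresholds are only example values.

import cv2
import numpy as np

gray = cv2.imread('tulips.jpg', cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Step 1: image gradients I_x, I_y (Sobel operator)
Ix = cv2.Sobel(gray, cv2.CV_32F, 1, 0, ksize=3)
Iy = cv2.Sobel(gray, cv2.CV_32F, 0, 1, ksize=3)

# Steps 2-4: products of gradients, smoothed by a Gaussian window w(x, y)
Sxx = cv2.GaussianBlur(Ix * Ix, (5, 5), 1)
Syy = cv2.GaussianBlur(Iy * Iy, (5, 5), 1)
Sxy = cv2.GaussianBlur(Ix * Iy, (5, 5), 1)

# Step 5: Harris response R = det(M) - k * trace(M)^2
k = 0.04
det_M = Sxx * Syy - Sxy * Sxy
trace_M = Sxx + Syy
R = det_M - k * trace_M ** 2

# Step 6: classify pixels (thresholds are arbitrary example values)
corner_mask = R > 0.01 * R.max()   # large positive R -> corner
edge_mask = R < -0.01 * R.max()    # strongly negative R -> edge
print('corner pixels:', int(corner_mask.sum()), 'edge pixels:', int(edge_mask.sum()))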

Harris Corner Detection Limitations:
Although Harris Corner Detection is a powerful tool, it has limitations. For example, it is not scale-invariant, which means that the detected corners may change when the image scale changes significantly. In addition, it is sensitive to noise and may produce false detections when processing highly structured images.

1.2.3 Code implementation

In OpenCV, Harris corner detection can be implemented through the cv2.cornerHarris() function. Here is a simple example code:

import cv2
import numpy as np

# Read the image
img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Harris corner detection
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Threshold the result to determine corner locations
dst = cv2.dilate(dst, None)
thresh = 0.01 * dst.max()
img[dst > thresh] = [0, 0, 255]

# Draw circles to mark the corners
for i in range(dst.shape[0]):
    for j in range(dst.shape[1]):
        if dst[i, j] > thresh:
            # Draw a circle
            cv2.circle(img, (j, i), radius=2, color=(0, 255, 0), thickness=1)

# Display the image
cv2.imshow('Harris Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this code, the cv2.cornerHarris() function accepts several parameters: the input image, the neighborhood size (blockSize), the aperture size of the Sobel operator used for gradient computation (ksize), and the free parameter (k) of the Harris detector. The function returns the Harris response value for each pixel; corners can then be displayed selectively by thresholding.
Harris Corners

1.2.4 Function analysis

The cornerHarris function performs corner detection. It implements the Harris corner detection algorithm, which effectively identifies corners in images.

def cornerHarris(src, blockSize, ksize, k, dst=None, borderType=None)

  1. src (Source Image): Input image, it should be single channel (grayscale), the data type can be 8-bit or floating point. This is the original image that the function processes.

  2. blockSize: It refers to the neighborhood size used for corner detection. Simply put, it considers the size of the pixel block around each pixel for corner detection. Larger blocks will consider more pixels and may be more suitable for detecting large corner features.

  3. ksize (Kernel Size): Aperture parameter of Sobel operator. This parameter determines the size of the Sobel convolution kernel used to calculate the image gradient. This directly affects the calculation of the image gradient, which in turn affects the results of corner detection.

  4. k: Free parameter of the Harris detector, used to weight the corner point measurements in the response function. It affects the sensitivity of corner detection, typically between 0.04 and 0.06.

  5. dst (Destination Image): Image used to store the Harris detector response. Its type is CV_32FC1 and has the same size as the input image src.

  6. borderType: Pixel extrapolation method. When computing image gradients, pixel values beyond the image border must be extrapolated; this parameter specifies the extrapolation method used. Common options include reflection and constant extrapolation.

The main function of the function is to run the Harris corner detector on the input image. It calculates the response value of each pixel, which can be used to determine the position of the corner point in the image. Corner points can be identified as local maxima of these response values.

1.3 Shi-Tomasi corner detection

The Shi-Tomasi corner detection method is an improvement on Harris corner detection and offers better performance and accuracy in many respects. Its core idea is to identify corners by evaluating the minimum eigenvalue of the local autocorrelation matrix.

1.3.1 Shi-Tomasi corner detection principle

The basic principle of the Shi-Tomasi method is similar to that of Harris corner detection, both of which are based on the autocorrelation matrix of local areas of the image. The difference is that the Shi-Tomasi method uses the smallest eigenvalue of the autocorrelation matrix to evaluate the quality of the corner points.

For each point in the image, first compute the autocorrelation matrix of its surrounding area, then compute the two eigenvalues of that matrix (denoted $\lambda_1$ and $\lambda_2$). In the Shi-Tomasi method, if both eigenvalues are greater than a certain threshold (equivalently, if the smaller eigenvalue exceeds the threshold), the point is considered a corner.

1.3.2 Shi-Tomasi corner detection formula

The corner response function of the Shi-Tomasi method is defined as:

$$R = \min(\lambda_1, \lambda_2)$$

where $\lambda_1, \lambda_2$ are the eigenvalues of the autocorrelation matrix.
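
As a short illustration of this response, cv2.cornerMinEigenVal returns min(lambda_1, lambda_2) for every pixel; thresholding it against a fraction of the strongest response mirrors what goodFeaturesToTrack (introduced below) does internally. The image name and the 0.01 factor are only example values.

import cv2

gray = cv2.imread('tulips.jpg', cv2.IMREAD_GRAYSCALE)

# Per-pixel Shi-Tomasi response: min(lambda_1, lambda_2) of the local matrix M
min_eig = cv2.cornerMinEigenVal(gray, blockSize=3, ksize=3)

# Keeping pixels above a fraction of the strongest response mirrors qualityLevel
corner_mask = min_eig > 0.01 * min_eig.max()
print('candidate corner pixels:', int(corner_mask.sum()))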

1.3.3 Code implementation

In OpenCV, Shi-Tomasi corner detection can be implemented through the cv2.goodFeaturesToTrack() function. Here is a simple example code:

import cv2

# Read the image
img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Shi-Tomasi corner detection
corners = cv2.goodFeaturesToTrack(gray, maxCorners=100, qualityLevel=0.01, minDistance=10)

# Draw the corners
for corner in corners:
    x, y = corner.ravel()
    cv2.circle(img, (int(x), int(y)), 5, (0, 255, 0), -1)

# Display the image
cv2.imshow('Shi-Tomasi Corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

Shi-Tomasi Corners

In this code:

  • The cv2.goodFeaturesToTrack() function receives several parameters: the input image, the maximum number of corners to detect (maxCorners), the corner quality level (qualityLevel, a number between 0 and 1 giving the minimum accepted quality relative to the best corner, measured by the smallest eigenvalue) and the minimum distance between corners (minDistance).
  • The detected corners are returned as (x, y) coordinates.

1.3.4 Function analysis

The goodFeaturesToTrack function performs corner detection using the Shi-Tomasi method. A brief description of each parameter follows:

def goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance, corners=None, mask=None, blockSize=None, useHarrisDetector=None, k=None)

  1. image: Input image. It should be a single channel 8-bit or floating point 32-bit image.

  2. maxCorners: The maximum number of corner points to return. If more corners are detected than this number, the strongest maxCorners corners will be returned. If maxCorners <= 0, all detected corners are returned.

  3. qualityLevel: Threshold on the corner quality measure. This parameter is multiplied by the quality measure (minimum eigenvalue or Harris response) of the best corner in the image, and corners whose measure falls below the resulting product are rejected.

  4. minDistance: Returns the minimum possible Euclidean distance between corner points. This is used to ensure that there is some space between the selected corner points.

  5. corners: Output vector of detected corner points.

  6. mask: Optional region-of-interest mask. If provided, it must be a binary single-channel 8-bit (CV_8UC1) image of the same size as image; corners are detected only where the mask is non-zero (a usage sketch combining mask and useHarrisDetector appears after this list).

  7. blockSize: The size of the neighborhood considered when calculating the derivative covariance matrix.

  8. useHarrisDetector: Indicates whether to use the Harris corner detector. If not used, the minimum eigenvalue method (Shi-Tomasi method) is applied.

  9. k: Free parameter used only when using Harris detector.
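
The sketch below illustrates the mask and useHarrisDetector parameters: corners are searched only in the left half of the image and are scored with the Harris response instead of the minimum eigenvalue. The region and parameter values are arbitrary examples.

import cv2
import numpy as np

img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Binary mask (CV_8UC1): non-zero pixels mark the region of interest (left half)
mask = np.zeros(gray.shape, dtype=np.uint8)
mask[:, :gray.shape[1] // 2] = 255

corners = cv2.goodFeaturesToTrack(gray, maxCorners=50, qualityLevel=0.01,
                                  minDistance=10, mask=mask,
                                  useHarrisDetector=True, k=0.04)

# corners is None if nothing passes the quality threshold
if corners is not None:
    for corner in corners:
        x, y = corner.ravel()
        cv2.circle(img, (int(x), int(y)), 5, (0, 0, 255), -1)

cv2.imshow('Masked Harris-scored corners', img)
cv2.waitKey(0)
cv2.destroyAllWindows()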

1.4 FAST corner detection

The FAST (Features from Accelerated Segment Test) algorithm is a widely used corner detection method. It is known for its high speed and simplicity and plays an important role in many real-time image processing systems.

1.4.1 FAST corner detection principle

  1. Select pixel point: Select a pixel point P in the image as the center point of detection.
  2. Set brightness threshold: Set a threshold T for comparison with the brightness of the center pixel P.
  3. Circular neighborhood detection: Select a circular neighborhood (usually containing 16 pixels) around the center point P to determine whether P is a corner point.
  4. Continuity check: Check whether there are at least N contiguous pixels on the circle whose brightness is all greater than the brightness of P plus the threshold T, or all less than the brightness of P minus T. N is usually set to 12 (a simplified sketch of this segment test follows this list).
  5. Non-maximum suppression: Apply non-maximum suppression to remove weakly responding corners and retain only the most significant ones.
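
Below is a simplified, unoptimized sketch of the segment test for a single candidate pixel, assuming the standard 16-pixel circle of radius 3. The test coordinates and threshold are arbitrary example values, and the real FAST implementation adds fast rejection tests and non-maximum suppression.

import cv2

# Offsets of the 16 pixels on the radius-3 circle around the candidate point
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def is_fast_corner(gray, x, y, threshold=20, n=12):
    """Return True if at least n contiguous circle pixels are all brighter than
    p + threshold or all darker than p - threshold."""
    p = int(gray[y, x])
    ring = [int(gray[y + dy, x + dx]) for dx, dy in CIRCLE]
    for flags in ([v > p + threshold for v in ring],
                  [v < p - threshold for v in ring]):
        run, best = 0, 0
        # Doubling the ring handles contiguous runs that wrap around the circle
        for f in flags + flags:
            run = run + 1 if f else 0
            best = max(best, run)
        if best >= n:
            return True
    return False

gray = cv2.imread('tulips.jpg', cv2.IMREAD_GRAYSCALE)
print(is_fast_corner(gray, 100, 100))  # (100, 100) is an arbitrary test pixel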

1.4.2 FAST corner detection features and applications

  • Fast speed: The simple algorithm structure makes the FAST algorithm execute very fast and is suitable for real-time systems and high frame rate videos.
  • Widely used: It is widely used in fields such as robot navigation, video tracking, three-dimensional modeling and motion detection.
  • Limitations: Sensitive to noise and not scale-invariant. It is often used in combination with other algorithms to overcome these limitations.

1.4.3 Code implementation

Here's a basic code example:

import cv2

# Read the image
image = cv2.imread('tulips.jpg')

# Initialize the FAST object (threshold = 50)
fast = cv2.FastFeatureDetector_create(50)

# Detect corners
keypoints = fast.detect(image, None)

# Draw the corners on the image
image_with_keypoints = cv2.drawKeypoints(image, keypoints, None, color=(0, 255, 0))

# Display the image
cv2.imshow('FAST Keypoints', image_with_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()

FAST Keypoints

1.4.4 Function analysis

The FastFeatureDetector_create function creates a FAST detector object.

def FastFeatureDetector_create(threshold=None, nonmaxSuppression=None, type=None)

  1. threshold: The intensity-difference threshold used to decide whether a pixel on the surrounding circle counts as brighter or darker than the center pixel. The higher the threshold, the fewer corners are detected, but they tend to be of higher quality.

  2. nonmaxSuppression: Non-maximum suppression. When set to True, the algorithm applies non-maximum suppression around detected corners to ensure that each detected corner is a local maximum, avoiding multiple adjacent detections of the same corner. Enabling non-maximum suppression usually gives better results.

  3. type: Specify the type of FAST algorithm. OpenCV provides different versions of the FAST algorithm, such as TYPE_9_16, TYPE_7_12, etc. These types specify different sizes of circular neighborhoods used for corner detection.

1.5 Sub-pixel corner detection

Sub-pixel corner detection is a technology that improves corner positioning accuracy to the sub-pixel level. Traditional corner detection methods, such as Harris corner detection, can usually only locate to pixel-level accuracy. In some applications, such as high-precision image alignment and three-dimensional reconstruction, higher-precision corner point positioning is required. Sub-pixel corner detection achieves this goal by performing a fine analysis of the pixels surrounding the corner.

1.5.1 Principle of sub-pixel corner detection

The basic principle of sub-pixel corner detection is to use the grayscale distribution information of local areas of the image to finely adjust the position of the corner points. It usually includes the following steps:

  1. Preliminary corner detection: First use conventional methods (such as Harris corner detection) to locate the approximate position of the corner in the image.

  2. Define local window: Define a small local window around each initially detected corner point.

  3. Grayscale centroid calculation: Within this local window, compute the grayscale centroid, which can be regarded as the "center of mass" of the local image region. The initially detected corner position is then fine-tuned toward the centroid.

  4. Iterative optimization: Refine the corner position through iteration until certain accuracy requirements are met.

1.5.2 Sub-pixel corner detection formula

The key formula in sub-pixel corner detection is the grayscale centroid. The centroid coordinates $(x_c, y_c)$ can be computed as:

$$x_c = \frac{\sum x_i w_i}{\sum w_i}, \quad y_c = \frac{\sum y_i w_i}{\sum w_i}$$

where $x_i, y_i$ are the coordinates of the pixels in the local window and $w_i$ is the gray value of the corresponding pixel.
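
As a tiny numerical illustration of this formula, the snippet below computes the grayscale centroid of a small example window (the 3x3 values are made up):

import numpy as np

# A made-up 3x3 window of gray values; the bright center pulls the centroid to (1, 1)
window = np.array([[10, 20, 10],
                   [20, 90, 20],
                   [10, 20, 10]], dtype=np.float32)

ys, xs = np.mgrid[0:window.shape[0], 0:window.shape[1]]
x_c = (xs * window).sum() / window.sum()
y_c = (ys * window).sum() / window.sum()
print(x_c, y_c)  # 1.0 1.0 for this symmetric window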

1.5.3 Code implementation

import cv2
import numpy as np

# Read the image
img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Harris corner detection
gray = np.float32(gray)
dst = cv2.cornerHarris(gray, blockSize=2, ksize=3, k=0.04)

# Threshold the result to obtain corner locations
dst = cv2.dilate(dst, None)
_, dst = cv2.threshold(dst, 0.01 * dst.max(), 255, 0)
dst = np.uint8(dst)

# Find centroids and create the keypoint set
_, _, _, centroids = cv2.connectedComponentsWithStats(dst)
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)
keypoints = [cv2.KeyPoint(x=corner[0], y=corner[1], _size=20) for corner in corners]

# Draw the keypoints with drawKeypoints
img_keypoints = cv2.drawKeypoints(img, keypoints, None, color=(0, 255, 0))

# Display the image
cv2.imshow('Harris Corners', img_keypoints)
cv2.waitKey(0)
cv2.destroyAllWindows()

Harris Corners
The purpose of this code is to find the precise location of the detected corners after using Harris corner detection and convert them into keypoint objects for further processing or visualization.

1.5.4 Function analysis

1. cv2.connectedComponentsWithStats

ret, labels, stats, centroids = cv2.connectedComponentsWithStats(dst)
  • The cv2.connectedComponentsWithStats function analyzes a binary image and finds all connected regions (blocks of mutually connected pixels) in it.
  • dst is the result of Harris corner detection, thresholded to obtain a binary image.
  • The function returns several values:
    • ret: The number of connected regions.
    • labels: An image-sized array in which each element represents the label of the connected region to which the corresponding pixel belongs.
    • stats: Statistical information for each connected region, such as the size of the region, bounding box, etc.
    • centroids: The coordinates of the centroid (center point) of each connected area.

2. cv2.cornerSubPix Sub-pixel detection function

criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv2.cornerSubPix(gray, np.float32(centroids), (5,5), (-1,-1), criteria)
  • This part of the code first sets the termination criteria for the iterative corner refinement (criteria): stop after 100 iterations or when the corner position changes by less than 0.001.
  • The cv2.cornerSubPix function refines the corner positions further: starting from the initial centroid positions (centroids), it searches a small neighborhood for a more accurate corner location.
  • Among the parameters, gray is the original grayscale image, np.float32(centroids) are the centroid coordinates from the previous step, (5, 5) is the half-size of the search window, and (-1, -1) means no dead zone is used (this rarely needs to be changed).

3. Create key point objects

keypoints = [cv2.KeyPoint(x=corner[0], y=corner[1], _size=20) for corner in corners]
  • This line of code converts the corner point coordinates found by the cv2.cornerSubPix function into OpenCV's KeyPoint object.
  • Each KeyPoint object contains the position of the corner and other properties (such as size and orientation). Here we only set the position (x and y coordinates) and the size (_size); the size is set to 20 simply to make the keypoints easier to see in the visualization.

2. Feature point detection

Feature point detection refers to finding points with unique attributes in images that can be matched and identified between different images. OpenCV provides several commonly used feature point detection methods, such as SIFT, SURF and ORB.

Environment check
Before running the code in this chapter, please check your Python and OpenCV versions.
In particular, if you encounter the error AttributeError: module 'cv2' has no attribute 'xfeatures2d_SIFT' or AttributeError: module 'cv2' has no attribute 'xfeatures2d', refer to the following.

import sys

# Check the current Python version
current_python_version = sys.version_info

# Check whether the Python version is 3.6
if current_python_version.major != 3 or current_python_version.minor != 6:
    python_version_message = "The current Python version is not 3.6; installing Python 3.6 is recommended."
else:
    python_version_message = "The current Python version is 3.6 and satisfies the runtime requirements."

print(python_version_message)
import pkg_resources

# The OpenCV versions to check for
required_opencv_version = "3.4.2.16"
required_opencv_contrib_version = "3.4.2.16"

# Check the installed opencv-python and opencv-contrib-python versions
try:
    installed_opencv_version = pkg_resources.get_distribution("opencv-python").version
except pkg_resources.DistributionNotFound:
    installed_opencv_version = None

try:
    installed_opencv_contrib_version = pkg_resources.get_distribution("opencv-contrib-python").version
except pkg_resources.DistributionNotFound:
    installed_opencv_contrib_version = None

# Build the installation hint
if installed_opencv_version != required_opencv_version or installed_opencv_contrib_version != required_opencv_contrib_version:
    opencv_installation_message = """
    # Uninstall the previous OpenCV
    pip uninstall opencv-python

    # Install the specified versions of OpenCV and opencv-contrib-python
    pip install opencv-python==3.4.2.16
    pip install opencv-contrib-python==3.4.2.16
    """
else:
    opencv_installation_message = "The installed OpenCV and opencv-contrib-python versions are correct."

print(opencv_installation_message)

2.1 SIFT (Scale Invariant Feature Transform)

SIFT is an algorithm for detecting and describing local features in images. It finds key points on images at different scales and characterizes the area around each key point.

2.1.1 SIFT principle

The main idea of ​​the SIFT algorithm is to find key points on different scale spaces and calculate the direction histograms of these key points as features. These keypoints are invariant to scale and rotation.

  1. Scale space extreme value detection: Find potential points of interest in different scale spaces through Gaussian difference functions.
  2. Key point positioning: Accurately determine the location and scale of key points, and remove low-contrast points and edge response points to enhance matching stability.
  3. Direction assignment: Assign one or more directions to each key point, based on the local image gradient direction.
  4. Key point description: In the area around each key point, calculate the direction and amplitude of its local gradient and generate a descriptor.

2.1.2 Code implementation

import cv2

img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

sift = cv2.xfeatures2d.SIFT_create()
keypoints, descriptors = sift.detectAndCompute(gray, None)

img = cv2.drawKeypoints(gray, keypoints, img, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('SIFT Features', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

SIFT Features

2.1.3 Function analysis

SIFT_create

The SIFT_create function creates a SIFT detector object.

def SIFT_create(nfeatures=None, nOctaveLayers=None, contrastThreshold=None, edgeThreshold=None, sigma=None)

  1. nfeatures: The optimal number of features to retain. Features are ranked by their scores (measured as local contrast in the SIFT algorithm).

  2. nOctaveLayers: The number of layers in each octave; the value used in D. Lowe's paper is 3. The number of octaves is computed automatically from the image resolution.

  3. contrastThreshold: Contrast threshold used to filter out weak features in semi-uniform (low contrast) areas. The larger the threshold, the fewer features the detector produces.

  4. edgeThreshold: Threshold used to filter out edge features. Note that its meaning is different from contrastThreshold, that is, the larger the edgeThreshold, the fewer features are filtered out (the more features are retained).

  5. sigma: The sigma value of the Gaussian blur applied to the input image at the 0th octave. If your images were captured by a low-quality camera with a soft lens, you may want to reduce this value.

detectAndCompute

The detectAndCompute method detects keypoints in an image and computes their descriptors.

def detectAndCompute(self, image, mask, descriptors=None, useProvidedKeypoints=None)

  1. image: The image to be processed.

  2. mask: Mask image used to define the area in the image to be processed.

  3. descriptors: The calculated descriptors will be stored here.

  4. useProvidedKeypoints: If True, this method only computes descriptors for the specified keypoints and does not detect new keypoints.

These two functions are used together to detect and describe keypoints in images. These keypoints and their descriptors can be used in subsequent image matching and recognition tasks.

2.2 SURF (Speeded-Up Robust Features)

SURF is a faster feature detection algorithm than SIFT while maintaining similar feature description capabilities. It is very useful for fast and efficient image matching.

2.2.1 SURF principle

The SURF algorithm improves on the computational efficiency of SIFT. It uses integral images to quickly evaluate box-filter approximations of Gaussian derivatives and Haar wavelet responses, finding keypoints at multiple scales.

  1. Scale space construction: Using integral images improves the speed of scale space construction.
  2. Keypoint detection: Use box filters (approximate Gaussian filters) at different scales to find keypoints.
  3. Keypoint description: Compute a simplified Haar wavelet feature descriptor around each keypoint.

2.2.2 Code implementation

import cv2

img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

surf = cv2.xfeatures2d.SURF_create(1000)
keypoints, descriptors = surf.detectAndCompute(gray, None)

img = cv2.drawKeypoints(gray, keypoints, img, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('SURF Features', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

SURF Features

2.2.3 Function analysis

The SURF_create function creates a SURF detector object.

def SURF_create(hessianThreshold=None, nOctaves=None, nOctaveLayers=None, extended=None, upright=None)

  1. hessianThreshold: Threshold used for the Hessian keypoint detector. This value determines the selection of keypoints - the higher the threshold, the fewer but more prominent feature points are detected.

  2. nOctaves: Number of pyramid octaves that the keypoint detector will use. Each octave of the pyramid contains a downsampled version of the image.

  3. nOctaveLayers: The number of layers in each octave. This affects the scale sensitivity of feature detection.

  4. extended: Extended descriptor flag (true - use extended 128-element descriptor; false - use 64-element descriptor). Extended descriptors provide more feature information, but also increase computational complexity.

  5. upright: Whether to calculate the direction of the feature (true - do not calculate the direction of the feature; false - calculate the direction). If the image does not exhibit rotational transformation, setting to true can speed up feature detection.

SURF can detect and describe key points in images. These key points and their descriptors can be used for subsequent tasks such as image matching and object recognition. SURF is often used in real-time or resource-constrained application scenarios due to its computational efficiency.

2.3 ORB (Oriented FAST and Rotated BRIEF)

ORB is an algorithm that combines FAST keypoint detection and BRIEF keypoint description, and is known for its speed and efficiency.

2.3.1 ORB principle

The ORB algorithm provides a fast and effective feature point detection and description method by combining the key point detection of the FAST algorithm and the descriptor of the BRIEF algorithm.

  1. FAST key point detection: Use the FAST algorithm to detect corner points.
  2. BRIEF descriptor: The BRIEF descriptor of key points is calculated in a rotation-invariant way.
  3. Multi-scale features: Repeat the detection process at different scales to ensure scale invariance.

2.3.2 Code implementation

import cv2

img = cv2.imread('tulips.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

orb = cv2.ORB_create(80)
keypoints, descriptors = orb.detectAndCompute(gray, None)

img = cv2.drawKeypoints(gray, keypoints, img, flags=cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
cv2.imshow('ORB Features', img)
cv2.waitKey(0)
cv2.destroyAllWindows()

ORB Features

2.3.3 Function analysis

The ORB_create function creates an ORB detector object.

def ORB_create(nfeatures=None, scaleFactor=None, nlevels=None, edgeThreshold=None, firstLevel=None, WTA_K=None, scoreType=None, patchSize=None, fastThreshold=None)

  1. nfeatures: The maximum number of feature points to retain.

  2. scaleFactor: Pyramid downsampling ratio, greater than 1. scaleFactor=2 in the standard pyramid means that the pixels at each level are 1/4 of the previous level.

  3. nlevels: The number of levels in the pyramid. The size of the smallest layer is equal to the input image size divided by scaleFactor raised to the nlevels - firstLevel power.

  4. edgeThreshold: The size of the border in which features are not detected; it should roughly match the patchSize parameter.

  5. firstLevel: At which level of the pyramid the original image is placed. The previous layers are filled with the enlarged original image.

  6. WTA_K: Number of points used to generate each BRIEF descriptor element. The default value is 2, which means a simple comparison of brightness.

  7. scoreType: Feature point score type. HARRIS_SCORE uses the Harris algorithm to score feature points; FAST_SCORE is another option that is slightly less stable but faster to calculate.

  8. patchSize: The patch size used to calculate the BRIEF descriptor.

  9. fastThreshold: Threshold of FAST operator.

The use of ORB can quickly and efficiently detect and describe key points in images. These key points and their descriptors can be used for tasks such as image matching and object recognition. The ORB algorithm is particularly popular in mobile and real-time applications due to its efficiency.

3. Feature point matching

Feature point matching is the process of finding the same feature points in different images. In OpenCV, commonly used methods include BF Matcher (Brute-Force Matcher) and FLANN Matcher (Fast Library for Approximate Nearest Neighbors).

3.1 BF matcher

BF matcher is a simple and direct matching method, which mainly calculates the distance between a feature descriptor and all other descriptors, and then selects the matching pair with the smallest distance.

3.1.1 Principle of BF matcher

  • Distance calculation: For each feature descriptor in a feature set, the BF matcher calculates the distance between it and all feature descriptors in another feature set.
  • Best matching selection: Then, select the feature pair with the smallest distance as the matching pair.
  • Distance measure: Commonly used distance measures include Euclidean distance, Hamming distance, etc. The specific distance measure used depends on the type of descriptor.

3.1.2 Code implementation

Template image: tulips_template.jpg

import cv2
import numpy as np

# Read the images
img1 = cv2.imread('tulips.jpg', 0)
img2 = cv2.imread('tulips_template.jpg', 0)
img2 = cv2.rotate(img2, cv2.ROTATE_90_COUNTERCLOCKWISE)

# Initialize the SIFT detector
sift = cv2.xfeatures2d_SIFT.create()

# Find keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Create the BF matcher object
bf = cv2.BFMatcher(cv2.NORM_L2, crossCheck=True)

# Perform matching
matches = bf.match(des1, des2)

# Draw the matches
img3 = cv2.drawMatches(img1, kp1, img2, kp2, matches, None, flags=cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS)

# Show the result
cv2.imshow('BF Matcher', img3)
cv2.waitKey(0)
cv2.destroyAllWindows()

BF Matches
The example code creates a BFMatcher object with NORM_L2 as the distance measure and crossCheck enabled. It then uses the match method to find the best matches between the two sets of descriptors des1 and des2. This matching approach is particularly suitable for small or medium-sized data sets and for cases where high-accuracy matching is required.

3.1.3 Function analysis

BFMatcher

BFMatcher is a brute-force matcher for matching descriptors between different images. It finds matches by computing the distance between every pair of descriptors. The parameters are described below:

def create(cls, normType=None, crossCheck=None)

  1. normType: Specifies the distance metric used to compare descriptors. Common options are NORM_L1, NORM_L2, NORM_HAMMING and NORM_HAMMING2. NORM_L1 and NORM_L2 are suitable for SIFT and SURF descriptors, while NORM_HAMMING is suitable for ORB, BRISK and BRIEF; NORM_HAMMING2 applies to ORB when WTA_K is 3 or 4 (see the ORB/Hamming sketch after this list).

  2. crossCheck: If set to True, only mutually matching descriptor pairs are returned (i.e. for each query descriptor the closest match is found in the training set, and that match must in turn have the query descriptor as its closest match). This usually yields higher-quality matches, but possibly fewer of them.
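
As a short illustration of the normType pairing above, the following sketch matches binary ORB descriptors with Hamming distance; the feature count and the number of drawn matches are arbitrary example values.

import cv2

img1 = cv2.imread('tulips.jpg', 0)
img2 = cv2.imread('tulips_template.jpg', 0)

orb = cv2.ORB_create(500)
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# NORM_HAMMING suits ORB's binary descriptors (with the default WTA_K = 2)
bf = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(bf.match(des1, des2), key=lambda m: m.distance)

# Draw the 20 closest matches
img_matches = cv2.drawMatches(img1, kp1, img2, kp2, matches[:20], None, flags=2)
cv2.imshow('ORB + Hamming matches', img_matches)
cv2.waitKey(0)
cv2.destroyAllWindows()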

match

The match method finds the best match between two sets of descriptors.

def match(self, queryDescriptors, trainDescriptors, mask=None)

  1. queryDescriptors: Query descriptor collection.

  2. trainDescriptors: Collection of training descriptors. This set of descriptors will not be added to the set of training descriptors stored in the class object.

  3. mask: Specifies which query/training descriptor pairs are allowed to match. If a query descriptor is masked out, no match is added for it; the number of matches may therefore be smaller than the number of query descriptors.

3.2 FLANN matcher

FLANN matcher is a faster approximate matching method suitable for large-scale data sets. It uses optimized algorithms to quickly find approximate nearest neighbors in test data and training sets.

3.2.1 FLANN matcher principle

  • Approximate nearest-neighbor search: FLANN (Fast Library for Approximate Nearest Neighbors) is a collection of optimized algorithms (KD-trees, hierarchical k-means trees, etc.) for fast approximate nearest-neighbor search.
  • Automatic parameter selection: FLANN can automatically select the most appropriate algorithm and parameters based on data to optimize search efficiency.

3.2.2 Code implementation

import cv2
import numpy as np

# Read the images
img1 = cv2.imread('tulips.jpg', 0)
img2 = cv2.imread('tulips_template.jpg', 0)
img2 = cv2.rotate(img2, cv2.ROTATE_90_COUNTERCLOCKWISE)

# Initialize the SIFT detector
sift = cv2.xfeatures2d_SIFT.create()

# Find keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# FLANN parameters
FLANN_INDEX_KDTREE = 1
index_params = dict(algorithm=FLANN_INDEX_KDTREE, trees=5)
search_params = dict(checks=50)

# Create the FLANN matcher object
flann = cv2.FlannBasedMatcher(index_params, search_params)

# Perform matching
matches = flann.knnMatch(des1, des2, k=2)

# Keep only the good matches (Lowe's ratio test)
good_matches = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        good_matches.append(m)

# Draw the matches
img3 = cv2.drawMatches(img1, kp1, img2, kp2, good_matches, None, flags=2)

# Show the result
cv2.imshow('FLANN Matcher', img3)
cv2.waitKey(0)
cv2.destroyAllWindows()

FLANN Matches
The example code creates a FlannBasedMatcher object using the KDTREE algorithm and associated parameters. It then uses the knnMatch method to find the 2 nearest neighbors of each descriptor between the two sets des1 and des2 (that is, k=2). This method is particularly suitable for large data sets, where it offers better performance and efficiency by finding approximate nearest neighbors.

3.2.3 Function analysis

FlannBasedMatcher

FlannBasedMatcher is the class used for feature matching. Compared to BFMatcher (brute force matcher), it is more efficient when processing large amounts of data, especially when the number of descriptors is large. The parameter description is as follows:

  • index_params: A dictionary specifying the index parameters. The relevant parameters differ by algorithm; for example, with the KDTREE algorithm you can set the number of trees (for binary descriptors, see the LSH-index sketch after this list).

  • search_params: A dictionary specifying the search parameters. For example, the checks parameter controls how many times the index trees are traversed during a search, trading accuracy against speed.
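
For binary descriptors such as ORB, FLANN is typically configured with an LSH index instead of KD-trees. The sketch below shows one common set of starting values for index_params; they are illustrative, not tuned recommendations.

import cv2

FLANN_INDEX_LSH = 6
index_params = dict(algorithm=FLANN_INDEX_LSH,
                    table_number=6,       # number of hash tables
                    key_size=12,          # key length in bits
                    multi_probe_level=1)  # 0 would mean standard (non-multi-probe) LSH
search_params = dict(checks=50)
flann = cv2.FlannBasedMatcher(index_params, search_params)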

knnMatch

The knnMatch method finds the k best matches for each query descriptor. The parameters are as follows:

  1. queryDescriptors: Query descriptor collection.

  2. trainDescriptors: Collection of training descriptors. This set of descriptors will not be added to the set of training descriptors stored in the class object.

  3. k: The number of nearest neighbors to be found for each query descriptor.

  4. mask: A mask specifying allowed matches between query and training descriptors.

  5. compactResult: Parameter used when the mask is not empty. If False, the returned match vector has the same number of rows as the query descriptor. If True, no matches are included for query descriptors that are completely masked.

3.3 RANSAC feature point matching

RANSAC (Random Sample Consensus) is a robust feature point matching technique that is widely used with data containing a large amount of noise. During feature point matching, RANSAC can effectively identify correct matching pairs (inliers) and eliminate incorrect matches (outliers).

3.3.1 RANSAC principle

  1. Random sampling: Randomly select a small subset from all matching point pairs to estimate the transformation model. For example, when calculating the homography between two images, 4 pairs of matching points are usually selected.

  2. Model estimation: Use this subset to compute the transformation model. For example, if the homography matrix is ​​calculated, this step will produce a homography matrix H.

  3. Inlier Count: Test all data points using the estimated model and count the number of points (inliers) for which the model is consistent. The set of these points is called a consistency set.

  4. Model verification: Repeat the above process a fixed number of times. After each iteration, if the size of the consistency set exceeds the previous maximum value, the best model is updated to be the current model.

  5. Optimal model: Finally, use the set of interior points to re-estimate the optimal model.

3.3.2 Code implementation

The following is a code example for RANSAC feature point matching using OpenCV.

import cv2
import numpy as np

# Read the images
img1 = cv2.imread('tulips.jpg', 0)
img2 = cv2.imread('tulips_template.jpg', 0)
img2 = cv2.rotate(img2, cv2.ROTATE_90_COUNTERCLOCKWISE)

# Initialize the SIFT detector
sift = cv2.xfeatures2d_SIFT.create()

# Find keypoints and descriptors with SIFT
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Create the FLANN matcher object
flann = cv2.FlannBasedMatcher(dict(algorithm=1, trees=5), dict(checks=50))

matches = flann.knnMatch(des1, des2, k=2)

# Keep the good matches (Lowe's ratio test)
good_matches = []
for m, n in matches:
    if m.distance < 0.7 * n.distance:
        good_matches.append(m)

# Extract the locations of the matched points
src_pts = np.float32([kp1[m.queryIdx].pt for m in good_matches]).reshape(-1, 1, 2)
dst_pts = np.float32([kp2[m.trainIdx].pt for m in good_matches]).reshape(-1, 1, 2)

# Find the homography matrix with RANSAC
M, mask = cv2.findHomography(src_pts, dst_pts, cv2.RANSAC, 5.0)

matchesMask = mask.ravel().tolist()

# Draw the matching result
draw_params = dict(matchColor=(0, 255, 0), singlePointColor=None, matchesMask=matchesMask, flags=2)
img3 = cv2.drawMatches(img1, kp1, img2, kp2, good_matches, None, **draw_params)

# Show the result
cv2.imshow('RANSAC Matcher', img3)
cv2.waitKey(0)
cv2.destroyAllWindows()

In this code, the SIFT algorithm detects feature points, which are matched with the FLANN matcher. The cv2.findHomography function, combined with the RANSAC method, then finds the optimal homography matrix: src_pts and dst_pts are the matched point pairs and 5.0 is the RANSAC reprojection-error threshold. Using RANSAC effectively handles mismatches and outliers and improves the accuracy of the homography estimate (a short follow-up sketch using the estimated matrix appears below the figure).
RANSAC Matcher
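
As a hedged follow-up (continuing from the code above and assuming findHomography succeeded, i.e. M is not None), the estimated homography can be used to project the outline of img1 into img2 with cv2.perspectiveTransform:

import cv2
import numpy as np

h1, w1 = img1.shape
corners1 = np.float32([[0, 0], [w1 - 1, 0], [w1 - 1, h1 - 1], [0, h1 - 1]]).reshape(-1, 1, 2)

# Map the corners of img1 into the coordinate frame of img2 using M
corners_in_img2 = cv2.perspectiveTransform(corners1, M)

# Draw the projected outline on a copy of img2 (grayscale, so the color is a scalar)
outlined = cv2.polylines(img2.copy(), [np.int32(corners_in_img2)], True, 255, 3, cv2.LINE_AA)
cv2.imshow('Projected outline', outlined)
cv2.waitKey(0)
cv2.destroyAllWindows()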

3.3.3 Function analysis

The findHomography function finds the perspective transformation (homography matrix) between two planes. It is commonly used for image registration, 3D reconstruction and many other computer vision applications. Its parameters are described below:

def findHomography(srcPoints, dstPoints, method=None, ransacReprojThreshold=None, mask=None, maxIters=None, confidence=None)

  1. srcPoints: The coordinates of the points in the source plane. This can be a matrix of type CV_32FC2 or a vector<Point2f>.

  2. dstPoints: The coordinates of the points in the target plane, in the same format as srcPoints.

  3. method: Method for calculating homography matrix. Available methods include:

    • 0: Use ordinary least squares for all points.
    • cv2.RANSAC: Robust method based on RANSAC.
    • cv2.LMEDS: Minimum median robust method.
    • cv2.RHO: Robust method based on PROSAC.
  4. ransacReprojThreshold (optional, only when using RANSAC or RHO): Maximum reprojection error allowed for treating pairs of points as inliers. When measured in pixels, this value is typically set in the range 1 to 10.

  5. mask: Optional output mask set by a robust method (RANSAC or LMEDS). Non-zero values ​​in the mask represent inliers.

  6. maxIters: The maximum number of iterations of RANSAC.

  7. confidence: Confidence level, between 0 and 1.

Function return value:

  • retval: Homography matrix, if it cannot be estimated, an empty matrix is ​​returned.
  • mask: Output mask, identifying whether each point pair is an inlier or an outlier.

Summary

In this blog, we explore in detail the various methods used for feature point detection and matching in OpenCV. From basic corner point detection to complex feature point description and matching, these methods play a vital role in dealing with practical computer vision problems. By analyzing the principles, formulas and code implementation of each method, we can better understand their respective advantages and applicable scenarios.
