Image Processing: Image Feature Extraction and Description

1. Corner features

1. Features of the image

Most of us have played jigsaw puzzles: you start with the fragments of a complete image and then arrange them in the right way to reconstruct it. If the principle behind jigsaw puzzles could be written into a computer program, a computer could play jigsaw puzzles too.

When doing a puzzle, we look for unique features that are easy to track and easy to compare. We search for such features in one image, find them, locate the same features in other images, and then stitch the pieces together. These abilities come naturally to us.

What are these features? Ideally, they should be features that a computer can understand as well.

If we look closely at an image and examine some of its regions, the following image is an example:

[Figure: an image with six small patches A–F cropped from it]

Six small patches are shown above the image. Find where these patches are located in the original image. How many can you locate correctly?

A and B are flat surfaces, and patches like them occur in many places, so it is difficult to pin down their exact locations.

C and D are simpler. They are edges of the building. Their approximate locations can be found, but their exact locations are still hard to determine, because along an edge everything looks the same. So an edge is a better feature than a flat region, but still not good enough.

Finally, E and F are corners of the building, and they are easy to find: at a corner, no matter which direction you move the patch, the result looks very different. Corners can therefore be regarded as good features. To understand this concept better, let's take a simpler example.

[Figure: an image with a flat region (blue box), an edge (black box) and a corner (red box) marked]

As shown in the image above, the region in the blue box is flat and is difficult to localize and track: no matter which direction you move the blue box, it looks the same. The region in the black box is an edge: it changes if you move the box vertically, but not if you move it horizontally. The corner in the red box changes no matter which direction you move it, which makes it unique. This is why corners are good image features, and it answers the earlier question. A small numeric sketch of this idea follows.
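The following is a minimal sketch, not taken from the original post, that makes the flat/edge/corner intuition concrete: it shifts a small window over three synthetic patches and compares the sum of squared differences (SSD) for each shift direction. The patch values and window size are assumptions chosen for illustration.

import numpy as np

def ssd_for_shifts(patch, win=5):
    h, w = patch.shape
    cy, cx = h // 2, w // 2
    ref = patch[cy - win//2:cy + win//2 + 1, cx - win//2:cx + win//2 + 1]
    scores = {}
    for name, (du, dv) in {'right': (0, 1), 'down': (1, 0), 'diag': (1, 1)}.items():
        shifted = patch[cy - win//2 + du:cy + win//2 + 1 + du,
                        cx - win//2 + dv:cx + win//2 + 1 + dv]
        scores[name] = float(np.sum((shifted - ref) ** 2))
    return scores

flat   = np.zeros((15, 15), np.float32)                          # uniform region
edge   = np.zeros((15, 15), np.float32); edge[:, 8:] = 1.0       # vertical edge
corner = np.zeros((15, 15), np.float32); corner[8:, 8:] = 1.0    # L-shaped corner

for name, patch in [('flat', flat), ('edge', edge), ('corner', corner)]:
    print(name, ssd_for_shifts(patch))
# flat: every shift gives SSD 0; edge: shifts across the edge change the window,
# the shift along it does not; corner: every shift changes the window.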

Corners are important image features and play a key role in understanding and analyzing images. They are widely used in computer vision tasks such as 3D scene reconstruction, motion estimation, object tracking, object recognition, and image registration and matching. In the real world, corners correspond to the corners of objects, road intersections, T-junctions, and so on.

So how do we find these corners? Next we use various algorithms in OpenCV to find the features of the image and describe them.

2. Harris and Shi-Tomasi algorithms

1. Harris corner detection

1.1 Principle

The idea of Harris corner detection is to observe the image through a small local window. The defining property of a corner is that moving the window in any direction causes an obvious change in the gray level of the image, as shown in the following figure:

[Figure: a window shifted over a flat region, an edge and a corner]

Translating this idea into mathematics, we shift the local window by $(u, v)$ in every direction and sum the squared gray-level differences:

$$E(u, v) = \sum_{x,y} w(x, y)\,[I(x+u, y+v) - I(x, y)]^2$$

where $I(x, y)$ is the gray level inside the local window, $I(x+u, y+v)$ is the gray level after the shift, and $w(x, y)$ is a window function, which can be a rectangular window or a Gaussian window that assigns a different weight to each pixel:

[Figure: rectangular window and Gaussian window functions]

In corner detection we look for windows where $E(u, v)$ is large. Applying the first-order Taylor expansion $I(x+u, y+v) \approx I(x, y) + u I_x + v I_y$ gives

$$E(u, v) \approx \sum_{x,y} w(x, y)\,(u I_x + v I_y)^2$$

where $I_x$ and $I_y$ are the image derivatives along the x and y directions, which can be computed with the Sobel operator.

The derivation continues as follows:

$$E(u, v) \approx \begin{bmatrix} u & v \end{bmatrix} M \begin{bmatrix} u \\ v \end{bmatrix}, \qquad M = \sum_{x,y} w(x, y) \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}$$

The matrix $M$ determines the value of $E(u, v)$, so below we use $M$ to find corner points. $E(u, v)$ is a quadratic form in $u$ and $v$ built from $I_x$ and $I_y$, and its level sets are ellipses: the semi-axes of the ellipse are determined by the eigenvalues $\lambda_1$ and $\lambda_2$ of $M$, and its orientation by the eigenvectors, as shown in the figure below:
[Figure: an ellipse whose semi-axes are determined by the eigenvalues λ1 and λ2 of M]
The relationship between the eigenvalues and the corners, straight lines (edges) and flat regions in the image is shown in the figure below.

[Figure: regions of the (λ1, λ2) plane corresponding to flat areas, edges and corners]

There are three situations:

  • Straight lines (edges) in the image: one eigenvalue is large and the other is small, λ1 >> λ2 or λ2 >> λ1. E(u, v) is large in one direction and small in the other directions.
  • Flat regions in the image: both eigenvalues are small and approximately equal; E(u, v) is small in every direction.
  • Corners in the image: both eigenvalues are large and approximately equal; E(u, v) increases in every direction.

The corner measure proposed by Harris does not require computing the eigenvalues explicitly; instead it computes a corner response value $R$ to judge whether a pixel is a corner:

$$R = \det M - \alpha\,(\operatorname{trace} M)^2$$

where $\det M$ is the determinant of the matrix $M$, $\operatorname{trace} M$ is its trace, and $\alpha$ is a constant whose value usually lies between 0.04 and 0.06. The eigenvalues are in fact implicit in $\det M$ and $\operatorname{trace} M$, because

$$\det M = \lambda_1 \lambda_2, \qquad \operatorname{trace} M = \lambda_1 + \lambda_2$$

So how do we judge corners? As shown below:

[Figure: classification of corner, edge and flat regions by the value of R in the (λ1, λ2) plane]

  • When R is a large positive number, the pixel is a corner.
  • When R is a large negative number, the pixel lies on an edge (boundary).
  • When |R| is small, the region is flat.
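As a minimal sketch of these formulas (assuming the chessboard image used in the example below; the window size and threshold are illustrative), the Harris response R can be computed directly from Sobel derivatives without calling cv.cornerHarris:

import cv2 as cv
import numpy as np

gray = cv.imread('./image/chessboard.jpg', cv.IMREAD_GRAYSCALE).astype(np.float32)

# 1 Image derivatives I_x and I_y from the Sobel operator
Ix = cv.Sobel(gray, cv.CV_32F, 1, 0, ksize=3)
Iy = cv.Sobel(gray, cv.CV_32F, 0, 1, ksize=3)

# 2 Elements of M averaged over a local window (the box filter plays the role of w(x, y))
Ixx = cv.boxFilter(Ix * Ix, cv.CV_32F, (3, 3))
Iyy = cv.boxFilter(Iy * Iy, cv.CV_32F, (3, 3))
Ixy = cv.boxFilter(Ix * Iy, cv.CV_32F, (3, 3))

# 3 R = det(M) - alpha * trace(M)^2 at every pixel
alpha = 0.04
R = (Ixx * Iyy - Ixy * Ixy) - alpha * (Ixx + Iyy) ** 2

# Large positive R -> corner, large negative R -> edge, small |R| -> flat region
print('corner pixels:', int((R > 0.01 * R.max()).sum()))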

1.2 Implementation

The API used to implement Harris corner detection in OpenCV is:

dst = cv.cornerHarris(src, blockSize, ksize, k)

Parameters:

  • src: input image, data type float32.
  • blockSize: size of the neighborhood considered for corner detection.
  • ksize: kernel size used for the Sobel derivatives.
  • k: free parameter of the corner response equation, usually in [0.04, 0.06].

Example :

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
# 1 Read the image and convert it to grayscale
img = cv.imread('./image/chessboard.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# 2 Corner detection
# 2.1 The input image must be float32
gray = np.float32(gray)

# 2.2 The last parameter (k) is usually between 0.04 and 0.06
dst = cv.cornerHarris(gray, 2, 3, 0.04)
# 3 Threshold the response and draw the corners; choose the threshold to suit the image
img[dst > 0.001 * dst.max()] = [0, 0, 255]
# 4 Display the result
plt.figure(figsize=(10, 8), dpi=100)
plt.imshow(img[:, :, ::-1]), plt.title('Harris corner detection')
plt.xticks([]), plt.yticks([])
plt.show()

The result is as follows:

[Figure: Harris corners marked in red on the chessboard image]

Advantages and disadvantages of Harris corner detection:

Advantages:

  • Rotation invariance: under rotation the ellipse rotates but its shape (the eigenvalues) stays the same, so the corner response is not affected.
  • Partial invariance to affine changes of image intensity: since only first-order derivatives of the image are used, the response is unchanged by a shift of image intensity, and also under a scaling of image intensity.

Shortcomings:

  • It is sensitive to scale and has no scale invariance.
  • The extracted corners are located only with pixel-level accuracy (see the refinement sketch below).
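The pixel-level limitation is commonly addressed with cv.cornerSubPix. A minimal sketch, assuming the float32 gray image and the response dst from the Harris example above:

import cv2 as cv
import numpy as np

# Take coarse corner locations from the thresholded Harris response
_, thresh = cv.threshold(dst, 0.01 * dst.max(), 255, cv.THRESH_BINARY)
thresh = np.uint8(thresh)
_, _, _, centroids = cv.connectedComponentsWithStats(thresh)

# Refine the centroids to sub-pixel positions
criteria = (cv.TERM_CRITERIA_EPS + cv.TERM_CRITERIA_MAX_ITER, 100, 0.001)
corners = cv.cornerSubPix(gray, np.float32(centroids), (5, 5), (-1, -1), criteria)
print(corners[:5])   # refined (x, y) coordinates with sub-pixel accuracy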

2. Shi-Tomasi corner detection

2.1 Principle

The Shi-Tomasi algorithm is an improvement on the Harris corner detector and generally finds better corners. The corner response of the Harris algorithm subtracts a scaled square of the trace of the matrix M from its determinant and uses that difference to judge whether a pixel is a corner. Shi and Tomasi later proposed an improved criterion: if the smaller of the two eigenvalues of M is greater than a threshold, the pixel is considered a corner, namely:

$$R = \min(\lambda_1, \lambda_2)$$
As shown below:

[Figure: in the (λ1, λ2) plane, only the region where both eigenvalues exceed λmin is accepted as a corner]

The figure shows that a point is considered a corner only when both λ1 and λ2 are greater than the minimum value λmin. A small sketch of this criterion follows.
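A minimal sketch of the criterion itself (assuming the tv.jpg image used in the example below): cv.cornerMinEigenVal computes min(λ1, λ2) of M at every pixel, and thresholding that map is exactly the Shi-Tomasi test.

import cv2 as cv

gray = cv.imread('./image/tv.jpg', cv.IMREAD_GRAYSCALE)
min_eig = cv.cornerMinEigenVal(gray, blockSize=3, ksize=3)

# Keep pixels whose smaller eigenvalue exceeds a fraction of the global maximum;
# this mirrors the qualityLevel parameter of cv.goodFeaturesToTrack below.
corner_mask = min_eig > 0.01 * min_eig.max()
print('candidate corner pixels:', int(corner_mask.sum()))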

2.2 Implementation

Shi-Tomasi corner detection is implemented in OpenCV with the following API:

corners = cv.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance)

parameter:

  • image: input grayscale image.
  • maxCorners: maximum number of corners to return.
  • qualityLevel: minimum acceptable corner quality, a value between 0 and 1 (relative to the quality of the best corner).
  • minDistance: minimum Euclidean distance between returned corners, used to avoid clusters of adjacent feature points.

return:

  • corners: the detected corner points. Corners below the quality level are discarded, the remaining corners are sorted by quality, corners lying within minDistance of a stronger corner are removed, and finally at most maxCorners corners are returned.

Example:

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt
# 1 Read the image
img = cv.imread('./image/tv.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# 2 Corner detection
corners = cv.goodFeaturesToTrack(gray, 1000, 0.01, 10)
# 3 Draw the corners (the returned coordinates are float32, so cast them to int for cv.circle)
for i in corners:
    x, y = i.ravel()
    cv.circle(img, (int(x), int(y)), 2, (0, 0, 255), -1)
# 4 Display the result
plt.figure(figsize=(10, 8), dpi=100)
plt.imshow(img[:, :, ::-1]), plt.title('Shi-Tomasi corner detection')
plt.xticks([]), plt.yticks([])
plt.show()

The result is as follows:

[Figure: Shi-Tomasi corners drawn as red dots on the test image]

3. SIFT/SURF algorithm

1.1 Principle of SIFT

In the previous two sections we introduced the Harris and Shi-Tomasi corner detection algorithms. Both are rotation invariant but not scale invariant. Take the following figure as an example: corners can be detected in the small image on the left, but after the image is enlarged, the same window can no longer detect them.

[Figure: a corner detected at a small scale looks like an edge inside the same window after the image is enlarged]

This brings us to the scale-invariant feature transform, or SIFT, a computer vision algorithm used to detect and describe local features in images. It searches for extrema in scale space and extracts their position, scale and rotation invariants. The algorithm was published by David Lowe in 1999 and refined and summarized in 2004. Its applications include object recognition, robot mapping and navigation, image stitching, 3D model building, gesture recognition, image tracking and motion comparison.

The essence of the SIFT algorithm is to find key points (feature points) in different scale spaces and calculate the direction of the key points. The key points found by SIFT are some very prominent points that will not change due to factors such as illumination, affine transformation and noise, such as corner points, edge points, bright spots in dark areas, and dark points in bright areas.

1.1.1 Basic process

Lowe decomposes the SIFT algorithm into the following four steps:

Scale-space extrema detection: Searches for image locations at all scales. Potential keypoints that are invariant to scale and rotation are identified by a Gaussian difference function.
Keypoint positioning: At each candidate location, a fine-fitting model is used to determine the location and scale. Keypoints are chosen according to their stability.
Keypoint orientation determination: assign one or more orientations to each keypoint position based on the local gradient orientation of the image. All subsequent operations on the image data are transformed relative to the orientation, scale, and position of the keypoints, thus ensuring invariance to these transformations.
Keypoint description: In the neighborhood around each keypoint, the local gradient of the image is measured at a selected scale. These gradients serve as keypoint descriptors, which allow relatively large local shape deformations or lighting changes.
Let's follow Lowe's steps to introduce the implementation process of the SIFT algorithm:

1.1.2 Extremum detection in scale space

The same window cannot be used to detect extrema at different scales: small keypoints need a small window, and large keypoints need a large window. To achieve this we use scale-space filters.

The Gaussian kernel is the only kernel function that can generate a multi-scale space. - "Scale-space theory: A basic tool for analyzing structures at different scales".

The scale space $L(x, y, \sigma)$ of an image is defined as the convolution of the original image $I(x, y)$ with a variable-scale two-dimensional Gaussian $G(x, y, \sigma)$:

$$L(x, y, \sigma) = G(x, y, \sigma) * I(x, y)$$

where

$$G(x, y, \sigma) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2 + y^2}{2\sigma^2}}$$

Here $\sigma$ is the scale-space factor, which determines how much the image is blurred: at a large scale (large $\sigma$) the image shows only its coarse, overall structure, while at a small scale (small $\sigma$) the fine details are preserved.

When computing a discrete approximation of the Gaussian, pixels farther than about 3σ from the center have negligible weight and can be ignored. In practice it is therefore sufficient to compute a Gaussian convolution kernel of size (6σ+1) × (6σ+1).
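A minimal sketch, not from the original post, that checks this truncation numerically: the weight of a 1-D Gaussian beyond 3σ is negligible, which is why a (6σ+1)-tap kernel is enough (the sigma value is an assumption for illustration).

import cv2 as cv
import numpy as np

sigma = 1.6                                    # a typical SIFT base sigma
wide = cv.getGaussianKernel(4 * int(np.ceil(6 * sigma)) + 1, sigma).ravel()
center = len(wide) // 2
r = int(np.ceil(3 * sigma))                    # 3*sigma radius in pixels
inside = wide[center - r:center + r + 1].sum()
print(f'weight within 3*sigma: {inside:.4f}')  # ~0.998, so (6*sigma+1) taps suffice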
Next we construct the Gaussian pyramid of the image, obtained by repeatedly blurring and downsampling the image with the Gaussian function. During construction the image is first doubled in size, and the pyramid is built on this enlarged image. The image at this size is blurred with Gaussians of increasing σ, and the resulting set of blurred images forms one octave. One image of this octave is then downsampled, halving its width and height so the area becomes a quarter of the original; this downsampled image is the first image of the next octave. The Gaussian blurring belonging to that octave is then applied, and the process repeats until all octaves required by the algorithm have been built, completing the Gaussian pyramid. The whole process is shown in the figure below:
[Figure: construction of the Gaussian pyramid]
Using the Laplacian of Gaussian (LoG), i.e. the second derivative of the blurred image, keypoints can be detected at different scales to determine the feature points of the image. However, LoG is computationally expensive and inefficient, so we approximate it with the difference of Gaussians (DoG), obtained by subtracting two adjacent images of the Gaussian scale space.
To compute the DoG we build a Gaussian difference pyramid on top of the Gaussian pyramid described above: within each octave of the Gaussian pyramid, every pair of adjacent layers is subtracted, and the differences form the DoG pyramid. As shown below:
[Figure: construction of the DoG pyramid from adjacent layers of the Gaussian pyramid]

The first layer of the first octave of the DoG pyramid is obtained by subtracting the first layer of the first octave of the Gaussian pyramid from its second layer. Proceeding in the same way, the difference images are generated octave by octave and layer by layer, and together they form the difference pyramid. In summary, layer l of octave o of the DoG pyramid is obtained by subtracting layer l of octave o of the Gaussian pyramid from layer l+1 of the same octave. The subsequent extraction of SIFT feature points is performed on this DoG pyramid.
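A minimal sketch of one octave (the parameter values follow Lowe's common choices and are assumptions, not taken from this post): S + 3 Gaussian images per octave, with adjacent pairs subtracted to give S + 2 DoG images.

import cv2 as cv
import numpy as np

img = cv.imread('./image/tv.jpg', cv.IMREAD_GRAYSCALE).astype(np.float32)

S = 3                   # number of scales per octave intended for extremum search
sigma0 = 1.6            # base sigma of the octave
k = 2 ** (1.0 / S)      # scale multiplier between adjacent layers

# For simplicity each layer is blurred directly from the base image of the octave
gaussians = [cv.GaussianBlur(img, (0, 0), sigma0 * k ** i) for i in range(S + 3)]
dog = [g2 - g1 for g1, g2 in zip(gaussians[:-1], gaussians[1:])]
print(len(gaussians), len(dog))   # 6 Gaussian layers -> 5 DoG layers when S = 3

# The next octave starts from the layer with twice the base sigma, downsampled by 2
next_base = cv.resize(gaussians[S], None, fx=0.5, fy=0.5, interpolation=cv.INTER_NEAREST)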
Once the DoG pyramid is complete, local extrema can be searched for across the different scales. Each pixel is compared with its 8 neighbours in the same image and with the 18 (2 × 9) corresponding pixels in the layers directly above and below in scale space. If it is a local extremum, it is a candidate keypoint; essentially, the keypoint is the best representation of the image in the corresponding scale space. As shown below:

[Figure: a pixel compared with its 26 neighbours in the current, upper and lower DoG layers]

The search starts from the second layer of each octave: with the second layer as the current layer, a 3 × 3 × 3 cube is taken around each point of its DoG image, the top and bottom slices of the cube coming from the first and third layers. An extremum found this way therefore has both position coordinates (DoG image coordinates) and a scale coordinate (its layer index). When the second layer has been searched, the third layer becomes the current layer and the search proceeds in the same way. With S = 3 there are 3 layers to search in each octave, so the DoG pyramid has S + 2 layers per octave, and the Gaussian pyramid built at the start has S + 3 layers per octave.
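A minimal sketch of the 26-neighbour test, reusing the dog list from the sketch above (the pixel location is an arbitrary assumption):

import numpy as np

def is_extremum(dog, l, y, x):
    # 3x3x3 cube: 8 neighbours in layer l plus 9 + 9 in layers l-1 and l+1
    cube = np.stack([dog[l - 1][y - 1:y + 2, x - 1:x + 2],
                     dog[l    ][y - 1:y + 2, x - 1:x + 2],
                     dog[l + 1][y - 1:y + 2, x - 1:x + 2]])
    center = dog[l][y, x]
    return center == cube.max() or center == cube.min()

print(is_extremum(dog, 1, 50, 50))   # True only if (50, 50) is a scale-space extremum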

1.1.3 Key point positioning

Since DoG is sensitive to noise and edges, the local extremum points detected in the above Gaussian difference pyramid need further inspection before they can be accurately positioned as feature points.

The exact position of the extremum is obtained with a Taylor-series expansion of the scale-space function. If the DoG value at the refined extremum is smaller than a threshold (usually 0.03 or 0.04), the point is discarded as low contrast. In OpenCV this threshold is called contrastThreshold.

The DoG is also very sensitive to edges, so edge responses must be removed. Recall that the Harris algorithm, besides detecting corners, can also detect edges: when one eigenvalue is much larger than the other, the point is classified as an edge. Poor keypoints in the DoG have a large principal curvature along the edge and a small curvature perpendicular to it. If the ratio of the two curvatures exceeds a threshold (called edgeThreshold in OpenCV, usually set to 10), the keypoint is treated as an edge response and ignored.

After removing the low-contrast points and the edge responses, the keypoints we are interested in remain. The edge check itself is sketched below.
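A minimal sketch of the curvature-ratio (edge) check on a single DoG layer, following the formulation in Lowe's paper: the Hessian is taken by finite differences, the threshold r = 10 matches the text above, and the dog list comes from the earlier sketch.

import numpy as np

def passes_edge_test(d, y, x, r=10.0):
    # 2x2 Hessian of the DoG layer at (y, x) by finite differences
    dxx = d[y, x + 1] + d[y, x - 1] - 2 * d[y, x]
    dyy = d[y + 1, x] + d[y - 1, x] - 2 * d[y, x]
    dxy = (d[y + 1, x + 1] - d[y + 1, x - 1] - d[y - 1, x + 1] + d[y - 1, x - 1]) / 4.0
    tr, det = dxx + dyy, dxx * dyy - dxy * dxy
    # Reject if the curvatures have opposite signs or their ratio is too edge-like
    return det > 0 and tr * tr / det < (r + 1) ** 2 / r

print(passes_edge_test(dog[1], 50, 50))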

1.1.4 Key point direction determination

After the above two steps, the key points of the image are completely found, and these key points are scale invariant. In order to achieve rotation invariance, it is also necessary to assign a direction angle to each key point, that is, to obtain a direction reference based on the neighborhood structure of the Gaussian scale image where the detected key point is located.

For each keypoint, we collect the gradients (magnitude and orientation) of all pixels within a radius r of the keypoint in the Gaussian pyramid image at its scale, where the radius is

$$r = 3 \times 1.5\,\sigma$$

and σ is the scale of the octave image in which the keypoint lies, which selects the corresponding scale image.

The formulas for calculating the magnitude and direction of the gradient are:

$$m(x, y) = \sqrt{\big(L(x+1, y) - L(x-1, y)\big)^2 + \big(L(x, y+1) - L(x, y-1)\big)^2}$$

$$\theta(x, y) = \arctan\frac{L(x, y+1) - L(x, y-1)}{L(x+1, y) - L(x-1, y)}$$

The calculation result of the neighborhood pixel gradient is shown in the figure below:

[Figure: gradient magnitudes and orientations of the pixels in the keypoint neighbourhood]

After computing the gradients, a histogram is used to collect the gradient magnitudes and orientations of the pixels in the keypoint's neighbourhood. Specifically, the 360° range is divided into 36 bins of 10° each; within the radius-r region, the magnitudes of all pixels whose gradient direction falls into a given bin are added together to form that bin's height. Because pixels at different distances from the centre contribute differently, the magnitudes are also weighted with a Gaussian of standard deviation 1.5σ. In the figure below only 8 directions are drawn, to keep the figure simple.

[Figure: gradient orientation histogram of the keypoint neighbourhood (simplified to 8 bins)]
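A minimal sketch of this histogram (assumptions: a Gaussian-scale image, here the gaussians[1] layer from the earlier octave sketch, an arbitrary keypoint location, and the radius r = 3 × 1.5σ used above):

import numpy as np

def orientation_histogram(L, y0, x0, sigma, num_bins=36):
    radius = int(round(3 * 1.5 * sigma))      # neighbourhood radius
    hist = np.zeros(num_bins)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            y, x = y0 + dy, x0 + dx
            if y <= 0 or x <= 0 or y >= L.shape[0] - 1 or x >= L.shape[1] - 1:
                continue
            gx = L[y, x + 1] - L[y, x - 1]            # finite-difference gradient
            gy = L[y + 1, x] - L[y - 1, x]
            mag = np.hypot(gx, gy)
            ang = np.degrees(np.arctan2(gy, gx)) % 360
            weight = np.exp(-(dx * dx + dy * dy) / (2 * (1.5 * sigma) ** 2))
            hist[int(ang // (360 / num_bins)) % num_bins] += weight * mag
    return hist

hist = orientation_histogram(gaussians[1], 50, 50, sigma=1.6)
main_direction = np.argmax(hist) * 10   # bin index * 10 degrees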

Each feature point is assigned a main orientation and, if needed, one or more auxiliary orientations; the auxiliary orientations are added to make image matching more robust. An auxiliary orientation is defined as the direction of any bin whose height exceeds 80% of the height of the main-orientation bin.
The peak of the histogram, i.e. the direction represented by the tallest bin, is the main gradient orientation within the neighbourhood of the feature point. Since each bin represents a range of angles, the discrete histogram is interpolated to obtain a more accurate orientation value: a parabola is fitted around the peak, as shown in the following figure:

[Figure: parabolic interpolation around the peak of the orientation histogram]

After the main orientation of each keypoint has been determined, every keypoint carries three pieces of information: position (x, y), scale σ and orientation θ. Together these determine a SIFT feature region, usually drawn as a circle with an arrow, or simply as an arrow: the centre marks the feature point's position, the radius its scale, and the arrow its orientation. As shown below:

[Figure: SIFT keypoints drawn as circles with arrows encoding position, scale and orientation]

1.1.5 Description of key points

Through the steps above, each keypoint has been assigned a position, a scale and an orientation. Next we build a descriptor for each keypoint that is both discriminative and invariant to certain changes such as lighting and viewpoint. The descriptor is computed not only from the keypoint itself but also from the surrounding pixels that contribute to it. The main idea is to divide the image region around the keypoint into blocks, compute a gradient histogram in each block, and concatenate them into a feature vector that abstracts the image information.

The descriptor depends on the scale of the feature point, so it is generated on the Gaussian scale image where the keypoint lies. Centred on the feature point, its neighbourhood is divided into $d \times d$ sub-regions (usually d = 4), each a square with side length $3\sigma$. Because trilinear interpolation is needed in the actual computation, the neighbourhood used is a window of size $3\sigma(d+1) \times 3\sigma(d+1)$, as shown in the figure below:

[Figure: keypoint neighbourhood divided into d × d sub-regions]
In order to ensure the rotation invariance of the feature points, take the feature point as the center, and rotate the coordinate axis to the main direction of the key point, as shown in the following figure:

[Figure: coordinate axes rotated to the main orientation of the keypoint]

Calculate the gradient of the pixels in the sub-region, and perform Gaussian weighting according to σ=0.5d, and then interpolate to obtain the gradient in eight directions of each seed point. The interpolation method is shown in the figure below:

[Figure: interpolation of pixel gradients into the 8 orientation bins of each seed point]

The gradient of each pixel is distributed by interpolation over the 4 seed points (sub-regions) that cover it. The red dot in the figure falls between row 0 and row 1 and contributes to both rows: its contribution factor to the seed point in row 0, column 3 is dr, and to the seed point in row 1, column 3 it is 1 − dr. Similarly, its contribution factors to the two adjacent columns are dc and 1 − dc, and to the two adjacent orientation bins do and 1 − do. The gradient magnitude finally accumulated in each direction is therefore:
$$\text{weight} = w \cdot dr^k (1 - dr)^{1-k} \cdot dc^m (1 - dc)^{1-m} \cdot do^n (1 - do)^{1-n}$$

where w is the pixel's (Gaussian-weighted) gradient magnitude and k, m, n are each 0 or 1. Collecting the resulting 4 × 4 × 8 = 128 gradient values gives the feature vector of the keypoint; ordering these values consistently for every keypoint yields the SIFT descriptor.

1.1.6 Summary

SIFT has unmatched advantages for extracting invariant image features, but it is not perfect: it is slow, it sometimes finds few feature points, and it cannot accurately extract feature points from objects with smooth edges. Since its introduction people have kept optimizing and improving it, and the best-known improvement is the SURF algorithm.

1.2 Principle of SURF

Keypoint detection and description with the SIFT algorithm is relatively slow, so a faster algorithm was needed. In 2006 Bay proposed SURF (Speeded-Up Robust Features), an accelerated variant of SIFT: it requires less computation and runs faster, while the extracted features are almost as good as SIFT's. A comparison of the two algorithms is shown below:

[Figure: comparison of the SIFT and SURF algorithms]

1.3 Implementation

The process of using SIFT to detect key points in OpenCV is as follows:

1. Instantiate sift

sift = cv.xfeatures2d.SIFT_create()

(Note: since OpenCV 4.4 SIFT lives in the main module and can be created with cv.SIFT_create(); cv.xfeatures2d.SIFT_create() requires the opencv-contrib package on older versions.)

2. Use sift.detectAndCompute() to detect keypoints and compute their descriptors

kp,des = sift.detectAndCompute(gray,None)

parameter:

  • gray: the image for keypoint detection; note that it must be a grayscale image

return:

  • kp: keypoint information, including position, scale and orientation
  • des: keypoint descriptors; each keypoint corresponds to a 128-dimensional vector of gradient information

3. Draw the key point detection results on the image

cv.drawKeypoints(image, keypoints, outputimage, color, flags)

parameter:

  • image: original image
  • keypoints: the keypoints to be drawn on the image
  • outputimage: output image, which may be the original image
  • color: drawing color, given as a (b, g, r) tuple: b = blue, g = green, r = red
  • flags: drawing options
    • cv2.DRAW_MATCHES_FLAGS_DEFAULT: create the output image matrix, draw matches and feature points on it, and draw only the centre point of each keypoint
    • cv2.DRAW_MATCHES_FLAGS_DRAW_OVER_OUTIMG: do not create an output image matrix; draw on top of the existing output image
    • cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS: draw each keypoint as a circle with its size and orientation
    • cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS: single keypoints (those without matches) are not drawn

The application of the SURF algorithm is consistent with the above process, so it will not be repeated here.
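A minimal SURF sketch under the assumption that opencv-contrib is installed and built with the non-free modules enabled (SURF is patented and not shipped in default builds); the hessianThreshold value is illustrative:

import cv2 as cv

img = cv.imread('./image/tv.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

surf = cv.xfeatures2d.SURF_create(hessianThreshold=400)   # larger threshold -> fewer keypoints
kp, des = surf.detectAndCompute(gray, None)
print(len(kp), des.shape)   # SURF descriptors are 64-dimensional by default

cv.drawKeypoints(img, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)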

Example :

Use the SIFT algorithm to detect keypoints in the image of the CCTV building and draw them:

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
# 1 Read the image
img = cv.imread('./image/tv.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# 2 SIFT keypoint detection
# 2.1 Instantiate the SIFT object (use cv.SIFT_create() on OpenCV >= 4.4)
sift = cv.xfeatures2d.SIFT_create()

# 2.2 Keypoint detection: kp holds orientation, scale and position; des holds the descriptors
kp, des = sift.detectAndCompute(gray, None)
# 2.3 Draw the detected keypoints on the image
cv.drawKeypoints(img, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
# 3 Display the result
plt.figure(figsize=(8, 6), dpi=100)
plt.imshow(img[:, :, ::-1]), plt.title('SIFT detection')
plt.xticks([]), plt.yticks([])
plt.show()

result:

[Figure: SIFT keypoints drawn with their size and orientation on the CCTV building image]
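As a quick follow-up to the example above, each detected keypoint carries the position, scale and orientation discussed earlier, and each descriptor row is the 128-dimensional vector:

print(len(kp), des.shape)             # N keypoints, descriptor array of shape (N, 128)
kp0 = kp[0]
print(kp0.pt, kp0.size, kp0.angle)    # position (x, y), scale and orientation of one keypoint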


Source: blog.csdn.net/mengxianglong123/article/details/125931519