Lesson 4-3: OpenCV image feature extraction and description --- SIFT/SURF algorithm

Learning objectives

  • Understand the principles of the SIFT and SURF algorithms
  • Be able to use SIFT/SURF to detect key points

SIFT/SURF algorithm

1.1 SIFT principle

In the previous two sections we introduced the Harris and Shi-Tomasi corner detection algorithms. These two algorithms are rotation invariant but not scale invariant. Take the figure below as an example: corners can be detected in the small image on the left, but after the image is enlarged, the same window can no longer detect corners.
[Figure: corners detected in the small image are missed in the enlarged image]
Therefore, below we introduce the scale-invariant feature transform, SIFT (Scale-Invariant Feature Transform), a computer vision algorithm used to detect and describe local features in images. It finds extreme points across spatial scales and extracts their position, scale, and rotation invariants. The algorithm was published by David Lowe in 1999 and refined and summarized in 2004. Its applications include object recognition, robot map perception and navigation, image stitching, 3D model building, gesture recognition, image tracking, and action comparison.

The essence of the SIFT algorithm is to find key points (feature points) in different scale spaces and compute their orientations. The key points found by SIFT are prominent points that do not change with factors such as lighting, affine transformation, and noise, for example corner points, edge points, bright spots in dark areas, and dark spots in bright areas.

1.1.1 Basic process

Lowe decomposed the SIFT algorithm into the following four steps:

  1. Scale-space extremum detection: search image locations at all scales; potential key points that are invariant to scale and rotation are identified using a difference-of-Gaussian function.
  2. Key point localization: at each candidate location, the position and scale are determined by fitting a fine model; key points are selected according to their stability.
  3. Key point orientation assignment: based on the local gradient directions of the image, one or more orientations are assigned to each key point location. All subsequent operations on the image data are performed relative to the orientation, scale, and position of the key points, thus ensuring invariance to these transformations.
  4. Key point description: within the neighborhood around each key point, the local gradients of the image are measured at the selected scale. These gradients form the key point descriptor, which tolerates relatively large local shape deformations and illumination changes.

Let us follow Lowe's steps to walk through the implementation of the SIFT algorithm:

1.1.2 Scale space extreme value detection

In different scale spaces, the same window cannot be used to detect extreme points: a small window is needed for small key points and a large window for large key points. To achieve this, we use scale-space filters.

The Gaussian kernel is the only kernel function that can generate a multi-scale space ("Scale-space theory: A basic tool for analysing structures at different scales").

The scale space L(x, y, σ) of an image is defined as the convolution of the original image I(x, y) with a variable-scale 2-dimensional Gaussian function G(x, y, σ), that is:

$L(x,y,\sigma)=G(x,y,\sigma)*I(x,y)$

in:

$G(x,y,\sigma)=\frac{1}{2\pi\sigma^{2}}e^{-\frac{x^{2}+y^{2}}{2\sigma^{2}}}$

σ is the scale-space factor, which determines the degree of blur of the image. A large scale (large σ value) captures the overall appearance of the image, while a small scale (small σ value) captures its fine detail.

When computing a discrete approximation of the Gaussian function, pixels beyond a distance of approximately 3σ have negligible influence and can be ignored. Therefore, in practical applications, computing only a (6σ+1) × (6σ+1) Gaussian convolution kernel is sufficient to cover all relevant pixels.
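As a quick sanity check of the (6σ+1) rule, here is a minimal sketch (σ = 1.6 is a typical SIFT base scale, used here only for illustration):

import math

sigma = 1.6
ksize = 2 * math.ceil(3 * sigma) + 1  # covers +/- 3*sigma, rounded up to an odd size
print(ksize)                          # 11, close to 6*1.6 + 1 = 10.6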

Next, we construct a Gaussian pyramid of the image, obtained by repeatedly blurring the image with a Gaussian function and downsampling it. During construction, the image is first doubled in size and the pyramid is built on the enlarged image. The image at this size is blurred with Gaussians of increasing σ, and the resulting set of blurred images forms an octave. Then one image of the octave is downsampled, halving its width and height so that its area becomes a quarter of the original. This downsampled image is the initial image of the next octave, and the Gaussian blurring belonging to that octave is performed starting from it. Repeating this process for all octaves required by the algorithm completes the Gaussian pyramid. The entire process is shown in the figure below:

[Figure: construction of the Gaussian pyramid]
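The construction can be sketched in a few lines of Python. This is an illustrative simplification, not OpenCV's internal implementation: the octave count, the base scale sigma0 = 1.6, and the per-octave layer count S + 3 are conventional SIFT choices, and for brevity each layer is blurred directly from the octave's base image rather than incrementally:

import cv2 as cv
import numpy as np

def gaussian_pyramid(img, num_octaves=4, S=3, sigma0=1.6):
    k = 2 ** (1.0 / S)                       # scale ratio between adjacent layers
    octaves = []
    base = img.astype(np.float32)
    for _ in range(num_octaves):
        # S + 3 progressively blurred images form one octave
        layers = [cv.GaussianBlur(base, (0, 0), sigma0 * k ** i) for i in range(S + 3)]
        octaves.append(layers)
        # downsample one image of the octave: halve width and height
        h, w = base.shape[:2]
        base = cv.resize(layers[S], (w // 2, h // 2), interpolation=cv.INTER_NEAREST)
    return octaves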
We can use the LoG (Laplacian of Gaussian), i.e. the second derivative of the image, to detect key point information at different scales and thereby determine the feature points of the image. However, the LoG is computationally expensive and inefficient, so instead we subtract two adjacent images in the Gaussian scale space to obtain the DoG (Difference of Gaussians) as an approximation of the LoG.

To compute the DoG we build a difference-of-Gaussian pyramid on top of the Gaussian pyramid described above. The construction is simple: within each octave of the Gaussian pyramid, subtracting every two adjacent layers produces one layer of the difference pyramid, as shown below:
[Figure: construction of the difference-of-Gaussian pyramid]

Level 1 of group 1 of the difference-of-Gaussian pyramid is obtained by subtracting level 1 of group 1 of the Gaussian pyramid from level 2 of group 1. By analogy, the differential images are generated group by group and layer by layer, and together they constitute the difference pyramid. In summary, the image at level l of group o of the DoG pyramid is obtained by subtracting level l of group o of the Gaussian pyramid from level l+1 of group o. The subsequent SIFT feature point extraction is carried out on the DoG pyramid.
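Given the gaussian_pyramid sketch above, the difference pyramid is one subtraction per pair of adjacent layers (again an illustrative sketch, assuming float images):

def dog_pyramid(gauss_octaves):
    # subtract adjacent layers within each octave: S + 3 layers give S + 2 differences
    return [[layers[i + 1] - layers[i] for i in range(len(layers) - 1)]
            for layers in gauss_octaves]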

Once the DoG is complete, local extrema can be searched for in the different scale spaces. A pixel of the image is compared with its 8 neighbors in the same scale and with the 18 (2 × 9) points in the adjacent scales above and below, 26 neighbors in total. If it is a local extremum, it is a candidate key point. Essentially, the key point is the best representation of the image in the corresponding scale space. As shown in the figure below:
[Figure: a pixel compared with its 26 neighbors in a 3 × 3 × 3 cube across adjacent DoG scales]
The search starts from the second layer of each group. With the second layer as the current layer, a 3 × 3 × 3 cube is taken around each point of the current DoG image, with the upper and lower faces of the cube lying in the first and third layers. The extreme points found in this way therefore have both position coordinates (the DoG image coordinates) and scale coordinates (the layer coordinates). After the second layer has been searched, the third layer becomes the current layer, and the process repeats. When S = 3, 3 layers per group are searched, so the DoG pyramid has S + 2 layers per group, and the Gaussian pyramid originally built has S + 3 layers per group.
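The 26-neighbor comparison can be sketched as follows (a minimal illustration using the dog_pyramid output above; o is the octave index, l the layer index, and boundary handling is omitted):

def is_local_extremum(dog, o, l, y, x):
    val = dog[o][l][y, x]
    # stack the 3x3 patches of the layer below, the current layer, and the layer above
    cube = np.stack([dog[o][l + dl][y - 1:y + 2, x - 1:x + 2] for dl in (-1, 0, 1)])
    return val == cube.max() or val == cube.min()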

1.1.3 Key point positioning

Because the DoG is sensitive to noise and edges, the local extreme points detected in the difference-of-Gaussian pyramid above must be further screened before they can be accurately located as feature points.

A Taylor series expansion of the scale space is used to obtain the precise location of the extremum. If the gray value at the extreme point is below a threshold (usually 0.03 or 0.04), it is discarded as low contrast. In OpenCV this threshold is called contrastThreshold.

The DoG is also very sensitive to edges, so edge responses must be removed. Recall from the Harris corner detection algorithm that an edge is detected when one eigenvalue is much larger than the other. Similarly, a poorly localized key point in the DoG has a large principal curvature in the direction parallel to the edge and a small curvature in the perpendicular direction. If the ratio between the two exceeds a certain threshold (called edgeThreshold in OpenCV), the key point is considered to lie on an edge and is discarded. The threshold is generally set to 10.

By removing low-contrast and edge key points, we are left with the key points of interest.
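Both thresholds are exposed as parameters of OpenCV's SIFT constructor. A minimal sketch, assuming OpenCV 4.4.0 or later, where SIFT has moved out of the contrib package into the main module:

import cv2 as cv

sift = cv.SIFT_create(contrastThreshold=0.04,  # discard low-contrast extrema
                      edgeThreshold=10)        # discard edge-like key points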

1.1.4 Determining the direction of key points

After the above two steps, the key points of the image have been fully found, and they are scale invariant. To achieve rotation invariance as well, each key point must be assigned an orientation angle, i.e. an orientation reference derived from the neighborhood structure of the Gaussian scale image in which the key point was detected.

For any key point, we collect the gradient features (magnitude and angle) of all pixels within a region of radius r of the corresponding Gaussian pyramid image, where the radius r is:
$r = 3 \times 1.5\sigma$

where σ is the scale of the octave image in which the key point lies, which identifies the corresponding scale image.

The calculation formula for the magnitude and direction of the gradient is:

$m(x,y)=\sqrt{(L(x+1,y)-L(x-1,y))^{2}+(L(x,y+1)-L(x,y-1))^{2}}$

$\theta(x,y)=\arctan\left(\frac{L(x,y+1)-L(x,y-1)}{L(x+1,y)-L(x-1,y)}\right)$

The calculation results of the neighborhood pixel gradient are shown in the figure below:
[Figure: gradient magnitudes and orientations of the pixels in the key point's neighborhood]
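These finite differences are easy to vectorize with NumPy. A sketch (border pixels are left at zero for simplicity):

import numpy as np

def gradient_mag_ori(L):
    dx = np.zeros_like(L)
    dy = np.zeros_like(L)
    dx[1:-1, 1:-1] = L[1:-1, 2:] - L[1:-1, :-2]    # L(x+1, y) - L(x-1, y)
    dy[1:-1, 1:-1] = L[2:, 1:-1] - L[:-2, 1:-1]    # L(x, y+1) - L(x, y-1)
    m = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.rad2deg(np.arctan2(dy, dx)) % 360   # orientation in [0, 360)
    return m, theta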

After the key point gradients are computed, a histogram is used to collect the gradient magnitudes and orientations of the pixels in the key point's neighborhood. Specifically, 360° is divided into 36 bins of 10° each; within the region of radius r, the pixels whose gradient direction falls into a given bin have their magnitudes added together to form the height of that bin. Because pixels at different distances from the center contribute differently, the magnitudes are first weighted with a Gaussian of standard deviation 1.5σ. To keep the figure below simple, a histogram with only 8 directions is drawn.
[Figure: orientation histogram (simplified to 8 directions)]
Each feature point is assigned one main direction and possibly one or more auxiliary directions; the auxiliary directions enhance the robustness of image matching. A direction counts as auxiliary when the height of its bin exceeds 80% of the height of the main-direction bin.
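A sketch of the 36-bin voting and the 80% rule, assuming m and theta come from the gradient_mag_ori sketch above and w is a precomputed Gaussian weight array of the same shape:

def orientation_histogram(m, theta, w, bins=36):
    hist = np.zeros(bins)
    idx = (theta // (360 / bins)).astype(int) % bins   # 10-degree bins
    np.add.at(hist, idx.ravel(), (m * w).ravel())      # weighted magnitude votes
    return hist

# main direction: the highest bin; auxiliary directions: bins above 80% of the peak
# hist = orientation_histogram(m, theta, w)
# main = hist.argmax()
# auxiliary = [i for i, h in enumerate(hist) if h >= 0.8 * hist[main] and i != main]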

The peak of the histogram, i.e. the direction represented by the highest bin, is the main direction of the image gradient in the neighborhood of the feature point. But each bin represents a range of angles, so to obtain a more accurate direction angle we also interpolate the discrete histogram: a parabola is fitted through the peak bin and its neighbors, as shown in the figure below:

[Figure: parabolic interpolation of the discrete orientation histogram]
After the main direction of each image key point is obtained, every key point carries three pieces of information (x, y, σ, θ): position, scale, and orientation. From these we can define a SIFT feature region, usually represented by a circle with an arrow, or directly by an arrow: the center marks the feature point's location, the radius its scale, and the arrow its orientation. As shown below:
[Figure: SIFT feature regions drawn as circles with arrows]

1.1.5 Key point description

Through the above steps, each key point has been assigned position, scale, and orientation. Next we build a descriptor for each key point that is both highly distinctive and invariant to factors such as lighting and viewing angle. The descriptor draws not only on the key point itself but also on the surrounding pixels that contribute to it. The main idea is to divide the image region around the key point into blocks, compute a gradient histogram within each block, and concatenate the histograms into a feature vector that abstracts the image information.

The descriptor is tied to the scale at which the feature point was detected, so it is generated on the Gaussian scale image in which the key point lies. With the feature point as the center, its neighborhood is divided into d × d sub-regions (generally d = 4), each a square with side length 3σ. Since trilinear interpolation is required in the actual computation, the feature point neighborhood used is 3σ(d+1) × 3σ(d+1), as shown in the figure below:
[Figure: descriptor neighborhood divided into d × d sub-regions]
To ensure the rotation invariance of the feature point, the coordinate axes are rotated to the main direction of the key point, with the feature point as the center, as shown in the figure below:
[Figure: coordinate axes rotated to the key point's main direction]
The gradients of the pixels in each sub-region are then computed and Gaussian-weighted with σ = 0.5d, after which interpolation yields the gradients in eight directions for each seed point. The interpolation method is shown in the figure below:
[Figure: trilinear interpolation of the seed point gradients]

The gradient of each seed point is obtained by interpolation from the 4 sub-regions covering it. For example, the red point in the figure falls between row 0 and row 1 and contributes to both rows. Its contribution factor to the seed point in row 0, column 3 is dr, and to the one in row 1, column 3 it is 1−dr; similarly, the contribution factors to the two adjacent columns are dc and 1−dc, and to the two adjacent orientations do and 1−do. The final gradient accumulated in each direction is then:

$weight = w \cdot dr^{k}(1-dr)^{1-k} \cdot dc^{m}(1-dc)^{1-m} \cdot do^{n}(1-do)^{1-n}$

where k, m, and n are 0 or 1. The 4 × 4 × 8 = 128 gradient values collected in this way form the feature vector of the key point, and the feature vectors of all key points together constitute the SIFT descriptors of the image.
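As a small illustration, the eight weights produced by this formula enumerate k, m, n over {0, 1} and sum to w, so the sample's weighted magnitude is distributed exactly over its eight surrounding (row, column, orientation) bins (a sketch, with w, dr, dc, do as defined above):

def trilinear_weights(w, dr, dc, do):
    # one weight for each of the 2x2x2 surrounding (row, column, orientation) bins
    weights = []
    for k in (0, 1):
        for m in (0, 1):
            for n in (0, 1):
                weights.append(w * dr**k * (1 - dr)**(1 - k)
                                 * dc**m * (1 - dc)**(1 - m)
                                 * do**n * (1 - do)**(1 - n))
    return weights   # the eight weights sum to w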

1.1.6 Summary

SIFT has unparalleled advantages in extracting invariant features from images, but it is not perfect. It still has shortcomings such as poor real-time performance, sometimes too few feature points, and an inability to accurately extract feature points from targets with smooth edges. Since the advent of the SIFT algorithm, people have kept optimizing and improving it; the most famous improvement is the SURF algorithm.

1.2 SURF principle

Key point detection and description with the SIFT algorithm is relatively slow, so a faster algorithm was needed. In 2006 Bay proposed the SURF algorithm, an enhanced version of SIFT. It requires less computation and runs faster, while the features it extracts are almost identical to those of SIFT. A comparison with the SIFT algorithm is shown below:

[Figure: comparison of the SIFT and SURF algorithms]

1.3 Implementation

The process of using SIFT to detect key points in OpenCV is as follows:

1. Instantiate sift
sift = cv.xfeatures2d.SIFT_create()
2. Use sift.detectAndCompute() to detect key points and compute descriptors
kp,des = sift.detectAndCompute(gray,None)

parameter:

  • gray: Image for key point detection, note that it is a grayscale image

return:

  • kp: key point information, including position, scale, and orientation
  • des: key point descriptors; each key point corresponds to a 128-dimensional feature vector of gradient information
3. Draw the key point detection results on the image
cv.drawKeypoints(image, keypoints, outputimage, color, flags)

parameter:

  • image: The original image
  • keypoints: The key points to be drawn on the image
  • outputimage: The output image, which may be the original image itself
  • color: Color setting; changing the (b, g, r) values changes the brush color, b = blue, g = green, r = red
  • flags: Logo setting of drawing function
    1. cv2.DRAW_MATCHES_FLAGS_DEFAULT: creates an output image matrix, draws matching pairs and feature points on it, and draws only the center point of each key point
    2. cv2.DRAW_MATCHES_FLAGS_DRAW_OVER_OUTIMG: does not create an output image matrix, but draws the matching pairs on the existing output image
    3. cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS: draws a key point graphic with size and direction for each feature point
    4. cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS: single feature points are not drawn
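Note: the cv.xfeatures2d module used above belongs to the opencv-contrib-python package. Since OpenCV 4.4.0 (after the SIFT patent expired), the algorithm also lives in the main module, so the following works without the contrib package; the rest of the workflow is identical:

sift = cv.SIFT_create()   # OpenCV >= 4.4.0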

The SURF algorithm is applied in the same way as the process above, so it is not described in detail here.
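For reference, a minimal sketch of the SURF variant. Note that SURF remains patented and is only available in builds of opencv-contrib-python compiled with the nonfree modules enabled; the hessianThreshold value of 400 is just a common starting point, not a value from the original text:

surf = cv.xfeatures2d.SURF_create(hessianThreshold=400)
kp, des = surf.detectAndCompute(gray, None)   # des is 64-dimensional by default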

Example:

Using the SIFT algorithm, detect the key points in a picture of the CCTV building and plot them:

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt
# 1 read the image
img = cv.imread('./image/tv.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# 2 SIFT key point detection
# 2.1 instantiate the sift object
sift = cv.xfeatures2d.SIFT_create()

# 2.2 key point detection: kp holds the key point information (orientation, scale, position); des holds the descriptors
kp, des = sift.detectAndCompute(gray, None)
# 2.3 draw the detected key points on the image
cv.drawKeypoints(img, kp, img, flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
# 3 display the image
plt.figure(figsize=(8, 6), dpi=100)
plt.imshow(img[:, :, ::-1]), plt.title('SIFT detection')
plt.xticks([]), plt.yticks([])
plt.show()

result:

[Figure: SIFT key points drawn on the CCTV building image]
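As a quick sanity check of what detectAndCompute returned (the descriptor dimension follows from section 1.1.5):

print(len(kp))       # number of detected key points
print(des.shape)     # (number of key points, 128)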


Summary

SIFT principle:

  • Scale space extreme value detection: Construct Gaussian pyramid, Gaussian difference pyramid, and detect extreme points.

  • Key point positioning: remove the influence of low contrast and edges on the extreme points.

  • Key point direction determination: Use gradient histogram to determine the direction of key points.

  • Key point description: Divide the image area around the key point into blocks, calculate the gradient histogram within the block, generate a feature vector, and describe the key point information.

API: cv.xfeatures2d.SIFT_create()

SURF algorithm:

SURF improves on the SIFT algorithm in scale-space extremum detection, key point orientation assignment, and key point description, thereby improving efficiency.

Source: blog.csdn.net/m0_51366201/article/details/132650027