Classic Image Processing Algorithm -- SIFT (Scale-Invariant Feature Transform)

1. Introduction to SIFT

SIFT (Scale-Invariant Feature Transform) is a classic image feature extraction algorithm first published by David G. Lowe in 1999. After refinement and improvement, the final version was published in IJCV in 2004 under the title Distinctive Image Features from Scale-Invariant Keypoints.

The SIFT algorithm is based on local interest points. It is insensitive to image scale and rotation, and it is also robust to changes in illumination and to noise.

The SIFT algorithm is mainly divided into four steps:

1. Scale-space extrema detection: search over all scales and image locations using a difference-of-Gaussian (DoG) function to identify potential interest points that are invariant to scale and orientation.

2. Keypoint localization: at each candidate location, a detailed model is fit to determine the precise location and scale; keypoints are selected based on measures of their stability.

3. Orientation assignment: one or more orientations are assigned to each keypoint based on the local image gradient directions. All subsequent operations are performed on image data transformed relative to the assigned orientation, scale, and location of each keypoint, which provides invariance to these transformations.

4. Keypoint descriptor: local image gradients are measured at the selected scale in the region around each keypoint and transformed into a representation that tolerates significant local shape distortion and changes in illumination.

2. Scale-space extremum detection

A Gaussian pyramid is constructed with Gaussian kernels; each octave contains images at a set of scales, and successive octaves are obtained by downsampling.

Within each group (octave), adjacent Gaussian images are subtracted pairwise to construct the difference-of-Gaussian (DoG) pyramid.

In the DoG pyramid, each sample point is compared with the other 26 points of its 3\times 3\times 3 cube (8 neighbors in the same image plus 9 in each of the adjacent scales above and below). If the center point is the largest or the smallest of them all, it is taken as an extremum.
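A minimal sketch of this comparison, assuming the DoG images of one octave are stacked into a NumPy array with values normalized to [0, 1]; the contrast cutoff is an illustrative value, not one fixed by the text above:

```python
import numpy as np

def is_local_extremum(dog_stack, s, y, x, threshold=0.03):
    """Check whether dog_stack[s, y, x] is the maximum or minimum of its
    3x3x3 neighborhood (the 26 surrounding samples in scale and space).
    dog_stack: array of shape (num_dog_layers, H, W) for one octave,
    assumed to hold values normalized to [0, 1]."""
    center = dog_stack[s, y, x]
    # Discard weak responses early (illustrative contrast cutoff).
    if abs(center) < threshold:
        return False
    cube = dog_stack[s - 1:s + 2, y - 1:y + 2, x - 1:x + 2]
    # The center must be the largest or the smallest of the 27 samples
    # (ties are allowed in this sketch).
    if center > 0:
        return center >= cube.max()
    return center <= cube.min()
```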

 

If s is the number of layers per octave in which we want to detect extrema in the DoG pyramid, then the DoG pyramid needs s+2 layers per octave, which in turn requires s+3 Gaussian images in each octave.
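A small sketch of building one octave with these layer counts, assuming OpenCV is available; sigma = 1.6 and s = 3 follow the values suggested in Lowe's paper. For simplicity each layer is blurred directly from the input image rather than incrementally, and the downsampling that starts the next octave is omitted:

```python
import cv2
import numpy as np

def build_octave(image, s=3, sigma=1.6):
    """Build one octave: s + 3 Gaussian images and s + 2 DoG images.
    The multiplicative step k = 2**(1/s) makes the scale double across
    the octave."""
    k = 2 ** (1.0 / s)
    gaussians = []
    for i in range(s + 3):
        sig = sigma * (k ** i)
        # ksize=(0, 0) lets OpenCV choose the kernel size from sigma.
        gaussians.append(cv2.GaussianBlur(image, (0, 0), sigmaX=sig))
    # Pairwise difference of adjacent Gaussian images -> s + 2 DoG images.
    dogs = [g2.astype(np.float32) - g1.astype(np.float32)
            for g1, g2 in zip(gaussians[:-1], gaussians[1:])]
    return gaussians, np.stack(dogs)
```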

3. Keypoint localization

The extrema detected in the previous step are extrema of the discrete sampling grid and are therefore not very accurate. We need to determine the position and scale of the keypoints precisely, and also remove low-contrast keypoints and unstable edge points (the DoG operator produces a strong response along edges), thereby improving stability.

The extrema of the discrete space are not the true extrema. As shown in the figure below, the extrema of the continuous space can be obtained by interpolating around the discrete extrema.

Therefore, a second-order Taylor expansion of the DoG function is taken around each extremum:

D(x)=D+\frac{\partial D^T}{\partial x}x+\frac{1}{2}x^T\frac{\partial^2 D}{\partial x^2}x

Taking the derivative with respect to x and setting it to zero gives the extremum of the continuous space:

\hat{x}=-\left ( \frac{\partial^2 D}{\partial x^2} \right )^{-1}\frac{\partial D}{\partial x}

Substituting this extremum back into the Taylor approximation above gives the value at the extremum:

D(\hat{x})=D+\frac{1}{2}\frac{\partial D^T}{\partial x}\hat{x}

In this way, the extremum of the underlying continuous function is obtained.
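The refinement can be sketched with finite differences as below; this performs a single Newton step matching the formulas above, whereas a complete implementation iterates and re-centers the sample whenever a component of the offset exceeds 0.5:

```python
import numpy as np

def refine_extremum(dog_stack, s, y, x):
    """Quadratic (second-order Taylor) fit around a discrete extremum.
    Returns the offset x_hat = (ds, dy, dx) and the interpolated value
    D(x_hat). Derivatives are estimated with central differences."""
    D = dog_stack.astype(np.float32)
    c = D[s, y, x]
    # Gradient dD/dx along scale, y, x.
    dD = 0.5 * np.array([D[s + 1, y, x] - D[s - 1, y, x],
                         D[s, y + 1, x] - D[s, y - 1, x],
                         D[s, y, x + 1] - D[s, y, x - 1]])
    # Hessian d^2D/dx^2 from the 3x3x3 neighborhood.
    dss = D[s + 1, y, x] + D[s - 1, y, x] - 2 * c
    dyy = D[s, y + 1, x] + D[s, y - 1, x] - 2 * c
    dxx = D[s, y, x + 1] + D[s, y, x - 1] - 2 * c
    dsy = 0.25 * (D[s + 1, y + 1, x] - D[s + 1, y - 1, x]
                  - D[s - 1, y + 1, x] + D[s - 1, y - 1, x])
    dsx = 0.25 * (D[s + 1, y, x + 1] - D[s + 1, y, x - 1]
                  - D[s - 1, y, x + 1] + D[s - 1, y, x - 1])
    dyx = 0.25 * (D[s, y + 1, x + 1] - D[s, y + 1, x - 1]
                  - D[s, y - 1, x + 1] + D[s, y - 1, x - 1])
    H = np.array([[dss, dsy, dsx],
                  [dsy, dyy, dyx],
                  [dsx, dyx, dxx]])
    # x_hat = -H^{-1} dD ;  D(x_hat) = D + 0.5 * dD^T x_hat
    x_hat = -np.linalg.solve(H, dD)
    value = c + 0.5 * dD.dot(x_hat)
    return x_hat, value
```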

Then the low-contrast extrema and the edge responses are removed, leaving the set of extrema that are retained as keypoints.

Edge responses are removed using the eigenvalues of the Hessian matrix, which contains the second-order partial derivatives. In the DoG image, a point on an edge has a large change of value across the edge (corresponding to the larger eigenvalue of the Hessian) and a small change along the edge (corresponding to the smaller eigenvalue). Edge responses are therefore suppressed by discarding feature points for which the ratio of the larger eigenvalue to the smaller one exceeds a threshold.

H=\left [ \begin{matrix} D_{xx} &D_{xy} \\ D_{xy} & D_{yy} \end{matrix} \right ]

Tr(H)=D_{xx}+D_{yy}=\alpha +\beta

Det(H)=D_{xx}D_{yy}-(D_{xy})^2=\alpha \beta

With r=\alpha /\beta denoting the ratio of the larger eigenvalue to the smaller one, a keypoint is kept when

\frac{Tr(H)^2}{Det(H)}< \frac{(r+1)^2}{r}
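A sketch of this test on a single DoG image, with the second derivatives estimated by finite differences; r = 10 is the threshold value used in the paper:

```python
import numpy as np

def passes_edge_test(dog_image, y, x, r=10.0):
    """Reject edge-like responses using the 2x2 Hessian of the DoG image
    and the condition Tr(H)^2 / Det(H) < (r + 1)^2 / r."""
    d = dog_image.astype(np.float32)
    c = d[y, x]
    dxx = d[y, x + 1] + d[y, x - 1] - 2 * c
    dyy = d[y + 1, x] + d[y - 1, x] - 2 * c
    dxy = 0.25 * (d[y + 1, x + 1] - d[y + 1, x - 1]
                  - d[y - 1, x + 1] + d[y - 1, x - 1])
    tr = dxx + dyy
    det = dxx * dyy - dxy * dxy
    # A non-positive determinant means the principal curvatures have
    # different signs, so the point is rejected outright.
    if det <= 0:
        return False
    return tr * tr / det < (r + 1) ** 2 / r
```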

The test results are shown below.

Image (a) is the original image; (b) is the result of direct keypoint detection; (c) is the result after discarding the low-contrast extrema; and (d) is the result after also removing the edge-response extrema.

4. Orientation assignment

For each extremum, the gradient directions and magnitudes of all pixels within a circle whose radius is 1.5 times the scale of the Gaussian image in which the feature point lies are accumulated into a histogram, as shown in the figure below (in the paper each bin covers 10 degrees, giving 36 bins). The direction corresponding to the peak of the histogram is taken as the main orientation, and any direction whose bin value exceeds 80% of the peak is kept as an auxiliary orientation of the feature point.
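A sketch of the orientation histogram, assuming a grayscale Gaussian-blurred image as a NumPy array; the window radius of three times 1.5 sigma and the Gaussian weighting with 1.5 sigma are common implementation choices rather than values fixed by the text above:

```python
import numpy as np

def dominant_orientations(gaussian_image, y, x, sigma, num_bins=36):
    """Build the 36-bin gradient orientation histogram around a keypoint
    and return the bin angles of the peak and of every bin whose value is
    at least 80% of the peak (the auxiliary orientations)."""
    radius = int(round(3 * 1.5 * sigma))
    weight_sigma = 1.5 * sigma
    bin_width = 360.0 / num_bins
    hist = np.zeros(num_bins)
    h, w = gaussian_image.shape
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            yy, xx = y + dy, x + dx
            if yy <= 0 or yy >= h - 1 or xx <= 0 or xx >= w - 1:
                continue
            gx = float(gaussian_image[yy, xx + 1]) - float(gaussian_image[yy, xx - 1])
            gy = float(gaussian_image[yy + 1, xx]) - float(gaussian_image[yy - 1, xx])
            magnitude = np.hypot(gx, gy)
            angle = np.degrees(np.arctan2(gy, gx)) % 360.0
            # Gaussian weight centered on the keypoint.
            weight = np.exp(-(dx * dx + dy * dy) / (2 * weight_sigma ** 2))
            hist[int(angle // bin_width) % num_bins] += weight * magnitude
    peak = hist.max()
    return [b * bin_width for b, v in enumerate(hist) if v >= 0.8 * peak]
```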

5. Keypoint Descriptor

In the first three steps we have found the locations of all feature points, each with its own orientation and scale. The next step is to compute a descriptor for each feature point over its local neighborhood.

As shown in the figure below, the image gradients are on the left and the keypoint descriptor on the right. The descriptor is computed on the Gaussian scale image in which the feature point lies. Centered on the feature point, its neighborhood is divided into 4\times 4 sub-regions, and in each sub-region the gradients are accumulated into 8 orientation bins. Each feature point therefore yields a 4\times 4\times 8=128-dimensional descriptor.
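A simplified sketch of this 128-dimensional descriptor; it omits rotating the window to the main orientation, the Gaussian weighting, and the trilinear interpolation of the full method, but it shows the 4x4 cells, the 8 orientation bins, and the normalization with the 0.2 clamp described in the paper:

```python
import numpy as np

def sift_descriptor(gaussian_image, y, x, num_cells=4, num_bins=8, cell_size=4):
    """Simplified 4x4x8 = 128-dimensional descriptor: one 8-bin orientation
    histogram per 4x4-pixel cell of a 16x16 neighborhood centered on (y, x)."""
    half = num_cells * cell_size // 2
    patch = gaussian_image[y - half:y + half, x - half:x + half].astype(np.float32)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)
    ang = np.degrees(np.arctan2(gy, gx)) % 360.0
    desc = np.zeros((num_cells, num_cells, num_bins))
    for i in range(num_cells * cell_size):
        for j in range(num_cells * cell_size):
            b = int(ang[i, j] // (360.0 / num_bins)) % num_bins
            desc[i // cell_size, j // cell_size, b] += mag[i, j]
    desc = desc.ravel()
    # Normalize, clamp large components at 0.2, and renormalize to reduce
    # the influence of illumination changes.
    desc /= (np.linalg.norm(desc) + 1e-7)
    desc = np.minimum(desc, 0.2)
    return desc / (np.linalg.norm(desc) + 1e-7)
```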

6. Examples of SIFT applications
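Since the example figures are not reproduced here, below is a minimal usage sketch with OpenCV's built-in SIFT (cv2.SIFT_create, available in OpenCV 4.4 and later): detect keypoints in two images, match their descriptors, and keep the matches that pass the ratio test. The file names are placeholders.

```python
import cv2

# Placeholder input images of the same scene from different viewpoints.
img1 = cv2.imread("scene1.jpg", cv2.IMREAD_GRAYSCALE)
img2 = cv2.imread("scene2.jpg", cv2.IMREAD_GRAYSCALE)

# Detect SIFT keypoints and compute their 128-dimensional descriptors.
sift = cv2.SIFT_create()
kp1, des1 = sift.detectAndCompute(img1, None)
kp2, des2 = sift.detectAndCompute(img2, None)

# Brute-force matching with the ratio test (keep a match only if it is
# clearly better than the second-best candidate).
matcher = cv2.BFMatcher(cv2.NORM_L2)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

# Draw and save the surviving matches.
result = cv2.drawMatches(img1, kp1, img2, kp2, good, None,
                         flags=cv2.DrawMatchesFlags_NOT_DRAW_SINGLE_POINTS)
cv2.imwrite("matches.jpg", result)
```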
