Computer Vision: Image Features and Description

1. Color features: quantized color histogram, clustered color histogram

(1) Quantized color histogram

Applicable color spaces: RGB, HSV, and other color spaces
Operation: quantize the color space; each unit (bin) is represented by its center. Count the number of pixels that fall into each quantization unit.
Advantages: efficient to compute
Disadvantages: quantization artifacts; the resulting histogram can be sparse
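
As an illustration, here is a minimal sketch of a quantized color histogram in HSV using OpenCV; the file name and the 8 x 4 x 4 bin layout are arbitrary example choices.

```python
import cv2

img = cv2.imread("image.jpg")                          # placeholder file name
hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV)

# Quantize HSV into 8 x 4 x 4 bins (each bin is one quantization unit)
# and count the pixels that fall into each bin
hist = cv2.calcHist([hsv], [0, 1, 2], None, [8, 4, 4],
                    [0, 180, 0, 256, 0, 256])

# Normalize so histograms of images of different sizes are comparable
hist = hist.flatten()
hist /= hist.sum()
print(hist.shape)                                      # (128,)
```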

(2) Clustered color histogram

Applicable color spaces: Lab and other color spaces
Operation: use a clustering algorithm to cluster all pixel color vectors; each unit (bin) is represented by a cluster center.
Problem: when two histograms contain similar colors that fall into adjacent but non-identical bins, computing their similarity with the L1 or Euclidean distance yields a low similarity value.
Solution: account for the similarity between similar but non-identical colors. (a) Use a quadratic-form distance. (b) Smooth the color histogram in advance, so that the pixels in each bin also contribute to several neighboring bins.
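
A rough sketch of a clustered color histogram: pixels are converted to Lab and clustered with k-means, and each bin counts the pixels assigned to one cluster center. The number of clusters and the file name are illustrative assumptions.

```python
import cv2
import numpy as np

img = cv2.imread("image.jpg")                          # placeholder file name
lab = cv2.cvtColor(img, cv2.COLOR_BGR2LAB)
pixels = lab.reshape(-1, 3).astype(np.float32)

# Cluster all pixel color vectors; each bin is represented by a cluster center
k = 16
criteria = (cv2.TERM_CRITERIA_EPS + cv2.TERM_CRITERIA_MAX_ITER, 20, 1.0)
_, labels, centers = cv2.kmeans(pixels, k, None, criteria, 3, cv2.KMEANS_PP_CENTERS)

# Histogram = normalized count of pixels assigned to each cluster center
hist = np.bincount(labels.ravel(), minlength=k).astype(np.float32)
hist /= hist.sum()
```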

2. Geometric features: Edge, Corner, Blob

(1) Edge

Definition: a region where the pixel intensity function changes rapidly, i.e. where the first derivative reaches an extremum.
Edge extraction: apply Gaussian denoising first, then use the first derivative and look for its extrema.
(Derivatives are sensitive to noise!)
2D: gradient magnitude: sqrt(hx(x,y)^2 + hy(x,y)^2)
Direction in which the gradient increases fastest: arctan(hy(x,y)/hx(x,y))
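
A small sketch of the edge-extraction steps above (Gaussian smoothing, first derivatives via Sobel, then gradient magnitude and direction); the file name and kernel sizes are placeholder choices.

```python
import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# 1) Gaussian denoising first (derivatives are sensitive to noise)
smoothed = cv2.GaussianBlur(gray, (5, 5), 1.0)

# 2) First derivatives h_x, h_y via Sobel operators
hx = cv2.Sobel(smoothed, cv2.CV_64F, 1, 0, ksize=3)
hy = cv2.Sobel(smoothed, cv2.CV_64F, 0, 1, ksize=3)

# Gradient magnitude and direction of fastest increase
magnitude = np.sqrt(hx ** 2 + hy ** 2)
direction = np.arctan2(hy, hx)

# A simple edge map keeps pixels with a large gradient magnitude
edges = (magnitude > 0.5 * magnitude.max()).astype(np.uint8) * 255
```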

(2) Key points

When observing the same object from different distances, directions, angles, and lighting conditions, its size, shape, and brightness appear different, yet we can still recognize it as the same object.
An ideal feature descriptor should have the same property: in images of different sizes, orientations, and brightness, the same feature point should produce similar descriptors. This is called the repeatability of descriptors.
Feature points: provide the mapping between images taken from different viewpoints
Properties: repeatability, saliency
Robustness to image transformations: appearance changes (brightness, illumination) and geometric changes (translation, rotation, scaling)
Uses: image registration/stitching, motion tracking, object recognition, robot navigation

(3) Corner

Harris corner:
A salient point at which shifting a small observation window in any direction produces a large change in pixel intensities.
Corner region: movement in any direction causes large intensity changes.
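
As an illustration, OpenCV's cornerHarris computes this window-based response; the parameter values below are typical example choices, not prescribed ones.

```python
import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
gray = np.float32(gray)

# blockSize=2 (window size), ksize=3 (Sobel aperture), k=0.04 (Harris constant)
response = cv2.cornerHarris(gray, 2, 3, 0.04)

# Keep points whose response is a sizable fraction of the maximum
corners = np.argwhere(response > 0.01 * response.max())
print(f"{len(corners)} Harris corner candidates")
```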
FAST corner:
FAST corner detection is a fast corner feature detection algorithm.
Definition of a FAST corner: if a pixel differs from enough of the pixels in its surrounding neighborhood, that pixel may be a corner.
Method: choose a threshold t and consider a discretized circle of radius 3 pixels centered on the candidate pixel p; there are 16 pixels on the boundary of this circle. If there exist n consecutive pixels on this 16-pixel circle whose intensities are all greater than (or all less than) the intensity of the center pixel by more than t, then p is a corner. (Very fast to compute.)
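
A minimal OpenCV usage sketch of the FAST detector described above; the threshold value is an arbitrary example and the file name is a placeholder.

```python
import cv2

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

fast = cv2.FastFeatureDetector_create()
fast.setThreshold(20)              # threshold t on the intensity difference
fast.setNonmaxSuppression(True)    # keep only locally strongest corners
keypoints = fast.detect(gray, None)
print(f"{len(keypoints)} FAST corners detected")
```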

Disadvantages: no multi-scale information and no orientation.

(4) Blob

Laplacian (second derivative):
An extremum of the first derivative corresponds to a zero crossing of the second derivative; such points mark boundaries (edges).
An extremum of the second derivative response marks a blob.
Apply a Gaussian transform first, then the Laplacian (Laplacian of Gaussian). By varying the Gaussian scale, blobs of different sizes can be detected.
(For Gaussian denoising: a small Gaussian kernel is more sensitive and leaves more noise, while a kernel that is too large may discard important information.)
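
A rough sketch of a multi-scale blob response using the Laplacian of Gaussian via SciPy; the sigma values are illustrative, the sigma^2 scale normalization is the standard choice for comparing scales, and the file name is a placeholder.

```python
import cv2
import numpy as np
from scipy.ndimage import gaussian_laplace

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float64)

# Laplacian-of-Gaussian response at several scales, normalized by sigma^2
# so that responses at different scales are comparable
sigmas = [2.0, 4.0, 8.0]
responses = np.stack([s ** 2 * gaussian_laplace(gray, s) for s in sigmas])

# Blob candidates are extrema of the response over space and scale;
# here we just record, per pixel, which scale responds most strongly
best_scale = np.argmax(np.abs(responses), axis=0)
```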

3. Local features: SIFT, SURF, ORB

(1) SIFT

Based on scale-space invariant features.
Properties: good invariance to rotation, scaling, and translation; a certain stability under affine transformations and changes of viewpoint.
Good distinctiveness and rich information content: even a small number of objects can generate a large number of SIFT features.
SIFT feature computation steps:
(a) Find extrema in the DoG scale space; these are the candidate key points. Remove edge points.
(b) Estimate the orientation of each key point.
(c) Generate the descriptor for each key point: rotate the region coordinates, then compute histograms over the sampling area.
Because the Laplacian of Gaussian is expensive to compute, the Difference of Gaussians (DoG) is used instead as an approximation.
Gaussian pyramid: starting from the traditional image pyramid, each level is blurred with Gaussian kernels of different parameters, so that each level (octave) of the pyramid contains several Gaussian-blurred images. In such a set of images, the first image of octave (i+1) is obtained by down-sampling the third-from-last image of octave (i).
Subtracting adjacent images within each octave of the Gaussian pyramid gives the DoG pyramid.
Extremum detection: taking X as the candidate point, it must be compared not only with its 8 surrounding points in the same layer, but also with the 9 points in each of the layers above and below (26 neighbors in total).
The extrema of the SIFT DoG space are the key points: the radius of the circle corresponds to the feature point's scale, and the circle's center to the feature point's coordinates.
SIFT orientation estimation: compute a gradient orientation histogram at the key point's scale. The direction with the highest histogram value is the key point's main orientation.
For matching stability, any direction that exceeds 80% of the highest value is kept as a secondary orientation.
Generation of the key point descriptor:
Region coordinate rotation: to ensure that the feature vector is rotation invariant, the image neighborhood around the feature point is rotated by the orientation angle about the feature point, i.e. the patch is rotated to align with the main orientation.
A 16 x 16 pixel window is sampled in the rotated coordinates and divided into 4 x 4 sub-grids; each sub-grid accumulates an 8-direction gradient histogram, giving 4 x 4 x 8 = 128 dimensions in total. That is, a histogram is generated for each sampling sub-region.
Disadvantage: the computation is expensive.
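
The pipeline above is available directly in OpenCV (SIFT sits in the main module from version 4.4 on); a minimal usage sketch with a placeholder file name:

```python
import cv2

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

sift = cv2.SIFT_create()
# keypoints carry position, scale, and orientation; descriptors are 128-D vectors
keypoints, descriptors = sift.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)                # e.g. (N, 128)
```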

(2) SURF

Haar-like features:
divided into edge features, line features, center features, and diagonal features, which are combined into feature templates.
A feature template contains two kinds of rectangles, white and black, and the template's feature value is defined as the sum of pixels inside the white rectangle minus the sum of pixels inside the black rectangle.
Haar features reflect the grayscale changes of the image.
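
As a sketch of the feature value computation, the rectangle sums can be taken from an integral image; the patch coordinates below are arbitrary illustrative values.

```python
import cv2

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name
ii = cv2.integral(gray)                                 # integral image, shape (H+1, W+1)

def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return float(ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x])

# Two-rectangle (edge) Haar feature at an arbitrary location:
# left half is the white rectangle, right half is the black rectangle
x, y, w, h = 40, 40, 24, 24
haar_value = rect_sum(ii, x, y, w // 2, h) - rect_sum(ii, x + w // 2, y, w // 2, h)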
SURF is an improvement of SIFT that runs about three times faster; it simplifies several of SIFT's operations.
(a) Finding feature points:
SURF simplifies the Gaussian second-order derivative templates of SIFT, so that the convolution/smoothing step only requires additions and subtractions.
To find the feature points, the original image is transformed: the transformed image consists of the approximate determinant of the Hessian matrix at each pixel of the original image. For the Hessian, Gaussian smoothing is applied first and the second derivative is then taken; for discrete pixels, these two operations can be replaced by a single Haar-like box template.
(b) Finding the main orientation:
To ensure rotation invariance, SURF accumulates Haar wavelet responses in the neighborhood of each feature point.
That is, taking the feature point as the center, within a circular neighborhood of radius 6s (s is the feature point's scale), the Haar wavelet responses of all points in the x (horizontal) and y (vertical) directions are summed inside a 60-degree sector. The sector is swept around the circle, and the sector whose summed response has the largest magnitude determines the main orientation.
(c) Generating the keypoint descriptor:
Take a square window with side length 20s around the feature point (s is the scale of the detected feature point), oriented along the detected main orientation. The window is divided into 16 sub-regions, and each sub-region computes the Haar wavelet responses of 25 sample points in the horizontal and vertical directions. For each sub-region, sum dx, |dx|, dy, and |dy|; the resulting feature vector has 16 x 4 = 64 dimensions.
Compared with SIFT:
about three times faster, more robust to brightness changes, and better under blur;
not as good as SIFT in scale invariance, and considerably worse in rotation invariance.
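
SURF is patented and lives in the opencv-contrib xfeatures2d module, so it may require a build with non-free modules enabled; assuming such a build, a usage sketch might look like this (the Hessian threshold of 400 is just an illustrative value).

```python
import cv2

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# 400 is the Hessian-determinant threshold for accepting feature points;
# the default (non-extended) descriptor is 64-dimensional
surf = cv2.xfeatures2d.SURF_create(400)
keypoints, descriptors = surf.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)                # e.g. (N, 64)
```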

(3) ORB feature description (faster)

The ORB feature is based on FAST corner detection and the BRIEF feature descriptor; it combines and improves both.
Disadvantage of FAST corner detection: lack of scale invariance. Scale invariance can be achieved by building a Gaussian pyramid and detecting corners on each level of the pyramid.
Disadvantage of BRIEF: lack of rotation invariance; rotation invariance needs to be added to BRIEF.
BRIEF:
First smooth the image, then select a patch around each feature point. Within this patch, a chosen sampling scheme selects n point pairs; for each pair the two pixel intensities are compared, and the n comparison results together form a binary string of length n.
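
A toy NumPy sketch of the BRIEF idea (smooth, pick n point pairs in a patch, compare intensities to get an n-bit string); the patch size, the random pair sampling, and the keypoint location are simplifying assumptions, not the exact BRIEF sampling pattern.

```python
import cv2
import numpy as np

def brief_descriptor(gray, keypoint_xy, pairs):
    """Binary string for one keypoint: a bit is 1 if intensity at p1 < intensity at p2."""
    x, y = keypoint_xy
    bits = [1 if gray[y + dy1, x + dx1] < gray[y + dy2, x + dx2] else 0
            for (dx1, dy1), (dx2, dy2) in pairs]
    return np.array(bits, dtype=np.uint8)

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)    # placeholder file name
smoothed = cv2.GaussianBlur(gray, (9, 9), 2.0)          # smoothing first, as required

rng = np.random.default_rng(0)
n = 256                                                 # descriptor length in bits
offsets = rng.integers(-15, 16, size=(n, 4))            # n point pairs inside a 31x31 patch
pairs = [((dx1, dy1), (dx2, dy2)) for dx1, dy1, dx2, dy2 in offsets]

# Keypoint location is a placeholder; assumes it lies far enough from the image border
desc = brief_descriptor(smoothed, (100, 100), pairs)
```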
ORB's improvement on BRIEF:
When computing the BRIEF descriptor, ORB establishes a coordinate system with the key point as the circle center and the line connecting the key point to the centroid of the circular patch as the X-axis. When computing the centroid, the "mass" of each point in the circular region is its gray (pixel) value.
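
A minimal OpenCV usage sketch of ORB, which combines the scale pyramid and the orientation-compensated BRIEF described above; the parameter values shown are typical defaults.

```python
import cv2

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE)   # placeholder file name

# nfeatures caps the keypoint count; scaleFactor/nlevels define the image pyramid
orb = cv2.ORB_create(nfeatures=500, scaleFactor=1.2, nlevels=8)
keypoints, descriptors = orb.detectAndCompute(gray, None)
print(len(keypoints), descriptors.shape)                # 32-byte (256-bit) descriptors
```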

4. Other feature descriptors (LBP, Gabor)

(1) LBP (Local Binary Pattern):

Compare each pixel with its surrounding points: on a circle of radius R, P points are sampled uniformly, and each comparison is quantized to 0 or 1.
For rotation invariance, the surrounding binary code is rotated bit by bit, and the smallest resulting binary value is taken as the final LBP value.
Improved LBP: extend the 3x3 neighborhood to arbitrary neighborhoods, and replace the square neighborhood with a circular one.
LBP features have notable advantages such as grayscale invariance and rotation invariance.
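
A compact NumPy sketch of the basic 3x3 LBP (the circular-neighborhood and rotation-invariant variants extend the same idea); the image file name is a placeholder.

```python
import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE).astype(np.int32)

# Basic 3x3 LBP: compare the 8 neighbours with the centre pixel and pack the bits
offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1)]
h, w = gray.shape
center = gray[1:-1, 1:-1]
lbp = np.zeros((h - 2, w - 2), dtype=np.int32)
for bit, (dy, dx) in enumerate(offsets):
    neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
    lbp |= (neighbour >= center).astype(np.int32) << bit

# Texture descriptor: normalized histogram of the 256 possible LBP codes
hist = np.bincount(lbp.ravel(), minlength=256).astype(np.float32)
hist /= hist.sum()
```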

(2) Gabor:

Linear filters used for edge extraction. Their frequency and orientation representations are similar to those of the human visual system; they provide good orientation and scale selectivity and are insensitive to illumination.
Well suited to texture analysis.
A Gabor filter is obtained by combining a sinusoid with a Gaussian function.
Frequency domain: a windowed Fourier transform.
Spatial domain: the product of a Gaussian kernel function and a sinusoidal plane wave.
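
A sketch of building a small Gabor filter bank with OpenCV for texture analysis; the kernel size, wavelength, and orientations are illustrative choices, and the file name is a placeholder.

```python
import cv2
import numpy as np

gray = cv2.imread("image.jpg", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# One Gabor kernel per orientation: a Gaussian envelope times a sinusoidal plane wave.
# Arguments: ksize, sigma (Gaussian width), theta (orientation),
# lambd (wavelength), gamma (aspect ratio), psi (phase offset)
responses = []
for theta in np.arange(0, np.pi, np.pi / 4):            # 4 orientations
    kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
    responses.append(cv2.filter2D(gray, cv2.CV_32F, kernel))

# Simple texture features: mean and variance of each filter response
features = [(r.mean(), r.var()) for r in responses]
```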
