Preface
This second part is about the construction of descriptors. It follows directly from the previous part on feature-point detection, and the two could arguably have been combined.
Theory
First of all, what is a descriptor? A descriptor is a set of numerical features extracted from images, audio, text, or other data, used to capture the key information in that data. In image processing, feature-point detection algorithms (such as SIFT, SURF, and ORB) are often used to detect key points in an image, and descriptors are then computed for those key points. These descriptors can be used for tasks such as image matching, object recognition, and image retrieval. In natural language processing, word embeddings can be regarded as descriptors for text data, mapping words into a high-dimensional vector space.
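To make the "image matching" use concrete, here is a minimal sketch of matching two sets of descriptors by Euclidean distance with Lowe's ratio test. The function name and the 0.75 threshold are illustrative choices of mine, not something fixed by the text:

```python
import numpy as np

def match_descriptors(desc_a, desc_b, ratio=0.75):
    """Match rows of desc_a to rows of desc_b (minimal sketch).

    desc_a: (n, d) array, desc_b: (m, d) array with m >= 2.
    A match (i, j) is kept only if the nearest candidate is clearly
    closer than the second nearest (Lowe's ratio test).
    """
    matches = []
    for i, d in enumerate(desc_a):
        dists = np.linalg.norm(desc_b - d, axis=1)  # distance to every candidate
        order = np.argsort(dists)
        best, second = order[0], order[1]
        if dists[best] < ratio * dists[second]:     # accept only unambiguous matches
            matches.append((i, best))
    return matches
```

The ratio test matters because with 128-dimensional descriptors many wrong candidates lie at similar distances; requiring a clear gap to the runner-up filters those out.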
In short, we have only discussed two types of feature points so far. Each type has a corresponding descriptor, so we distinguish two descriptor algorithms:
Corner descriptor
SIFT descriptor
The following is a detailed walkthrough of how the SIFT algorithm constructs a feature-point descriptor:
1. Feature point detection and feature scale determination:
First, detect the feature points in the image, for example with the Harris corner detector. At the same time, a characteristic scale is determined for each feature point. Note that a plain Harris detector is not scale-selective, so in practice the scale comes from a scale-space detector, such as the difference-of-Gaussians (DoG) extrema used in SIFT or a Harris-Laplace variant.
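The Harris response mentioned in this step can be sketched in a few lines of numpy. This is a simplified version, assuming central-difference gradients and a 3x3 box window instead of the Gaussian weighting of the full algorithm; k = 0.05 is a typical empirical value:

```python
import numpy as np

def harris_response(img, k=0.05):
    """Harris corner response R = det(M) - k * trace(M)^2 (minimal sketch)."""
    img = img.astype(float)
    iy, ix = np.gradient(img)                  # image gradients per pixel
    ixx, iyy, ixy = ix * ix, iy * iy, ix * iy  # structure-tensor entries

    def box(a):
        # Sum each entry over a 3x3 window (box filter) via padded slices.
        p = np.pad(a, 1)
        return sum(p[i:i + a.shape[0], j:j + a.shape[1]]
                   for i in range(3) for j in range(3))

    sxx, syy, sxy = box(ixx), box(iyy), box(ixy)
    det = sxx * syy - sxy * sxy                # det(M)
    trace = sxx + syy                          # trace(M)
    return det - k * trace * trace             # large positive R => corner
```

Corners give a large positive response (both eigenvalues of M are large), edges give a negative one, and flat regions give a response near zero.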
2. Determination of the main direction of the neighborhood:
Within a neighborhood around each feature point, compute the gradient direction of every pixel. Then build a histogram of gradient directions over [0, 2π), weighting each pixel's contribution by its gradient magnitude, and select the angle with the largest accumulated weight as the main direction of the neighborhood. This step gives the feature-point descriptor its rotation invariance.
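The orientation step above can be sketched as follows. The function name and the 36-bin choice (SIFT's standard 10° bins) are mine; a full implementation would also smooth the histogram and interpolate the peak:

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    """Return the main gradient direction of a patch, in radians.

    Builds a magnitude-weighted histogram of gradient angles over
    [0, 2*pi) and returns the center of the peak bin.
    """
    gy, gx = np.gradient(patch.astype(float))
    mag = np.hypot(gx, gy)                    # gradient magnitude (vote weight)
    ang = np.arctan2(gy, gx) % (2 * np.pi)    # gradient angle in [0, 2*pi)
    hist, edges = np.histogram(ang, bins=n_bins,
                               range=(0, 2 * np.pi), weights=mag)
    peak = np.argmax(hist)
    return (edges[peak] + edges[peak + 1]) / 2  # center of the peak bin
```

For a patch that brightens uniformly from left to right, the gradient points along +x everywhere, so the returned angle is near 0.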
3. Construction of feature point descriptor:
Rotate the neighborhood of the feature point to its main direction, then divide it into a 4x4 grid, giving 16 small areas. Within each small area, compute the pixel gradient directions and build a gradient-direction histogram with 8 bins, so the descriptor of each small area is a vector of 8 values. Concatenating these vectors yields a feature-point descriptor of length 4 x 4 x 8 = 128.
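The grid-of-histograms layout can be sketched like this. To keep it short, this sketch (names are my own) omits SIFT's Gaussian weighting, the rotation to the main orientation, and the trilinear interpolation between neighboring cells and bins:

```python
import numpy as np

def sift_like_descriptor(patch, grid=4, n_bins=8):
    """Build a 4x4-grid, 8-bin descriptor from a square patch (sketch)."""
    patch = patch.astype(float)
    gy, gx = np.gradient(patch)
    mag = np.hypot(gx, gy)                     # vote weights
    ang = np.arctan2(gy, gx) % (2 * np.pi)     # angles in [0, 2*pi)
    cell = patch.shape[0] // grid              # cell size, e.g. 16 // 4 = 4
    hists = []
    for r in range(grid):
        for c in range(grid):
            sl = (slice(r * cell, (r + 1) * cell),
                  slice(c * cell, (c + 1) * cell))
            h, _ = np.histogram(ang[sl], bins=n_bins,
                                range=(0, 2 * np.pi), weights=mag[sl])
            hists.append(h)                    # 8 numbers per cell
    return np.concatenate(hists)               # 4 * 4 * 8 = 128 values
```

Applied to a 16x16 patch this returns exactly the 128-dimensional vector the step describes.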
4. Normalization of descriptors:
Finally, to address illumination changes, the constructed feature-point descriptor is normalized: it is scaled to unit length, large components are clipped (at 0.2 in the original SIFT paper), and the result is renormalized. This keeps the descriptor consistent under different lighting conditions and improves matching accuracy.
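The normalize-clip-renormalize scheme is small enough to show directly (the function name is mine; the 0.2 threshold is the value from the original SIFT paper):

```python
import numpy as np

def normalize_descriptor(desc, clip=0.2):
    """L2-normalize a descriptor, clip large entries, renormalize.

    Unit length cancels overall contrast changes; clipping damps the
    influence of a few very large gradient magnitudes, which helps
    with non-linear illumination effects like saturation.
    """
    desc = np.asarray(desc, dtype=float)
    desc = desc / (np.linalg.norm(desc) + 1e-12)   # unit length
    desc = np.minimum(desc, clip)                   # cap dominant components
    return desc / (np.linalg.norm(desc) + 1e-12)    # renormalize
```

Because of the first normalization, multiplying the whole descriptor by a constant (a global contrast change) leaves the output unchanged.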
Through the above steps, we obtain feature-point descriptors with scale invariance, rotation invariance, and illumination invariance. Such descriptors are widely used in computer vision tasks such as image matching and object recognition.
Conclusion
Anyway, that is all for this part.