Image processing: how is the Gaussian pyramid implemented in SIFT?


First, a picture of the SIFT image pyramid.

[Figure: the SIFT image pyramid]

If this image sampling structure is to be called a pyramid, it is a Mayan pyramid rather than an Egyptian one, because it is clearly stepped. The word "octave" is borrowed from music; here it refers to a group of images. The images within one octave have the same resolution but are filtered with different Gaussian functions, so they differ in their degree of blur (in other words, in the scale of interest). Images in different octaves also differ in resolution, so the scale difference between octaves is even larger.

So, back to the main question: why introduce a Gaussian pyramid when the image is already being filtered with Gaussian functions of different scales? Because the SIFT algorithm wants fine scale resolution (that is, small steps between adjacent scales), it needs many layers. If multi-scale detection were done without the pyramid, by applying different Gaussian functions at the original resolution, the computation spent on the coarser scales would be largely wasted. With the resolution held fixed, extracting coarse-scale features requires a Gaussian with a large variance, the filter window grows accordingly, and the amount of computation rises sharply. Worse, coarse-scale features do not need the full original resolution in the first place, so that extra cost buys nothing. The Gaussian pyramid is therefore used to extract features at different scales efficiently.
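The cost argument above can be made concrete with a rough operation count. This is only a sketch under stated assumptions: the kernel is truncated at 3 sigma, the blur is separable (one row pass and one column pass), the base scale of 1.6 and the 1024 x 1024 image are illustrative values, and `kernel_width`, `separable_cost` and `compare` are hypothetical helper names.

```python
import math

def kernel_width(sigma):
    """Window width in pixels, assuming the kernel is truncated at 3 sigma."""
    return 2 * math.ceil(3 * sigma) + 1

def separable_cost(num_pixels, sigma):
    """Approximate multiply-adds for a separable Gaussian blur (rows + columns)."""
    return 2 * kernel_width(sigma) * num_pixels

def compare(width, height, sigma0, octaves):
    """Cost of reaching each octave's scale at full resolution vs. in the pyramid."""
    rows = []
    for o in range(octaves):
        # Without a pyramid, sigma (in original pixels) doubles per octave.
        sigma_full = sigma0 * 2 ** o
        full = separable_cost(width * height, sigma_full)
        # With a pyramid, the image is halved o times, so the same scale is
        # reached with sigma0 on an image that has 4**o times fewer pixels.
        pyr = separable_cost((width >> o) * (height >> o), sigma0)
        rows.append((o, kernel_width(sigma_full), full, pyr))
    return rows

for o, w, full, pyr in compare(1024, 1024, 1.6, 4):
    print(f"octave {o}: full-res window {w:3d} px, "
          f"full-res cost {full:>12,d}, pyramid cost {pyr:>12,d}")
```

At full resolution the window width roughly doubles with each octave while the pixel count stays the same; in the pyramid the window stays fixed and the pixel count drops by a factor of four per octave.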

The scale difference between octaves is realized by the change in resolution across the Gaussian pyramid, while the scale difference between layers within one octave is realized by changing the variance of the Gaussian function. In addition, SIFT does not filter the image directly with a DoG (difference of Gaussians) kernel; instead, it obtains the DoG response by subtracting the Gaussian-filtered results of two adjacent layers. Why is this?
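Subtracting two adjacent Gaussian layers gives the same response as filtering with an explicit DoG kernel, because convolution is linear. The following is a minimal 1-D sketch of that equivalence, not SIFT's actual code: the signal, the base scale of 1.6, the ratio k = 2^(1/3) between adjacent layers, and the helper names `gaussian_kernel` and `convolve` are all assumptions made for illustration.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian on [-radius, radius], normalized to sum to 1."""
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def convolve(signal, kernel):
    """Same-size 1-D convolution with zero padding (kernel length must be odd)."""
    r = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, kv in enumerate(kernel):
            idx = i + r - j
            if 0 <= idx < len(signal):
                acc += kv * signal[idx]
        out.append(acc)
    return out

signal = [0.0] * 20 + [1.0] * 10 + [0.0] * 20   # a 1-D "bar" feature
sigma, k, radius = 1.6, 2 ** (1 / 3), 10        # illustrative values

g_fine = gaussian_kernel(sigma, radius)
g_coarse = gaussian_kernel(k * sigma, radius)

# Route A: subtract two Gaussian layers that have already been computed.
layer_fine = convolve(signal, g_fine)
layer_coarse = convolve(signal, g_coarse)
dog_by_subtraction = [c - f for c, f in zip(layer_coarse, layer_fine)]

# Route B: build an explicit DoG kernel and filter with it directly.
dog_kernel = [c - f for c, f in zip(g_coarse, g_fine)]
dog_direct = convolve(signal, dog_kernel)

max_diff = max(abs(a - b) for a, b in zip(dog_by_subtraction, dog_direct))
print(f"max difference between the two routes: {max_diff:.2e}")
```

Since the Gaussian layers are needed for the pyramid anyway, Route A costs only one subtraction per pixel on top of work that has already been done.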

Again, to save computation. If a DoG kernel were applied directly, its window would have to grow step by step to reach the coarser scales, and the amount of computation would grow with it. In practice, SIFT first filters the image sampled at the current octave's resolution with a Gaussian that has a relatively small window, and then applies Gaussian filtering again to that result. Filtering an image twice in succession with a Gaussian of variance $\sigma^2$ is equivalent to filtering it once with a Gaussian of variance $2\sigma^2$. Each layer is therefore computed from the filtered result of the previous layer, which gives the same output as filtering the original image with Gaussians of progressively larger windows. Because the filter window never has to be enlarged, the amount of computation is effectively reduced.
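The cascade property described above (variances add under repeated Gaussian filtering) can be checked numerically. This is a small 1-D sketch with illustrative values; `gaussian_kernel`, `full_convolve` and `incremental_sigma` are hypothetical helper names, and sampled, truncated kernels only satisfy the property approximately.

```python
import math

def gaussian_kernel(sigma, radius):
    """Sampled 1-D Gaussian on [-radius, radius], normalized to sum to 1."""
    w = [math.exp(-(x * x) / (2.0 * sigma * sigma))
         for x in range(-radius, radius + 1)]
    total = sum(w)
    return [v / total for v in w]

def full_convolve(a, b):
    """Full 1-D convolution; output length is len(a) + len(b) - 1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, av in enumerate(a):
        for j, bv in enumerate(b):
            out[i + j] += av * bv
    return out

sigma = 2.0     # illustrative value
radius = 12     # 6 sigma, so truncation error is negligible

g = gaussian_kernel(sigma, radius)
# Applying the same blur twice is convolving the kernel with itself.
twice = full_convolve(g, g)
# A single Gaussian with variance 2 * sigma^2, i.e. std sigma * sqrt(2).
once = gaussian_kernel(sigma * math.sqrt(2.0), 2 * radius)

cascade_error = max(abs(a - b) for a, b in zip(twice, once))
print(f"max kernel difference: {cascade_error:.2e}")

def incremental_sigma(sigma_prev, sigma_next):
    """Std of the extra blur that takes a layer from sigma_prev to sigma_next."""
    return math.sqrt(sigma_next ** 2 - sigma_prev ** 2)

# The extra blur between adjacent layers is smaller than the target sigma,
# so its window is smaller than the one a direct blur would need.
print(incremental_sigma(1.6, 1.6 * 2 ** (1 / 3)))
```

More generally, going from a layer blurred with `sigma_prev` to one blurred with `sigma_next` only needs a Gaussian of standard deviation `sqrt(sigma_next**2 - sigma_prev**2)`, which is why each new layer can be built from the previous one with a small window.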

