First, a picture of the SIFT image pyramid:
Back to the main question: why introduce a Gaussian pyramid when we already filter with Gaussian functions of different scales? The reason is that SIFT wants a fine scale resolution (that is, small changes between adjacent scales), so many layers are required. If multi-scale detection were done without the pyramid, by applying different Gaussian functions at the original resolution, extracting coarse-scale features would waste a great deal of computation. With the image kept at its original resolution, coarse-scale features require a Gaussian with a large variance; the filtering window grows accordingly, and the amount of computation rises sharply. Moreover, at coarse scales the original resolution is no longer necessary, which makes this cost even more wasteful. The Gaussian pyramid is therefore used to extract features at different scales efficiently.
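To put rough numbers on this argument, here is a small sketch that compares the cost of blurring at full resolution with the cost of blurring the downsampled image of each octave. Several things in it are assumptions for illustration only: the window is truncated at 3 standard deviations on each side, the base scale is 1.6, the image is 1024 x 1024, and the cost model simply counts multiply-adds of a separable filter. The function names are hypothetical.

```python
import math

def kernel_width(sigma, radius_factor=3.0):
    """Odd window width covering +/- radius_factor * sigma (assumed truncation rule)."""
    radius = math.ceil(radius_factor * sigma)
    return 2 * radius + 1

def separable_cost(width, height, sigma):
    """Approximate multiply-adds of a separable 2-D Gaussian blur (two 1-D passes)."""
    return width * height * 2 * kernel_width(sigma)

def cost_ratio(width, height, octave, sigma0=1.6):
    """Cost at full resolution divided by cost on the pyramid level of this octave."""
    # Without a pyramid, the absolute scale doubles with every octave.
    sigma_abs = sigma0 * 2 ** octave
    full_res = separable_cost(width, height, sigma_abs)
    # With a pyramid, the image is halved per octave, so the same absolute
    # scale corresponds to sigma0 again on the smaller pixel grid.
    pyramid = separable_cost(width >> octave, height >> octave, sigma0)
    return full_res / pyramid

for octave in range(4):
    width_px = kernel_width(1.6 * 2 ** octave)
    ratio = cost_ratio(1024, 1024, octave)
    print(f"octave {octave}: full-res window {width_px} px, cost ratio {ratio:.1f}")
```

Under these assumptions the ratio grows quickly with the octave index, because the full-resolution approach pays both for the larger window and for the pixels that the pyramid has already discarded.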
The scale difference between octaves is realized by the difference in resolution within the Gaussian pyramid, and the scale difference between layers of the same octave is realized by changing the variance of the Gaussian function. In addition, SIFT does not obtain the DoG (difference of Gaussians) images by filtering directly with the DoG function; they are obtained by subtracting the Gaussian-filtered results of two adjacent layers. Why is this?
This is also to save computation. If the DoG function were used directly, its window would have to be enlarged step by step to extract different scales, which increases the amount of computation. In practice, SIFT first filters the image sampled at the current octave's resolution with a Gaussian that has a relatively small window, and then applies Gaussian filtering again to the filtered result. Filtering an image twice in succession, with Gaussians of variance σ₁² and σ₂², is equivalent to filtering it once with a Gaussian of variance σ₁² + σ₂². Each layer is therefore computed from the filtering result of the previous layer, and the result is the same as filtering the original image with Gaussian functions of different window sizes. Because the filter window never has to be expanded, the amount of computation is effectively reduced.
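The cascade idea can be checked numerically. The sketch below is a simplified 1-D illustration and not the reference SIFT implementation: it builds the layers of one octave by repeatedly blurring the previous layer with a small incremental Gaussian, forms the DoG layers by subtracting adjacent layers, and compares the last cascaded layer with a single direct blur. The base scale of 1.6, three intervals per octave, truncation at 4 standard deviations, border clamping, and all function names are assumptions made for this example.

```python
import math

def gaussian_kernel(sigma, radius_factor=4.0):
    """Normalized, truncated 1-D Gaussian kernel (truncation radius is an assumption)."""
    radius = math.ceil(radius_factor * sigma)
    weights = [math.exp(-(x * x) / (2.0 * sigma * sigma))
               for x in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(signal, sigma):
    """Convolve a 1-D signal with a Gaussian, clamping indices at the borders."""
    kernel = gaussian_kernel(sigma)
    radius = len(kernel) // 2
    n = len(signal)
    out = []
    for i in range(n):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = min(max(i + j - radius, 0), n - 1)
            acc += w * signal[idx]
        out.append(acc)
    return out

def build_octave(signal, sigma0=1.6, intervals=3):
    """Return (Gaussian layers, DoG layers) for one octave, built incrementally."""
    k = 2.0 ** (1.0 / intervals)
    layers = [blur(signal, sigma0)]
    sigma_prev = sigma0
    for s in range(1, intervals + 1):
        sigma_next = sigma0 * k ** s
        # Variances add under repeated Gaussian filtering, so the extra blur
        # needed to go from sigma_prev to sigma_next uses their difference.
        sigma_inc = math.sqrt(sigma_next ** 2 - sigma_prev ** 2)
        layers.append(blur(layers[-1], sigma_inc))
        sigma_prev = sigma_next
    # Each DoG layer is the difference of two adjacent Gaussian layers.
    dog = [[hi - lo for lo, hi in zip(low, high)]
           for low, high in zip(layers, layers[1:])]
    return layers, dog

# Hypothetical test signal: a box in the middle of a zero background.
signal = [0.0] * 101
for i in range(45, 56):
    signal[i] = 1.0

layers, dog = build_octave(signal)
direct = blur(signal, 1.6 * 2.0)  # one blur with the final scale of the octave
max_diff = max(abs(a - b) for a, b in zip(layers[-1], direct))
print(f"max |cascaded - direct| = {max_diff:.2e}")
```

The incremental blurs here all use a standard deviation below 2, while the direct blur needs 3.2, so every cascaded step works with a smaller window than the direct filter would. The small remaining difference between the two results comes from truncating and sampling the kernels.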