OpenCV learning (13): image pyramids and image gradients

1. Scale adjustment

    As the name implies, scale adjustment enlarges or reduces the source image. In OpenCV, the resize function converts the source image into a target image of a specified size. To shrink an image, area interpolation (CV_INTER_AREA, cv.INTER_AREA in Python) is generally recommended; to enlarge it, linear interpolation (CV_INTER_LINEAR, cv.INTER_LINEAR in Python) is recommended. resize covers simple image scaling. The image pyramids described below go further and are a basic technique in feature detection.

API introduction in Opencv:

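void resize(src, dst, size, int interpolation)
// src: source image; dst: target image
// size: size of the target image, given either as an explicit size or as enlargement/reduction factors
// interpolation: interpolation method; four methods are commonly available, and linear interpolation is the default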

2. Image Pyramid

    An image pyramid is a multi-scale representation of an image. It is mainly used for image segmentation and is a simple but effective structure for describing an image at multiple resolutions. Image pyramids were originally used in machine vision and image compression. The pyramid of an image is a series of images, all derived from the same original, arranged in a pyramid shape with gradually decreasing resolution. It is obtained by repeated down-sampling, which stops only when a termination condition is reached. The bottom of the pyramid is a high-resolution representation of the image to be processed, and the top is a low-resolution approximation. If we picture the layers stacked as a pyramid, the higher the level, the smaller the image and the lower the resolution.

Two common types of image pyramids

  • Gaussian pyramid: used for down-sampling; the main image pyramid

  • Laplacian pyramid: used to reconstruct the higher-resolution image of a layer from the smaller, down-sampled image of the next layer. In digital image processing terms it is the prediction residual. Used together with the Gaussian pyramid, it allows the image to be restored as fully as possible.


In brief, the two pyramids differ as follows. The Gaussian pyramid down-samples the image. Note that down-sampling means moving up the pyramid from its base, with the resolution decreasing at each step, so the word "down" runs opposite to the direction of the pyramid as drawn. The Laplacian pyramid is used to reconstruct an image by up-sampling from a lower-resolution layer of the pyramid.

    To generate layer i+1 of the pyramid (denoted G_i+1) from layer i, we first convolve G_i with a Gaussian kernel and then delete all even rows and even columns. The area of the resulting image is a quarter of that of the source image. Applying this process repeatedly, starting from the input image G_0, generates the entire pyramid.

    As we move up the pyramid, the size and resolution decrease. In OpenCV, pyrDown generates the next (smaller) level from the current level, and pyrUp enlarges an existing image to twice its size in each dimension.

    Up-sampling and down-sampling in the image pyramid are implemented by OpenCV functions pyrUp and pyrDown, respectively. In summary:

  • Upsampling the image: pyrUp function
  • Downsample the image: the pyrDown function

    "Up" and "down" here refer to the size of the image, not to the direction in the pyramid: up-sampling doubles the image size and down-sampling halves it. In the pyramid as drawn, moving upward gives smaller images, which is exactly the other way around.

    Note that pyrUp and pyrDown are not reciprocal: pyrUp is not the inverse of down-sampling. pyrUp first expands the image to twice its original size in each dimension, filling the new (even) rows and columns with 0, and then convolves it with the specified filter (in effect the filter doubled in each dimension) to estimate the values of the "missing" pixels. pyrDown loses information. To restore the original higher-resolution image, we need the information discarded by the down-sampling operation, and that data is what the Laplacian pyramid stores.

2.1 Gaussian pyramid

    The Gaussian pyramid obtains a series of down-sampled images through Gaussian smoothing and sub-sampling: the image at level K+1 is obtained from the image at level K by smoothing and sub-sampling. The Gaussian pyramid therefore acts as a series of low-pass filters whose cutoff frequency increases by a factor of 2 from each layer to the layer below it, so the pyramid spans a large frequency range. The image of the pyramid is as follows:
[Figure: Gaussian pyramid]
In addition, the layers are numbered from bottom to top, so layer G_i+1 is smaller than layer G_i.

  • Down-sampling an image means reducing it.
    To obtain the pyramid image at level G_i+1, the steps are as follows:
    <1> Convolve image G_i with a Gaussian kernel (Gaussian blur);
    <2> Remove all even rows and columns.

    The result is the image G_i+1. Its area is only a quarter of that of G_i. Iterating these steps, starting from the input image G_0 (the original image), produces the entire pyramid. Down-sampling also gradually loses image information.
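
The two down-sampling steps can be sketched in plain NumPy for a single-channel image. The function names, the 5x5 kernel built from [1, 4, 6, 4, 1] / 16, and the reflective border handling are assumptions of this sketch; cv.pyrDown is the function to use in practice, and its border pixels may differ slightly.

```python
import numpy as np

# 1-D weights; applying them along rows and then columns gives a 5x5 Gaussian kernel.
WEIGHTS = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0

def gaussian_blur_5x5(image):
    """Step <1>: separable 5x5 Gaussian blur with reflected borders."""
    height, width = image.shape
    padded = np.pad(image.astype(np.float64), 2, mode="reflect")
    # Blur along the horizontal direction, then along the vertical direction.
    horizontal = sum(w * padded[:, k:k + width] for k, w in enumerate(WEIGHTS))
    return sum(w * horizontal[k:k + height, :] for k, w in enumerate(WEIGHTS))

def reduce_level(image):
    """Step <2>: keep every other row and column of the blurred image."""
    return gaussian_blur_5x5(image)[::2, ::2]

g0 = np.arange(64, dtype=np.float64).reshape(8, 8)
g1 = reduce_level(g0)
print(g0.shape, "->", g1.shape)
```

With 0-based indexing the slice `[::2, ::2]` keeps rows and columns 0, 2, 4, ..., which corresponds to removing the even rows and columns when counting from 1.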

  • Up-sampling an image means enlarging it.
    The steps are as follows:
    <1> Expand the image to twice its size in each direction, filling the new rows and columns with 0;
    <2> Convolve the enlarged image with the same kernel (multiplied by 4) to obtain approximate values for the "new pixels".

    The result is the enlarged image, but it is blurred compared with the original, because some information was lost during down-sampling. The data needed to reduce this loss over the whole cycle of shrinking and enlarging is what forms the Laplacian pyramid.
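
The up-sampling steps can be sketched the same way in plain NumPy for a single-channel image. The function names, the [1, 4, 6, 4, 1] / 16 weights, and the reflective borders are assumptions of this sketch rather than the exact behavior of cv.pyrUp. The factor of 4 compensates for the fact that three out of every four pixels in the zero-filled image are 0.

```python
import numpy as np

WEIGHTS = np.array([1, 4, 6, 4, 1], dtype=np.float64) / 16.0

def gaussian_blur_5x5(image):
    """Separable 5x5 Gaussian blur with reflected borders."""
    height, width = image.shape
    padded = np.pad(image.astype(np.float64), 2, mode="reflect")
    horizontal = sum(w * padded[:, k:k + width] for k, w in enumerate(WEIGHTS))
    return sum(w * horizontal[k:k + height, :] for k, w in enumerate(WEIGHTS))

def expand_level(image):
    """Step <1>: interleave zeros; step <2>: blur with the kernel multiplied by 4."""
    height, width = image.shape
    enlarged = np.zeros((2 * height, 2 * width), dtype=np.float64)
    enlarged[::2, ::2] = image  # original pixels; new rows and columns stay 0
    return 4.0 * gaussian_blur_5x5(enlarged)

small = np.full((4, 4), 10.0)
large = expand_level(small)
print(small.shape, "->", large.shape)
```

A constant image stays constant after expansion, which is a quick check that the factor of 4 keeps the overall brightness unchanged.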

Requirements: for the Laplacian pyramid, the image size should be 2^n * 2^n; with other sizes the example code in this article may report an error
dst=cv2.pyrDown(src)
dst: sampling result
src: original image
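
A small arithmetic sketch of why sizes matter. It assumes the default output sizes documented for OpenCV, (n + 1) // 2 per dimension for pyrDown and 2 * n for pyrUp; the helper name pyramid_sizes is hypothetical.

```python
def pyramid_sizes(size, levels):
    """Side length at each level when halving with pyrDown's default rounding."""
    sizes = [size]
    for _ in range(levels):
        size = (size + 1) // 2
        sizes.append(size)
    return sizes

even_sizes = pyramid_sizes(512, 3)   # halves cleanly at every level
odd_sizes = pyramid_sizes(225, 3)    # rounds up at every level

# Doubling the first odd level does not give back the original size,
# so a plain pyrUp result cannot be subtracted from the original image.
round_trip = 2 * odd_sizes[1]
print(even_sizes, odd_sizes, round_trip)
```

With a power-of-two side length every level doubles back exactly; otherwise the up-sampled image has to be given an explicit target size before it can be compared with the level below.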

import cv2 as cv
import numpy as np

# Gaussian pyramid
def pyramid_demo(image):
    level = 3  # number of pyramid levels
    temp = image.copy()  # copy the image
    pyramid_images = []
    for i in range(level):
        dst = cv.pyrDown(temp)
        pyramid_images.append(dst)
        cv.imshow("pyramid_down_"+str(i), dst)
        temp = dst.copy()
    return pyramid_images


src = cv.imread("lena.jpg")
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)
cv.imshow("input image", src)
c=pyramid_demo(src)
cv.waitKey(0)
cv.destroyAllWindows()

The screenshot is as follows:
[Screenshot: output of the Gaussian pyramid demo]
    A Gaussian filter can be regarded as a low-pass filter: after each round of Gaussian filtering, only the components below a certain frequency are retained in the image. The Gaussian pyramid can therefore also be regarded as a low-pass pyramid (each level retains only the components below a certain frequency).

2.2 Laplacian pyramid

    During Gaussian pyramid construction, repeated Gaussian filtering and down-sampling discard much of the high-frequency signal. The purpose of the Laplacian pyramid is to preserve that high-frequency signal, and it does so by saving difference images. For example, layer 0 of the Laplacian pyramid is the difference between the original image and the image obtained by down-sampling it (Reduce) and then up-sampling it again (Expand).

    Another important application of image pyramids is image segmentation. An image pyramid is built first, and then "parent-child" relationships are established between corresponding pixels of G_i and G_i+1. A fast initial segmentation can be done on the low-resolution images at the high levels of the pyramid, and the segmentation is then refined layer by layer.

    Note: down-sampling followed by up-sampling is irreversible and lossy!
Result = original image - (original image down-sampled, then up-sampled)
Down: the size becomes smaller
Up: the size becomes larger

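# Image pyramid
# Gaussian pyramid: reduce = Gaussian blur + 2x down-sampling (must go level by level; levels cannot be skipped)
# Laplacian pyramid: expand = enlarge + convolution (up-sampling)
'''
Principle of the image pyramid method: each image taking part in the fusion is decomposed into a
multi-scale sequence of pyramid images, with low-resolution images in the upper layers and
high-resolution images in the lower layers; each upper layer is 1/4 the size of the layer below it.
The layers are numbered 0, 1, 2, ..., N. Fusing the pyramids of all images layer by layer according
to some rule gives a combined pyramid, and reconstructing that pyramid by the inverse of the
pyramid generation process gives the fused image.
'''
import cv2 as cv
import numpy as np
'''
Laplacian pyramid: first take the original image as the bottom image G0 (layer 0 of the Gaussian
pyramid) and convolve it with a Gaussian kernel (5*5), then down-sample the result (remove even rows
and columns) to get the next layer up, G1. Taking G1 as input, repeat the convolution and
down-sampling to get the layer above it, and iterate to form a pyramid-shaped image data structure.
'''
def pyramid_demo(image):  # Gaussian pyramid
    level = 3
    temp = image.copy()  # work on a copy
    pyramid_images = []
    for i in range(level):
        dst = cv.pyrDown(temp)  # down-sample; the principle is described above
        pyramid_images.append(dst)  # add to the list
        #cv.imshow("pyramid_down_"+str(i), dst)
        temp = dst.copy()  # the result becomes the next input
    return pyramid_images
def lapalian_demo(image):  # Laplacian pyramid
    pyramid_images = pyramid_demo(image)  # first build the Gaussian pyramid
    level = len(pyramid_images)
    for i in range(level-1, -1, -1):  # from the smallest image to the largest
        # the target is the next larger Gaussian level (the original image when i == 0)
        target = image if i == 0 else pyramid_images[i-1]
        h, w = target.shape[:2]
        expand = cv.pyrUp(pyramid_images[i], dstsize=(w, h))  # dstsize is (width, height)
        lpls = cv.subtract(target, expand)  # Laplacian layer = Gaussian layer minus the expanded layer above it
        cv.imshow("lapalian_down_" + str(i), lpls)


# Note: the image size should preferably be a multiple of 2^n, otherwise the result may be wrong; resize can be used to adjust it
src = cv.imread("lena.png")  # the size should be a multiple of 2^n
#src=cv.resize(src,(512,512))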
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)
cv.imshow("input image", src)
lapalian_demo(src)
cv.waitKey(0)
cv.destroyAllWindows()

Run screenshot:
[Screenshot: output of the Laplacian pyramid demo]
Note that the image size should be a multiple of 2, otherwise an error may be reported

3. Image gradient

Sobel operator theory
In the original figure, the red dotted region runs from hair to skin. It starts in the hair region, where the hair is black and the pixel values are low, and then enters the skin, where the pixel values are relatively high; plotting these values gives figure 2. Taking the first derivative of figure 2 gives figure 3, whose maximum lies at the edge.
For a discrete image, taking the first derivative means taking differences between neighboring pixels.
[Figures: Sobel operator]
The coefficients of the Sobel operator sum to 0.
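
A tiny numeric sketch of the idea, using made-up intensity values along a hair-to-skin scan line (the numbers are illustrative assumptions). It also checks that the standard 3x3 Sobel kernel for the x direction sums to 0.

```python
import numpy as np

# Intensities along a scan line: dark hair (low values), then skin (high values).
profile = np.array([20, 20, 20, 200, 200, 200])

# Discrete first derivative: difference between neighboring samples.
derivative = np.diff(profile)
edge_index = int(np.argmax(np.abs(derivative)))

# Standard 3x3 Sobel kernel for the x direction. Its coefficients sum to 0,
# so a region of constant brightness produces no response.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]])

print(derivative.tolist(), edge_index, int(sobel_x.sum()))
```

The only non-zero difference sits between samples 2 and 3, which is exactly where the profile jumps from hair to skin.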

Laplacian operator theory
[Figures: Laplacian operator]
The coefficients of the Laplacian operator also sum to 0.
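
The Laplacian is built from second derivatives, so a sketch with second differences shows how it responds to an edge. The intensity values are illustrative assumptions; the two kernels are the 4-neighbor and 8-neighbor forms used in the code of this article.

```python
import numpy as np

profile = np.array([20, 20, 20, 200, 200, 200])

# Second difference: positive on one side of the edge, negative on the other.
second_difference = np.diff(profile, n=2)

lap4 = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])   # 4-neighbor kernel
lap8 = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])   # 8-neighbor kernel

# Because the coefficients sum to 0, a flat patch gives no response.
flat_patch = np.full((3, 3), 50)
flat_response = int((lap4 * flat_patch).sum())

print(second_difference.tolist(), int(lap4.sum()), int(lap8.sum()), flat_response)
```

The sign change in the second difference marks the edge position, which is why the absolute value is usually taken before display.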

import cv2 as cv
import numpy as np

def sobel_demo(image):
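    grad_x = cv.Sobel(image, cv.CV_32F, 1, 0)  # gradients in the x and y directions
    grad_y = cv.Sobel(image, cv.CV_32F, 0, 1)
    # The Scharr operator is an enhanced version of Sobel and gives more pronounced contours
    # grad_x = cv.Scharr(image, cv.CV_32F, 1, 0)  # gradients in the x and y directions
    # grad_y = cv.Scharr(image, cv.CV_32F, 0, 1)
    gradx = cv.convertScaleAbs(grad_x)  # the results above can be positive or negative; take absolute values
    grady = cv.convertScaleAbs(grad_y)  # and convert everything to 8-bit
    cv.imshow("gradient_x", gradx)  # show the 8-bit results rather than the raw float gradients
    cv.imshow("gradient_y", grady)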

    gradxy = cv.addWeighted(gradx, 0.5, grady, 0.5, 0)
    cv.imshow("gradient", gradxy)


src = cv.imread("C:/Users/lenovo/Desktop/opencv/daima/banknum/template-matching-ocr/images/lena.jpg")  # path of the image to read
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)
cv.imshow("input image", src)
sobel_demo(src)
cv.waitKey(0)
cv.destroyAllWindows()

Running screenshot:
[Screenshot: output of the Sobel demo]
The Scharr operator is an enhanced version of the Sobel operator. When the edges obtained with Sobel are not good enough, consider the Scharr operator.

import cv2 as cv
import numpy as np

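def sobel_demo(image):

    # The Scharr operator is an enhanced version of Sobel and gives more pronounced contours
    grad_x = cv.Scharr(image, cv.CV_32F, 1, 0)  # gradients in the x and y directions
    grad_y = cv.Scharr(image, cv.CV_32F, 0, 1)
    gradx = cv.convertScaleAbs(grad_x)  # the results above can be positive or negative; take absolute values
    grady = cv.convertScaleAbs(grad_y)  # and convert everything to 8-bit
    cv.imshow("gradient_x", gradx)  # show the 8-bit results rather than the raw float gradients
    cv.imshow("gradient_y", grady)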

    gradxy = cv.addWeighted(gradx, 0.5, grady, 0.5, 0)
    cv.imshow("gradient", gradxy)


src = cv.imread("C:/Users/lenovo/Desktop/opencv/daima/banknum/template-matching-ocr/images/lena.jpg")  # path of the image to read
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)
cv.imshow("input image", src)
sobel_demo(src)
cv.waitKey(0)
cv.destroyAllWindows()

Run screenshot:
[Screenshot: output of the Scharr demo]
Implementation of the Laplacian operator

import cv2 as cv
import numpy as np

def laplacian_demo(image):
    # dst = cv.Laplacian(image, cv.CV_32F)
    # lpls = cv.convertScaleAbs(dst)

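    # Define a Laplacian kernel by hand
    #kernel = np.array([[0, 1, 0], [1, -4, 1], [0, 1, 0]])  # cv.Laplacian uses the 4-neighbor kernel by default, i.e. the center value is -4
    kernel = np.array([[1, 1, 1], [1, -8, 1], [1, 1, 1]])  # the 8-neighbor kernel is the enhanced version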

    dst = cv.filter2D(image, cv.CV_32F, kernel=kernel)
    lpls = cv.convertScaleAbs(dst)
    cv.imshow("laplacian_demo", lpls)



print("--------- Python OpenCV Tutorial ---------")
src = cv.imread("daqiu.jpg")
cv.namedWindow("input image", cv.WINDOW_AUTOSIZE)
cv.imshow("input image", src)
laplacian_demo(src)
cv.waitKey(0)
cv.destroyAllWindows()

Run screenshots:
[Screenshot: using the Laplacian function]
[Screenshot: using the default 4-neighbor Laplacian kernel]
[Screenshot: using the enhanced 8-neighbor Laplacian kernel]

Origin blog.csdn.net/weixin_44145452/article/details/112761159