Chapter 11: Image Pyramids

What is an image pyramid?

An image pyramid is a collection of subimages of the same image at different resolutions. It is generated by repeatedly downsampling the image, and the smallest image may contain only a single pixel. The figure below shows an example. The images are stacked in a pyramid shape, and their resolution decreases from bottom to top.

[Figure: an example of an image pyramid]

The bottom of an image pyramid is usually the high-resolution image to be processed (the original image), and the top is a low-resolution approximation of it. Moving toward the top, both the size and the resolution of the images decrease. Typically, each step up halves the width and height of the image.

1. Theoretical basis:

An image pyramid is a collection of subimages of the same image at different resolutions. It is generated by repeatedly downsampling the original image, that is, by producing a low-resolution (small) approximation from a high-resolution (large) image.

1. Downsample:

The simplest image pyramid can be obtained by repeatedly deleting the even-numbered rows and columns of an image. For example, deleting the even-numbered rows and columns of an N*N image yields an (N/2)*(N/2) image, whose area is a quarter of the original. Repeating this process produces the image pyramid.
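Here is a minimal NumPy sketch of this unfiltered pyramid. It assumes the image is a NumPy array (as returned by cv2.imread()), and the helper names are hypothetical. Deleting the even-numbered rows and columns, counting from 1, is the same as keeping rows and columns 0, 2, 4, ... in 0-based indexing, which slicing expresses directly.

```python
import numpy as np

def drop_even_rows_cols(img):
    """Keep rows/columns 1, 3, 5, ... (1-based), i.e. indices 0, 2, 4, ... (0-based)."""
    return img[::2, ::2]

def naive_pyramid(img, levels):
    """Build a pyramid by repeated row/column deletion, without any filtering."""
    layers = [img]
    for _ in range(levels):
        layers.append(drop_even_rows_cols(layers[-1]))
    return layers

img = np.arange(64, dtype=np.uint8).reshape(8, 8)  # small stand-in image
for i, layer in enumerate(naive_pyramid(img, 3)):
    print(i, layer.shape)
# 0 (8, 8)
# 1 (4, 4)
# 2 (2, 2)
# 3 (1, 1)
```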

Alternatively, the original image can first be filtered to obtain an approximate image, and the even-numbered rows and columns of that approximation are then deleted to produce the downsampling result. Various filters can be used for this step:

  • Neighborhood filter: computes the approximate image from neighborhood averages. This filter produces an average (mean) pyramid.

  • Gaussian filter: filtering the original image with a Gaussian filter produces a Gaussian pyramid. This is the approach used by the OpenCV function cv2.pyrDown().

    A Gaussian pyramid is generated by repeatedly applying the Gaussian filter and then downsampling. The process is as follows:

    [Figure: the Gaussian pyramid generation process (filter, then downsample)]

After this processing, the original image and the result of each downsampling step together form a Gaussian pyramid. For example, the original image is called layer 0, the result of the first downsampling is layer 1, the result of the second downsampling is layer 2, and so on. The figure below shows the Gaussian pyramid built from the image above.

[Figure: the Gaussian pyramid built from the example image]

For consistency, the rest of this chapter always refers to the bottom layer of an image pyramid as layer 0, the layer above it as layer 1, and so on.

2. Upsample:

Upsampling usually doubles the width and height of an image, so the result has four times as many pixels as the original. Many new pixels must therefore be assigned values, which is called interpolation. Interpolation can be done in many ways. For example, nearest-neighbor interpolation gives each pixel that has no value the value of the nearest existing pixel.
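For a factor of 2, nearest-neighbor interpolation simply copies each pixel into a 2x2 block. Here is a minimal NumPy sketch (the helper name upsample_nearest() is hypothetical).

```python
import numpy as np

def upsample_nearest(img):
    """Double width and height; every new pixel copies its nearest original pixel."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

small = np.array([[10, 20],
                  [30, 40]], dtype=np.uint8)
print(upsample_nearest(small))
# [[10 10 20 20]
#  [10 10 20 20]
#  [30 30 40 40]
#  [30 30 40 40]]
```

OpenCV's cv2.resize() with interpolation=cv2.INTER_NEAREST is a built-in alternative for nearest-neighbor resizing.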

Another common upsampling method performs interpolation by zero padding. Usually, a column of zeros is inserted to the right of each pixel column, and a row of zeros is inserted below each pixel row. In the figure below, the left side shows the 4 pixels to be upsampled, and the right side shows the result after zero padding.

[Figure: left, the 4 pixels to be upsampled; right, the zero-padded result]

Next, the zero-padded image is filtered with the same Gaussian filter used for downsampling to obtain the upsampled image. Note, however, that three quarters of the pixels in the padded image are now zero. The Gaussian filter coefficients must therefore be multiplied by 4 so that the resulting pixel values stay within the range of the original values.

For example, suppose the zero-padded image on the right side of the figure above is an 8-bit image with pixel values in [0, 255]. Three quarters of its pixels are zero, so convolving it directly with the Gaussian filter would shrink the range of pixel values to [0, 255/4]. Multiplying the Gaussian filter coefficients by 4 keeps the resulting pixel values within [0, 255].

Viewed another way, the procedure has three steps. First, insert a zero-valued column to the right of each pixel of the original image and a zero-valued row below each pixel, so the image becomes twice as wide and twice as high. Next, convolve the zero-padded image with the Gaussian filter. Finally, multiply every pixel value by 4 so the range of pixel values matches the original image.

The analysis above shows that upsampling and downsampling are opposite operations. However, they are not inverses of each other, because both lose pixel information. Upsampling an image and then downsampling it does not restore the original, and neither does downsampling it and then upsampling it.

2. Using the pyrDown function:

OpenCV provides the function cv2.pyrDown() to perform the downsampling step of a Gaussian pyramid. Its syntax is:

dst = cv2.pyrDown(src [, dstsize [, borderType] ])

  • dst: destination image
  • src: original image
  • dstsize: the size of the target image
  • borderType: pixel extrapolation (border) method; the default value is BORDER_DEFAULT, and BORDER_CONSTANT is not supported.

By default, the size of the output image is Size((src.cols+1)/2, (src.rows+1)/2). In any case, the sizes must satisfy the following conditions:

  • |dst.width * 2 - src.cols| ≤ 2
  • |dst.height * 2 - src.rows| ≤ 2

cv2.pyrDown() first applies a Gaussian filter to the original image to obtain an approximate image. It then downsamples that approximation by discarding its even rows and columns.

Example:

import cv2

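img = cv2.imread('../lena.bmp')  # returns None if the file cannot be read
rst1 = cv2.pyrDown(img)          # layer 1: half the width and height of layer 0
rst2 = cv2.pyrDown(rst1)         # layer 2
rst3 = cv2.pyrDown(rst2)         # layer 3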

print('img.shape=', img.shape)
print('rst1.shape=', rst1.shape)
print('rst2.shape=', rst2.shape)
print('rst3.shape=', rst3.shape)

cv2.imshow('img', img)
cv2.imshow('rst1', rst1)
cv2.imshow('rst2', rst2)
cv2.imshow('rst3', rst3)

cv2.waitKey()
cv2.destroyAllWindows()

# Output
img.shape= (512, 512, 3)
rst1.shape= (256, 256, 3)
rst2.shape= (128, 128, 3)
rst3.shape= (64, 64, 3)

[Figure: the original image and the results of the three downsampling steps]

3. Using the pyrUp function:

OpenCV provides the function cv2.pyrUp() to perform the upsampling step of an image pyramid. Its syntax is:

dst = cv2.pyrUp(src [, dstsize [, borderType] ])

  • dst: destination image
  • src: original image
  • dstsize: the size of the target image
  • borderType: border type, the default value is BORDER_DEFAULT, and only BORDER_DEFAULT is supported here.

By default, the size of the output image is Size(src.cols*2, src.rows*2). In any case, the dimensions of the image must meet the following conditions:

  • |dst.width - src.cols * 2| ≤ mod(dst.width, 2)
  • |dst.height - src.rows * 2| ≤ mod(dst.height, 2)

When cv2.pyrUp() upsamples an image, it first inserts a zero-valued column to the right of each pixel and a zero-valued row below each pixel. This produces a new image, New, whose even-numbered rows and columns (the newly added ones) are all zero. The function then filters New with the same Gaussian filter used for downsampling to obtain the upsampled result. To keep the pixel values in the same range as the original image, the Gaussian filter coefficients are multiplied by 4.

Example:

import cv2

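# Note: the output listed below corresponds to a 64x64 input image; starting from
# a 512x512 image, three pyrUp() calls would produce a 4096x4096 result
img = cv2.imread('../boat.512.tiff')
rst1 = cv2.pyrUp(img)   # each call doubles the width and height
rst2 = cv2.pyrUp(rst1)
rst3 = cv2.pyrUp(rst2)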

print('img.shape=', img.shape)
print('rst1.shape=', rst1.shape)
print('rst2.shape=', rst2.shape)
print('rst3.shape=', rst3.shape)

cv2.imshow('img', img)
cv2.imshow('rst1', rst1)
cv2.imshow('rst2', rst2)
cv2.imshow('rst3', rst3)

cv2.waitKey()
cv2.destroyAllWindows()

# Output
img.shape= (64, 64, 3)
rst1.shape= (128, 128, 3)
rst2.shape= (256, 256, 3)
rst3.shape= (512, 512, 3)

[Figure: the original image and the results of the three upsampling steps]

4. Reversibility of Sampling

Upsampling makes the image four times as large in area, and downsampling reduces it to a quarter. The figure below shows how the image size changes before and after sampling: an M*N image becomes (M/2)*(N/2) after downsampling and (2M)*(2N) after upsampling.

Downsampling followed by upsampling returns an image to its original size, but upsampling and downsampling are still not inverse operations. After the two sampling steps, the result has the same size as the original image and looks similar to the naked eye, but the pixel values differ.

Example:

import cv2

img = cv2.imread('../boat.512.tiff')

down = cv2.pyrDown(img)
up = cv2.pyrUp(down)
diff = cv2.absdiff(up, img)  # absolute difference; plain uint8 subtraction would wrap around
print('img.shape=', img.shape)
print('up.shape=', up.shape)
cv2.imshow('img', img)
cv2.imshow('up', up)
cv2.imshow('diff', diff)
cv2.waitKey()
cv2.destroyAllWindows()

# Output
img.shape= (512, 512, 3)
up.shape= (512, 512, 3)

[Figure: the original image, the image after downsampling and upsampling, and their difference]

5. Laplacian Pyramid

The Gaussian pyramid introduced above is generated by repeatedly downsampling an image. Sometimes we want to recover the complete, high-resolution image by upsampling a small image from the pyramid. This requires a Laplacian pyramid.

1. Definition:

Once an image has been downsampled, upsampling it cannot restore its original state, as the program in the previous section verified. Upsampling is not the inverse of downsampling. This is expected: downsampling filters the image with a Gaussian filter and discards its even rows and columns, so some information is inevitably lost.

To recover the higher-resolution original during upsampling, we need the information that was lost during sampling. The Laplacian pyramid is built from exactly this lost information.

The Laplacian pyramid is defined as:

  • $L_i = G_i - \mathrm{pyrUp}(G_{i+1})$

where:

  • $L_i$: the i-th layer of the Laplacian pyramid
  • $G_i$: the i-th layer of the Gaussian pyramid

That is, the i-th layer of the Laplacian pyramid is the difference between the i-th layer of the Gaussian pyramid and the upsampled (i+1)-th layer of the Gaussian pyramid. The figure below shows the correspondence between the Gaussian pyramid and the Laplacian pyramid.

[Figure: correspondence between the Gaussian pyramid and the Laplacian pyramid]

Example: Constructing a Laplacian pyramid using cv2.pyrDown() and cv2.pyrUp()

import cv2

img = cv2.imread('../boat.512.tiff')
G1 = cv2.pyrDown(img)
G2 = cv2.pyrDown(G1)
G3 = cv2.pyrDown(G2)

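# img and the pyrUp() results are uint8, so negative differences wrap around
# modulo 256 instead of being stored as negative values
L0 = img - cv2.pyrUp(G1)
L1 = G1 - cv2.pyrUp(G2)
L2 = G2 - cv2.pyrUp(G3)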

print('L0.shape=', L0.shape)
print('L1.shape=', L1.shape)
print('L2.shape=', L2.shape)

cv2.imshow('L0', L0)
cv2.imshow('L1', L1)
cv2.imshow('L2', L2)
cv2.waitKey()
cv2.destroyAllWindows()

# Output
L0.shape= (512, 512, 3)
L1.shape= (256, 256, 3)
L2.shape= (128, 128, 3)

[Figure: the Laplacian layers L0, L1, and L2]

2. Application:

The role of the Laplacian pyramid is to restore high-resolution images. The figure below demonstrates how to restore a high-resolution image through a Laplacian pyramid.

[Figure: restoring a high-resolution image through the Laplacian pyramid]

The meanings of the marks in the figure are as follows:

  • G0, G1, G2, and G3 are the 0th, 1st, 2nd, and 3rd layers of the Gaussian pyramid, respectively.
  • L0, L1, and L2 are the 0th, 1st, and 2nd layers of the Laplacian pyramid, respectively.
  • The downward arrow indicates the downsampling operation (corresponding to the cv2.pyrDown() function)
  • The right arrow indicates the upsampling operation (corresponding to the cv2.pyrUp() function)
  • "+" means addition operation
  • "-" means subtraction operation

The operational relationships in the above figure are:

Downsample:

  • G1 = cv2.pyrDown(G0)
  • G2 = cv2.pyrDown(G1)
  • G3 = cv2.pyrDown(G2)

Laplacian pyramid:

  • L0 = G0 - cv2.pyrUp(G1)
  • L1 = G1 - cv2.pyrUp(G2)
  • L2 = G2 - cv2.pyrUp(G3)

Upsample to restore a high-resolution image:

  • G0 = L0 + cv2.pyrUp(G1)
  • G1 = L1 + cv2.pyrUp(G2)
  • G2 = L2 + cv2.pyrUp(G3)

These relationships follow from simple rearrangement. For example, since L0 = G0 - cv2.pyrUp(G1), moving cv2.pyrUp(G1) from the right side of the expression to the left gives G0 = L0 + cv2.pyrUp(G1). G1 and G2 are obtained in the same way from the other Laplacian construction expressions. As mentioned before, the purpose of the Laplacian pyramid is to restore high-resolution images.

Example: Restoring High Resolution Images Using Laplacian Pyramids

import cv2
import numpy as np

img = cv2.imread('../boat.512.tiff')

G0 = img
G1 = cv2.pyrDown(G0)
G2 = cv2.pyrDown(G1)
G3 = cv2.pyrDown(G2)

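# uint8 arithmetic wraps around modulo 256, so each subtraction here is undone
# exactly by the corresponding addition below
L0 = G0 - cv2.pyrUp(G1)
L1 = G1 - cv2.pyrUp(G2)
L2 = G2 - cv2.pyrUp(G3)

rst_G0 = L0 + cv2.pyrUp(G1)
rst_G1 = L1 + cv2.pyrUp(G2)
rst_G2 = L2 + cv2.pyrUp(G3)

# Sum of absolute differences between each original and restored layer (0 means identical)
print('rst_G0', np.sum(abs(G0 - rst_G0)))
print('rst_G1', np.sum(abs(G1 - rst_G1)))
print('rst_G2', np.sum(abs(G2 - rst_G2)))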

cv2.imshow('G0', G0)
cv2.imshow('G1', G1)
cv2.imshow('G2', G2)

cv2.imshow('rst_G0', rst_G0)
cv2.imshow('rst_G1', rst_G1)
cv2.imshow('rst_G2', rst_G2)

cv2.waitKey()
cv2.destroyAllWindows()

[Figure: the original layers G0, G1, G2 and the restored layers rst_G0, rst_G1, rst_G2]


Origin blog.csdn.net/weixin_57440207/article/details/122647012