Summary of image geometric transformations in OpenCV

1. Image scaling

To enlarge or shrink an image, use the following API:

cv2.resize(src, dsize, fx=0, fy=0, interpolation=cv2.INTER_LINEAR)

Parameters:

src: input image

dsize: absolute size of the output image, given as (width, height)

fx, fy: relative size; set dsize to None, then pass fx and fy as the horizontal and vertical scale factors

interpolation: interpolation method, one of:

interpolation	meaning
cv2.INTER_LINEAR	bilinear interpolation (default)
cv2.INTER_NEAREST	nearest-neighbor interpolation
cv2.INTER_AREA	resampling using pixel area relation (generally preferred for shrinking)
cv2.INTER_CUBIC	bicubic interpolation

Example:

import cv2 as cv

# Read the image
img1 = cv.imread("E:\\loaddown\\opencv_test\\1.jpg")

# Image scaling
# Absolute size: dsize is (width, height)
rows, cols = img1.shape[:2]
res = cv.resize(img1, (2*cols, 2*rows), interpolation=cv.INTER_CUBIC)

# Relative size: dsize is None, fx and fy are the scale factors
res1 = cv.resize(img1, None, fx=0.5, fy=0.5)

# Display the images
cv.imshow("original", img1)
cv.imshow("enlarge", res)
cv.imshow("shrink", res1)
cv.waitKey(0)

Output result:

2. Image translation

After defining the translation matrix, call cv.warpAffine() to translate the image:

cv.warpAffine(img, M, dsize)

Parameters:

img: input image

M: 2×3 translation matrix

To move the pixel at $(x, y)$ to $(x+t_{x}, y+t_{y})$, set M to:

$M=\begin{bmatrix} 1 & 0 & t_{x} \\ 0 & 1 & t_{y} \end{bmatrix}$

Note: M should be a NumPy array of a floating-point type such as np.float32.

dsize: size of the output image

Note: dsize must be given as (width, height), where width is the number of columns and height is the number of rows.

Example:

import numpy as np
import cv2 as cv
import matplotlib.pyplot as plt

# Read the image
img1 = cv.imread("E:\\loaddown\\opencv_test\\1.jpg")

# Translate the image: shift every pixel by (50, 100)
rows, cols = img1.shape[:2]
M = np.float32([[1, 0, 50], [0, 1, 100]])  # translation matrix
dst = cv.warpAffine(img1, M, (cols + 50, rows + 100))  # dsize is (width, height)

# Display the images (reverse the channel order: BGR to RGB)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 8), dpi=100)
axes[0].imshow(img1[:, :, ::-1])
axes[0].set_title("original")
axes[1].imshow(dst[:, :, ::-1])
axes[1].set_title("translation")
plt.show()

Output result:

3. Image rotation

In OpenCV, rotating an image takes two steps: call cv.getRotationMatrix2D() to build the rotation matrix from the rotation center and rotation angle, then call cv.warpAffine() to apply that matrix.

cv2.getRotationMatrix2D(center, angle, scale)

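Parameters:

center: center of rotation, given as (x, y) in the input image

angle: rotation angle in degrees; positive values rotate counter-clockwise (the coordinate origin is the top-left corner)

scale: isotropic scale factor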
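
The matrix itself can be written out by hand. The sketch below follows the formula given in the OpenCV documentation for cv.getRotationMatrix2D(), with $\alpha = scale \cdot \cos\theta$ and $\beta = scale \cdot \sin\theta$. The helper names rotation_matrix_2d and apply_affine are hypothetical and exist only for this illustration.

```python
import numpy as np

def rotation_matrix_2d(center, angle_deg, scale):
    """Hypothetical helper: build the 2x3 rotation matrix by hand."""
    cx, cy = center
    theta = np.deg2rad(angle_deg)
    alpha = scale * np.cos(theta)
    beta = scale * np.sin(theta)
    return np.array([
        [alpha, beta, (1 - alpha) * cx - beta * cy],
        [-beta, alpha, beta * cx + (1 - alpha) * cy],
    ])

def apply_affine(M, point):
    """Hypothetical helper: map one (x, y) point through a 2x3 matrix."""
    x, y = point
    return M @ np.array([x, y, 1.0])

center = (100.0, 50.0)
M = rotation_matrix_2d(center, 90, 1.0)

center_mapped = apply_affine(M, center)         # the center stays where it is
right_mapped = apply_affine(M, (101.0, 50.0))   # the pixel to the right of the center

print(np.round(center_mapped, 6))  # [100.  50.]
print(np.round(right_mapped, 6))   # [100.  49.]
```

The pixel to the right of the center ends up one row above it, which is a counter-clockwise turn when the y axis points down.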

Example:

import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np

# Read the image
img1 = cv.imread("E:\\loaddown\\opencv_test\\1.jpg")

# Get the image size
rows, cols = img1.shape[:2]
# Get the rotation matrix; the center is given as (x, y) = (cols/2, rows/2)
M = cv.getRotationMatrix2D((cols/2, rows/2), 30, 1)

# Compute the canvas size that fits the whole rotated image
cos = np.abs(M[0, 0])
sin = np.abs(M[0, 1])
cols1 = int(np.round(rows * sin + cols * cos))
rows1 = int(np.round(rows * cos + cols * sin))
# Shift the result so that it stays centered on the larger canvas
M[0, 2] += (cols1 - cols) * 0.5
M[1, 2] += (rows1 - rows) * 0.5
# Rotate the image; dsize is (width, height)
dst = cv.warpAffine(img1, M, (cols1, rows1))
# print("M: ", M)

# Display the images (reverse the channel order: BGR to RGB)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 8), dpi=100)
axes[0].imshow(img1[:, :, ::-1])
axes[0].set_title("original")
axes[1].imshow(dst[:, :, ::-1])
axes[1].set_title("rotation")
plt.show()

Output result:
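
The canvas-size step in the example can be isolated as a small function. This is a sketch under the assumption that the scale factor is 1; the name rotated_canvas_size is hypothetical. It computes the width and height of the axis-aligned box that encloses the rotated image.

```python
import numpy as np

def rotated_canvas_size(width, height, angle_deg):
    """Hypothetical helper: bounding-box size of a rotated width x height image."""
    theta = np.deg2rad(angle_deg)
    cos = abs(np.cos(theta))
    sin = abs(np.sin(theta))
    new_width = int(np.round(height * sin + width * cos))
    new_height = int(np.round(height * cos + width * sin))
    return new_width, new_height

size_0 = rotated_canvas_size(200, 100, 0)    # no rotation: unchanged
size_90 = rotated_canvas_size(200, 100, 90)  # quarter turn: width and height swap
size_30 = rotated_canvas_size(200, 100, 30)

print(size_0)   # (200, 100)
print(size_90)  # (100, 200)
print(size_30)  # (223, 187)
```

Adding half of the size difference to M[0, 2] and M[1, 2], as the example does, moves the rotated image to the middle of the enlarged canvas.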

4. Affine transformation

An affine transformation combines operations such as scaling, rotating, flipping, and translating an image. As shown in the figure, points 1, 2, and 3 in Figure 1 are mapped one-to-one to three points in Figure 2. They still form a triangle, but its size and the relative distances between the points have changed. From two such sets of corresponding points (at least three non-collinear points in each), the affine transformation matrix can be computed, and applying that matrix to every point in the image completes the affine transformation.

Call cv.getAffineTransform() to create the transformation matrix, then pass the matrix to cv.warpAffine() to apply it:

cv2.getAffineTransform(src, dst)

Parameters:

src: coordinates of the three points of interest in the original image

dst: coordinates of the corresponding three points in the target image

Note that lines that are parallel in the original image remain parallel after an affine transformation.
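
Three point pairs are enough because the 2×3 matrix has six unknowns and each pair contributes two equations. The sketch below solves that linear system directly with NumPy, using the same point coordinates as the example in this section. It is an illustration of the idea, not OpenCV's implementation, although the result should agree with cv.getAffineTransform() up to floating-point error.

```python
import numpy as np

# Three non-collinear source points and their target positions
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[100, 100], [200, 50], [100, 250]])

# Each row of A is (x, y, 1), so that A @ M.T gives the mapped points
A = np.hstack([pts1, np.ones((3, 1))])

# Solve A @ M.T = pts2 for the 2x3 matrix M
M = np.linalg.solve(A, pts2).T

mapped = A @ M.T  # the source points pushed through M
print(M.shape)                    # (2, 3)
print(np.allclose(mapped, pts2))  # True
```

If the three source points were collinear, A would be singular and np.linalg.solve would raise an error, which is why non-collinear points are required.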

Example:

import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np

# Read the image
img1 = cv.imread('E:\\loaddown\\opencv_test\\1.jpg')

# Get the affine transformation matrix from three point pairs
rows, cols = img1.shape[:2]
pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
pts2 = np.float32([[100, 100], [200, 50], [100, 250]])
M = cv.getAffineTransform(pts1, pts2)

# Apply the affine transformation; dsize is (width, height)
dst = cv.warpAffine(img1, M, (cols, rows))

# Display the images (reverse the channel order: BGR to RGB)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 8), dpi=100)
axes[0].imshow(img1[:, :, ::-1])
axes[0].set_title("original")
axes[1].imshow(dst[:, :, ::-1])
axes[1].set_title("affine")
plt.show()

Output result:

5. Perspective transformation

A perspective transformation projects the image onto a new plane. Its general formula is:

$\left [ \begin{matrix} x' & y' & z' \end{matrix} \right ] = \left [ \begin{matrix} u & v & w \end{matrix} \right ] T = \left [ \begin{matrix} u & v & w \end{matrix} \right ] \left [ \begin{matrix} T_{1} & T_{2} \\ T_{3} & a_{22} \end{matrix} \right ] = \left [ \begin{matrix} u & v & w \end{matrix} \right ]\left [ \begin{matrix} a_{00} & a_{01} & a_{02} \\ a_{10} & a_{11} & a_{12}\\ a_{20} & a_{21} & a_{22} \end{matrix} \right ]$

Here $\left ( u,v \right )$ are the pixel coordinates in the original image and $w$ is 1; $(x=\frac{x'}{z'}, y=\frac{y'}{z'})$ is the result of the perspective transformation. $T$ is the perspective transformation matrix, which can be divided into three parts. With the row-vector form above, $T_1$ (the upper-left $2\times 2$ block) is the linear transformation of the image, $T_2 = [a_{02}\ a_{12}]^{T}$ produces the perspective effect, and $T_3 = [a_{20}\ a_{21}]$ is the translation. $a_{22}$ is generally set to 1.

In OpenCV, four point pairs are needed (with no three of the points collinear) to compute the transformation matrix $T$. Compute the matrix with cv.getPerspectiveTransform(), then apply it with cv.warpPerspective().
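
The division by $z'$ is what distinguishes this from an affine transformation. The following sketch applies the row-vector formula to a single point with NumPy. The matrix values are made up for illustration and the helper name project_point is hypothetical. OpenCV itself applies its matrices to column vectors, so a matrix returned by cv.getPerspectiveTransform() corresponds to the transpose of $T$ in this notation; the last lines check that both forms give the same point.

```python
import numpy as np

# Illustrative matrix in the row-vector convention (values are assumptions)
T = np.array([
    [1.0, 0.0, 0.001],
    [0.0, 1.0, 0.0],
    [10.0, 20.0, 1.0],
])

def project_point(T, u, v):
    """Hypothetical helper: map (u, v) with [u, v, 1] @ T, then divide by z'."""
    xp, yp, zp = np.array([u, v, 1.0]) @ T
    return xp / zp, yp / zp

x, y = project_point(T, 100.0, 50.0)
print(round(x, 4), round(y, 4))  # 100.0 63.6364

# Column-vector convention: the transposed matrix applied from the left
xc, yc, zc = T.T @ np.array([100.0, 50.0, 1.0])
print(np.allclose([xc / zc, yc / zc], [x, y]))  # True
```

Here $x' = 110$, $y' = 70$, and $z' = 1.1$, so the point lands at $(100, 63.64)$ after the division.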

Example:

import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np

# Read the image
img1 = cv.imread("E:\\loaddown\\opencv_test\\1.jpg")

# Get the perspective transformation matrix from four point pairs
rows, cols = img1.shape[:2]
pts1 = np.float32([[56, 65], [168, 52], [28, 187], [189, 190]])
pts2 = np.float32([[100, 145], [200, 100], [80, 190], [210, 200]])

T = cv.getPerspectiveTransform(pts1, pts2)
# Apply the perspective transformation; dsize is (width, height)
dst = cv.warpPerspective(img1, T, (cols, rows))
# Display the images (reverse the channel order: BGR to RGB)
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(10, 8), dpi=100)
axes[0].imshow(img1[:, :, ::-1])
axes[0].set_title("original")
axes[1].imshow(dst[:, :, ::-1])
axes[1].set_title("perspective")
plt.show()

Output result:

6. Image Pyramid

An image pyramid is a multi-scale representation of an image, used mainly for image segmentation. It is an effective and conceptually simple structure for interpreting an image at multiple resolutions, and it is also used in machine vision and image compression. A pyramid is a series of images of decreasing resolution, all derived from the same original image and arranged in the shape of a pyramid. It is built by repeated downsampling, which stops once a termination condition is reached.

The bottom of the pyramid is a high-resolution representation of the image to be processed, and the top is a low-resolution image. The higher the level, the smaller the image and the lower the resolution.

OpenCV provides the following API for building image pyramids:

cv.pyrUp(image)  # upsampling

cv.pyrDown(image)  # downsampling

Example:

import cv2 as cv
import matplotlib.pyplot as plt
import numpy as np

# Read the image
img1 = cv.imread("E:\\loaddown\\opencv_test\\1.jpg")

# Resample the image
up_img = cv.pyrUp(img1)      # upsampling
down_img = cv.pyrDown(img1)  # downsampling

# Display the images (reverse the channel order: BGR to RGB)
fig, axes = plt.subplots(nrows=1, ncols=3, figsize=(10, 8), dpi=100)
axes[0].imshow(img1[:, :, ::-1])
axes[0].set_title("original")
axes[1].imshow(up_img[:, :, ::-1])
axes[1].set_title("up_img")
axes[2].imshow(down_img[:, :, ::-1])
axes[2].set_title("down_img")
plt.show()

Output result: