The main content of this article comes from the image processing section of the OpenCV-Python tutorials. That section covers the following topics:
- Changing color spaces: learn to convert images between different color spaces, and to track colored objects in video.
- Geometric transformations of images: learn to apply different geometric transformations to images, such as rotation, translation, etc.
- Image thresholding: learn to convert images to binary images using global thresholding, adaptive thresholding, Otsu's binarization, and more.
- Smoothing images: learn to blur images, filter images with custom kernels, and more.
- Morphological transformations: understand morphological operations such as erosion, dilation, opening, closing, etc.
- Image gradients: learn to find image gradients, edges, and more.
- Canny edge detection: learn to find edges with the Canny edge detector.
- Image pyramids: learn about image pyramids and how to use them for image blending.
- Contours in OpenCV: all about contours in OpenCV.
- Histograms in OpenCV: all about histograms in OpenCV.
- Image transforms in OpenCV: different image transforms encountered in OpenCV, such as the Fourier transform, cosine transform, etc.
- Template matching: learn to use template matching to search for objects in images.
- Hough line transform: learn to detect lines in an image.
- Hough circle transform: learn to detect circles in an image.
- Image segmentation with the watershed algorithm: learn to segment images using the watershed algorithm.
- Interactive foreground extraction with the GrabCut algorithm: learn to use the GrabCut algorithm to extract the foreground.
Goal
- Learn to apply different geometric transformations to images, such as translation, rotation, affine transformation, etc.
- You will see these functions: cv.getPerspectiveTransform.
Transformations
OpenCV provides two transformation functions, cv.warpAffine and cv.warpPerspective, with which all kinds of transformations can be performed. cv.warpAffine takes a 2x3 transformation matrix as input, while cv.warpPerspective takes a 3x3 transformation matrix.
Scaling
Scaling simply changes the size of the image. OpenCV provides the function cv.resize() for this operation. The target size can be specified manually, or a scaling factor can be given instead. Different interpolation methods are available: the preferred method for shrinking is cv.INTER_AREA, and for enlarging, cv.INTER_CUBIC (slow) or cv.INTER_LINEAR. By default, cv.INTER_LINEAR is used for all resizing. We can change the size of an input image as follows:
# Imports used by the examples in this article.
import cv2 as cv
import numpy as np
from matplotlib import pyplot as plt

def scaling():
    # Make the OpenCV sample images findable by cv.samples.findFile.
    cv.samples.addSamplesDataSearchPath("/media/data/my_multimedia/opencv-4.x/samples/data")
    img = cv.imread(cv.samples.findFile('messi5.jpg'))
    # Double the size with explicit scale factors...
    res = cv.resize(img, None, fx=2, fy=2, interpolation=cv.INTER_CUBIC)
    cv.imshow('frame', res)
    # OR: the same result with an explicit target size (width, height).
    height, width = img.shape[:2]
    res = cv.resize(img, (2 * width, 2 * height), interpolation=cv.INTER_CUBIC)
    cv.waitKey()
    cv.destroyAllWindows()
Scaling is useful for operations that take multiple images as input. When such an operation constrains the sizes of its inputs and the actual images do not satisfy that constraint, resizing brings them to the required size. Examples include adding multiple images together and concatenating images horizontally or vertically.
Translation
Translation shifts the position of an object. If you know the offset in the (x, y) direction, say $(t_x, t_y)$, you can create the following transformation matrix:

$$M = \begin{bmatrix} 1 & 0 & t_x \\ 0 & 1 & t_y \end{bmatrix}$$

$$dst(x, y) = src(M_{11} x + M_{12} y + M_{13},\ M_{21} x + M_{22} y + M_{23}) = src(x + t_x,\ y + t_y)$$
We can put it into a Numpy array of type np.float32 and pass it to the cv.warpAffine() function. You can refer to the following example of moving (100,50):
def translation():
    cv.samples.addSamplesDataSearchPath("/media/data/my_multimedia/opencv-4.x/samples/data")
    img = cv.imread(cv.samples.findFile('messi5.jpg'))
    rows, cols, _ = img.shape
    # Shift 100 pixels right and 50 pixels down.
    M = np.float32([[1, 0, 100], [0, 1, 50]])
    dst = cv.warpAffine(img, M, (cols, rows))
    dst = cv.hconcat([img, dst])
    cv.imshow('frame', dst)
    cv.waitKey()
    cv.destroyAllWindows()
The third parameter of the cv.warpAffine() function is the size of the output image, and its form should be (width, height) . Remember width = number of columns, and height = number of rows.
The result you see should look like this:
Rotation
Rotating an image by angle θ can be achieved through the following transformation matrix:
$$M = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}$$
However, OpenCV provides scaled rotation with an adjustable center of rotation, so you can rotate about any point you prefer. The modified transformation matrix is given by:
$$\begin{bmatrix} \alpha & \beta & (1-\alpha)\cdot center.x - \beta\cdot center.y \\ -\beta & \alpha & \beta\cdot center.x + (1-\alpha)\cdot center.y \end{bmatrix}$$
where:

$$\alpha = scale\cdot\cos\theta, \quad \beta = scale\cdot\sin\theta$$
To obtain this rotation matrix, OpenCV provides the function cv.getRotationMatrix2D. Check out the example below, which rotates the image 120 degrees about its center and scales it by a factor of 1.2.
def rotation():
    cv.samples.addSamplesDataSearchPath("/media/data/my_multimedia/opencv-4.x/samples/data")
    img = cv.imread(cv.samples.findFile('messi5.jpg'))
    rows, cols, _ = img.shape
    # cols-1 and rows-1 are the coordinate limits.
    M = cv.getRotationMatrix2D(((cols - 1) / 2.0, (rows - 1) / 2.0), 120, 1.2)
    dst = cv.warpAffine(img, M, (cols, rows))
    dst = cv.hconcat([img, dst])
    cv.imshow('frame', dst)
    cv.waitKey()
    cv.destroyAllWindows()
Let’s take a look at the results:
Affine transformation
In affine transformation, all parallel lines in the original image remain parallel in the output image. To find the transformation matrix, we need three points in the input image and their corresponding positions in the output image. cv.getAffineTransform will create a 2x3 matrix, which will be passed to cv.warpAffine .
Check the example below; the selected points are marked in green:
def affine_transformation():
    # Draw a synthetic grid image so the transformation is easy to see.
    img = np.zeros((512, 512, 3), np.uint8)
    cv.rectangle(img, (0, 0), (512, 512), (255, 255, 255), -1)
    cv.line(img, (0, 50), (512, 50), (0, 0, 0), 3)
    cv.line(img, (0, 150), (512, 150), (0, 0, 0), 3)
    cv.line(img, (0, 300), (512, 300), (0, 0, 0), 3)
    cv.line(img, (0, 450), (512, 450), (0, 0, 0), 3)
    cv.line(img, (100, 0), (100, 512), (0, 0, 0), 3)
    cv.line(img, (256, 0), (256, 512), (0, 0, 0), 3)
    cv.line(img, (412, 0), (412, 512), (0, 0, 0), 3)
    cv.rectangle(img, (60, 170), (430, 400), (0, 0, 0), 3)
    # Mark the three selected source points (pts1) in green.
    # img, center, radius, color, thickness
    cv.circle(img, (50, 50), 8, (0, 255, 0), -1)
    cv.circle(img, (200, 50), 8, (0, 255, 0), -1)
    cv.circle(img, (50, 200), 8, (0, 255, 0), -1)
    rows, cols, ch = img.shape
    pts1 = np.float32([[50, 50], [200, 50], [50, 200]])
    pts2 = np.float32([[10, 100], [200, 50], [100, 250]])
    M = cv.getAffineTransform(pts1, pts2)
    dst = cv.warpAffine(img, M, (cols, rows))
    plt.subplot(121), plt.imshow(img), plt.title('Input')
    plt.subplot(122), plt.imshow(dst), plt.title('Output')
    plt.show()

if __name__ == "__main__":
    affine_transformation()
You can see the following results:
Perspective transformation
For a perspective transformation, we need a 3x3 transformation matrix. Straight lines remain straight after the transformation. To find this matrix, we need 4 points on the input image and the corresponding points on the output image; among these 4 points, no 3 may be collinear. The transformation matrix is then found with the function cv.getPerspectiveTransform, and cv.warpPerspective is applied with this 3x3 matrix.
You can look at the following code:
def perspective_transformation():
    cv.samples.addSamplesDataSearchPath("/media/data/my_multimedia/opencv-4.x/samples/data")
    img = cv.imread(cv.samples.findFile('sudoku.png'))
    rows, cols, ch = img.shape
    # Four corners of the sudoku grid in the input and their target positions.
    pts1 = np.float32([[70, 80], [490, 70], [30, 510], [515, 515]])
    pts2 = np.float32([[0, 0], [515, 0], [0, 515], [515, 515]])
    M = cv.getPerspectiveTransform(pts1, pts2)
    dst = cv.warpPerspective(img, M, (515, 515))
    # Draw center lines and mark the four selected points in green.
    cv.line(img, (0, int(rows / 2)), (cols, int(rows / 2)), (0, 255, 0), 3)
    cv.line(img, (int(cols / 2), 0), (int(cols / 2), rows), (0, 255, 0), 3)
    cv.circle(img, (70, 80), 8, (0, 255, 0), -1)
    cv.circle(img, (490, 70), 8, (0, 255, 0), -1)
    cv.circle(img, (30, 510), 8, (0, 255, 0), -1)
    cv.circle(img, (515, 515), 8, (0, 255, 0), -1)
    plt.subplot(121), plt.imshow(img), plt.title('Input')
    cv.line(dst, (0, int(rows / 2)), (cols, int(rows / 2)), (0, 255, 0), 3)
    cv.line(dst, (int(cols / 2), 0), (int(cols / 2), rows), (0, 255, 0), 3)
    plt.subplot(122), plt.imshow(dst), plt.title('Output')
    plt.show()

if __name__ == "__main__":
    perspective_transformation()
The final result is as shown below:
Other resources
- “Computer Vision: Algorithms and Applications”, Richard Szeliski
Reference documentation
- Geometric Transformations of Images
- Commonly used LaTeX mathematical formulas in Markdown