Detailed explanation of image perspective transformation based on OpenCV (from theory to implementation to practice)

1. Affine transformation and perspective transformation

        I had long struggled to understand the difference between affine transformation and perspective transformation, so I studied both in detail, reworked the formulas, and added some observations of my own.

1. Affine transformation

        Affine transformation can be considered a special case of perspective transformation.

        An affine transformation is a linear transformation from two-dimensional coordinates to two-dimensional coordinates; it involves only two-dimensional figures within a single plane.

        Translation, rotation, shear, and scaling of a figure can all be expressed by the transformation matrix of an affine transformation.

        It preserves two properties of two-dimensional figures:

       ① "Straightness": a straight line is still a straight line after the transformation, whether it is translated, rotated, sheared, or scaled.

        ② "Parallelism": parallel lines remain parallel after the transformation, and the order of points along a line is unchanged.

        The intuitive feeling: when we drag, flip, or stretch a picture on the computer, the viewing angle of the picture does not change.

        Any affine transformation can be expressed as a coordinate vector multiplied by a matrix. Below are the matrix forms of several affine transformations.

        Scaling:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ \end{array} \right ] = \left[ \begin{array}{c} T_{x}x\\ T_{y}y\\ \end{array} \right ] = \left[ \begin{array}{cc} T_{x}& 0 \\ 0& T_{y} \\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ \end{array} \right ] \end{equation}

        Rotation:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ \end{array} \right ] = \left[ \begin{array}{c} x\cos\theta-y\sin\theta\\ x\sin\theta+y\cos\theta\\ \end{array} \right ] = \left[ \begin{array}{cc} \cos\theta& -\sin\theta \\ \sin\theta& \cos\theta \\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ \end{array} \right ] \end{equation}

        Shear:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ \end{array} \right ] = \left[ \begin{array}{c} x+y\tan\phi\\ y+x\tan\varphi\\ \end{array} \right ] = \left[ \begin{array}{cc} 1& \tan\phi \\ \tan\varphi& 1 \\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ \end{array} \right ] \end{equation}

        The transformations above need only a 2x2 matrix, but translation cannot be expressed this way: no matter what 2x2 matrix multiplies the vector, a constant offset cannot appear. We therefore turn the original 2-dimensional coordinate vector into homogeneous coordinates, i.e., represent the 2-dimensional vector with a 3-dimensional one.

        \left[ \begin{array}{c} x\\ y\\ \end{array} \right ] \Rightarrow \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ]

        Translation:

        \begin{equation} \left[ \begin{array}{c} x'\\ y'\\ \end{array} \right ] = \left[ \begin{array}{c} x+T_{x}\\ y+T_{y}\\ \end{array} \right ] = \left[ \begin{array}{ccc} 1& 0 &T_{x}\\ 0& 1 &T_{y}\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{equation}

        With homogeneous coordinates, the original 2x2 transformations such as scaling, rotation, and shear are still available: simply set T_{x}=T_{y}=0.

        All of the above are linear transformations, so an affine transformation can be expressed by the following general formula, the form commonly seen online:

        \begin{equation} \left[ \begin{array}{c} x'\\ y'\\ \end{array} \right ] = \left[ \begin{array}{c} a_{11}x+a_{12}y+a_{13}\\ a_{21}x+a_{22}y+a_{23}\\ \end{array} \right ] = \left[ \begin{array}{ccc} a_{11}& a_{12}&a_{13}\\ a_{21}& a_{22} &a_{23}\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{equation}

        At this point, the affine transformation matrix T= \left[ \begin{array}{ccc} a_{11}& a_{12}&a_{13}\\ a_{21}& a_{22} &a_{23}\\ \end{array} \right ] is a 2x3 matrix.

        The coordinate transformation equations are therefore:

        \begin{equation} \left\{ \begin{array}{l} x'=a_{11}x+a_{12}y+a_{13} \\ y'=a_{21}x+a_{22}y+a_{23} \\ \end{array} \right. \end{equation}

        There are 6 unknown coefficients, so 3 pairs of mapping points (with the 3 source points not collinear) are needed to solve for them. This is easy to see: 6 unknowns require at least 6 equations, and each pair of mapping points provides 2 equations.
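        As a concrete illustration, OpenCV solves exactly this 6-unknown system in getAffineTransform(). Below is a minimal sketch; the file name and the point coordinates are made up for illustration:

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

int main()
{
    Mat img = imread("test.png");//any input image (assumed to exist)
    //3 non-collinear source points and their images: 3 pairs -> 6 equations -> 6 unknowns
    Point2f srcTri[3] = { Point2f(0, 0), Point2f(img.cols - 1, 0), Point2f(0, img.rows - 1) };
    Point2f dstTri[3] = { Point2f(50, 50), Point2f(img.cols - 50, 30), Point2f(30, img.rows - 50) };
    Mat A = getAffineTransform(srcTri, dstTri);//the 2x3 affine matrix T
    Mat dst;
    warpAffine(img, dst, A, img.size());//apply the 2x3 matrix to every pixel
    imshow("affine", dst);
    waitKey();
    return 0;
}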

        Moreover, three points uniquely determine a plane, and since the transformation is linear the three mapped points also lie in a single plane, so affine transformation can be regarded as a figure transformation within a plane.

2. Perspective transformation

        Perspective transformation projects a picture onto a new viewing plane; it is also called projective mapping.

        It is a mapping from two dimensions (x, y) to three dimensions (X, Y, Z), and from there back to another two-dimensional space (x', y').

        In contrast to affine transformation, it is more than a linear transformation: it provides the flexibility to map one quadrilateral region to an arbitrary other quadrilateral region.

        Perspective transformation is also implemented by matrix multiplication, using a 3x3 matrix. The first two rows of the matrix play the same role as in the affine matrix, which means a perspective transformation can realize every transformation an affine transformation can; the third row is what produces the perspective effect.

        Perspective transformations also use homogeneous coordinates to represent 2D vectors:

        \begin{equation} \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] = \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ a_{31}& a_{32} &a_{33}\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{equation}

        

At this point, the perspective transformation matrix T= \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ a_{31}& a_{32} &a_{33}\\ \end{array} \right ] is a 3x3 matrix.

        The vector \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] produced by the perspective transformation is not the final coordinate; a further step is required:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] = z' \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ] \end{equation}

        \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ] holds the final transformed coordinates, namely:

\begin{equation} \left[ \begin{array}{c} x''\\ y''\\ 1\\ \end{array} \right ] = \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ] \end{equation}

        In fact, we can now see why affine transformation is a special case of perspective transformation. If the coordinate vector after an affine transformation is also written in homogeneous coordinates \left[ \begin{array}{c} x'\\ y'\\ 1\\ \end{array} \right ], then extending the 2x3 matrix to a 3x3 matrix gives:

T_{affine}= \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ 0& 0 &1\\ \end{array} \right ]

        The forms of affine and perspective transformation are now unified, and the affine transformation process can be viewed as:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ 1\\ \end{array} \right ] = \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ 0& 0 &1\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{equation}

        So an affine transformation is just the special case whose third row is \left[ \begin{matrix} 0&0&1\\ \end{matrix} \right ]. Extended to 3-dimensional vectors in this way, an affine transformation can be understood as an in-plane image transformation on the plane Z=1 of the space coordinate system: the value in the z direction remains 1 no matter how the figure changes, so the figure is always confined to the plane Z=1.

        Going back to perspective transformation, the whole process of perspective transformation is as follows:

\begin{equation} \begin{aligned} \left[ \begin{array}{c} x''\\ y''\\ 1\\ \end{array} \right ] = \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ] = \frac{1}{z'} \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] = \frac{1}{z'} \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ a_{31}& a_{32} &a_{33}\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \\= \frac{1}{a_{31}x+a_{32}y+a_{33}} \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ a_{31}& a_{32} &a_{33}\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{aligned} \end{equation}

        So the coordinate transformation equations are as follows:

\begin{equation} \left\{ \begin{array}{l} x''=\frac{x'}{z'}=\frac{a_{11}x+a_{12}y+a_{13}}{a_{31}x+a_{32}y+a_{33}} \\ y''=\frac{y'}{z'}=\frac{a_{21}x+a_{22}y+a_{23}}{a_{31}x+a_{32}y+a_{33}} \\ \end{array} \right. \end{equation}

        There are 9 unknown parameters in total, but they can be reduced to 8, which gives the form commonly seen online.

        My idea here is to prove that the transformation represented by the matrix T= \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ a_{31}& a_{32} &a_{33}\\ \end{array} \right ] and the one represented by T'=kT= \left[ \begin{array}{ccc} ka_{11}& ka_{12} &ka_{13}\\ ka_{21}& ka_{22} &ka_{23}\\ ka_{31}& ka_{32} &ka_{33}\\ \end{array} \right ] (k a nonzero constant) are equivalent.

        Take T' as the transformation matrix and substitute it into the computation:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] =T' \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] =kT \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] = \left[ \begin{array}{ccc} ka_{11}& ka_{12} &ka_{13}\\ ka_{21}& ka_{22} &ka_{23}\\ ka_{31}& ka_{32} &ka_{33}\\ \end{array} \right ] \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{equation}

        The equations are obtained as follows:

\begin{equation} \left\{ \begin{array}{l} x''=\frac{k(a_{11}x+a_{12}y+a_{13})}{k(a_{31}x+a_{32}y+a_{33})}=\frac{a_{11}x+a_{12}y+a_{13}}{a_{31}x+a_{32}y+a_{33}} \\ y''=\frac{k(a_{21}x+a_{22}y+a_{23})}{k(a_{31}x+a_{32}y+a_{33})}=\frac{a_{21}x+a_{22}y+a_{23}}{a_{31}x+a_{32}y+a_{33}} \\ \end{array} \right. \end{equation}

        The x'' and y'' computed with T' are exactly the same as those computed with the transformation matrix T.

        Therefore, for T= \left[ \begin{array}{ccc} a_{11}& a_{12} &a_{13}\\ a_{21}& a_{22} &a_{23}\\ a_{31}& a_{32} &a_{33}\\ \end{array} \right ] with a_{33}\neq 0, the transformation is always equivalent to the one given by \frac{T}{a_{33}}= \left[ \begin{array}{ccc} \frac{a_{11}}{a_{33}}& \frac{a_{12}}{a_{33}} &\frac{a_{13}}{a_{33}}\\ \frac{a_{21}}{a_{33}}& \frac{a_{22}}{a_{33}} &\frac{a_{23}}{a_{33}}\\ \frac{a_{31}}{a_{33}}& \frac{a_{32}}{a_{33}} &1\\ \end{array} \right ].

        The transformation matrix T can be obtained by renaming the above variables:

T= \left[ \begin{array}{ccc} b_{11}& b_{12} &b_{13}\\ b_{21}& b_{22} &b_{23}\\ b_{31}& b_{32} &1\\ \end{array} \right ]

        At this time, there are only 8 unknown parameters, and the transformation equations also become as follows:

\begin{equation} \left\{ \begin{array}{l} x''=\frac{x'}{z'}=\frac{b_{11}x+b_{12}y+b_{13}}{b_{31}x+b_{32}y+1} \\ y''=\frac{y'}{z'}=\frac{b_{21}x+b_{22}y+b_{23}}{b_{31}x+b_{32}y+1} \\ \end{array} \right. \end{equation}

        Solving for 8 unknowns requires 8 equations; each pair of mapping points provides 2 equations, so 4 pairs of mapping points are needed. This is why 4 points before and 4 points after the transformation must be given to specify a perspective transformation.
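        The scale equivalence proved above is also easy to check numerically. The small sketch below (the matrix entries are arbitrary) maps the same point with T and with 2T using cv::perspectiveTransform, which performs the division by z' internally; both calls print the same coordinates:

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

int main()
{
    //an arbitrary perspective matrix, already normalized so that a33 = 1
    Mat T = (Mat_<double>(3, 3) << 1.2, 0.1, 30,
                                   0.05, 0.9, 20,
                                   0.001, 0.002, 1);
    Mat T2 = 2 * T;//a scaled version kT with k = 2
    vector<Point2f> pts = { Point2f(100, 100) }, out1, out2;
    perspectiveTransform(pts, out1, T);
    perspectiveTransform(pts, out2, T2);
    cout << out1[0] << " " << out2[0] << endl;//identical: k cancels in the division by z'
    return 0;
}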

        Now we can also see why perspective transformation goes from two dimensions to three dimensions and then back to two dimensions. The point before the transformation is \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ], a point in three-dimensional space lying on the plane Z=1, but what we perceive is its projection (x,y) on the two-dimensional plane. The matrix transformation turns it into \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ], another point in three-dimensional space; dividing by z' gives \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ], a point back on the plane Z=1, whose projection on the two-dimensional plane is the final point (x'',y''). The whole process converts two dimensions to three and then maps back to the original two-dimensional space.

        The effect of a perspective transformation is equivalent to the change in the picture observed when the viewer's viewpoint changes.

        Why can a matrix T produce this change of perspective? I could not find the underlying mathematics online; most articles stop at the transformation matrix T without explaining the reason, which puzzled me for a long time.

        After deduction, I think the process of perspective transformation should be as follows:

        First, the observation point is at the origin (0,0,0), looking in the positive direction of the z-axis; the projection plane Z=1, containing the points (x,y,1), is where we see the object, like a display.

        After applying the perspective transformation matrix T, the original point (x,y,1) becomes (x',y',z') (it can be shown that the transformed points still lie on a common plane in space). The coordinates are no longer confined to the plane Z=1 but can lie anywhere in three-dimensional space; in other words, T moves the original figure from the plane Z=1 onto some other plane in space. This is why the transformation matrix changes the viewing angle of the picture.

        Then each point of the transformed figure is connected to the viewpoint (i.e., the origin), and the connecting line meets the projection plane Z=1 at the projected point, forming the figure (x'',y''). Numerically this is the division of the coordinates by z', which is simply the geometry of similar triangles:

\begin{equation} \frac{1}{z'}=\frac{x''}{x'}=\frac{y''}{y'} \end{equation}

        This also resolves a long-standing confusion of mine: the change of viewing angle caused by a perspective transformation is not the observer's viewpoint changing. Rather, the observer's viewpoint stays fixed while the object's position in space changes, so the picture we see changes perspective. An analogy: it is not that we walk around an object and see it from a different angle; we stand still, and someone else turns the object so that we see it from a different angle.

2. Implementation of perspective transformation

        In practice we need two OpenCV functions: getPerspectiveTransform() and warpPerspective().

1.getPerspectiveTransform()

Mat getPerspectiveTransform(InputArray src, InputArray dst, int solveMethod = DECOMP_LU)

① Function description

        Computes the perspective transformation matrix T from 4 pairs of mapping points; the returned matrix is of type Mat. Note that a pair of mapping points refers to \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] and \left[ \begin{array}{c} x''\\ y''\\ 1\\ \end{array} \right ], whereas the matrix T maps \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] to \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ]. That is:

\left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] = T \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ]

② Parameter description

        Parameter src: coordinates of the 4 vertices of the quadrilateral in the source image.

        Parameter dst: coordinates of the corresponding 4 vertices of the quadrilateral in the target image.

        Parameter solveMethod: the method passed to cv::solve (#DecompTypes); the default is DECOMP_LU, and this parameter can usually be omitted.

        Return value: the transformation matrix of type Mat, which can be passed directly to the warpPerspective() function.

2.warpPerspective()     

void warpPerspective(
	InputArray src,
	OutputArray dst,
	InputArray M,
	Size dsize,
	int flags=INTER_LINEAR,
	int borderMode = BORDER_CONSTANT, 
	const Scalar& borderValue = Scalar());

① Function description

        Applies the transformation matrix T to the source image, perspective-transforming it into the destination image.

② Parameter description

        Parameter src: input image.

        Parameter dst: output image; an empty Mat can be passed to receive the result, and its size does not need to be set in advance.

        Parameter M: the 3x3 transformation matrix.

        Parameter dsize: the size of the output image.

        Parameter flags: the interpolation method. The default INTER_LINEAR is bilinear interpolation; INTER_NEAREST is nearest-neighbor interpolation; WARP_INVERSE_MAP treats M as the inverse transformation (dst -> src).

        Parameter borderMode: the pixel extrapolation method; the default BORDER_CONSTANT fills with a constant. Looking through the official documentation, another option is BORDER_REPLICATE.

        Parameter borderValue: the border color used with constant filling; the default is (0,0,0), i.e. black, which is why a picture is surrounded by black after a perspective transformation. Note that the type is Scalar(B, G, R).
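        For example, the border behavior can be overridden when calling warpPerspective. A small sketch, assuming img, Trans, and dst_perspective as in the example below:

//fill the area outside the warped image with white instead of the default black
warpPerspective(img, dst_perspective, Trans, Size(img.cols, img.rows),
                INTER_LINEAR, BORDER_CONSTANT, Scalar(255, 255, 255));
//or replicate the edge pixels instead of filling with a constant
warpPerspective(img, dst_perspective, Trans, Size(img.cols, img.rows),
                INTER_LINEAR, BORDER_REPLICATE);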

3. Using the functions

        The code is as follows:

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

int main()
{
    Mat img = imread("test.png");
    Point2f AffinePoints0[4] = { Point2f(0, 0), Point2f(img.cols, 0), Point2f(0, img.rows), Point2f(img.cols, img.rows) };//4 points before the transformation
    Point2f AffinePoints1[4] = { Point2f(100, 0), Point2f(img.cols - 100, 0), Point2f(0, img.rows), Point2f(img.cols, img.rows) };//4 points after the transformation

    Mat Trans = getPerspectiveTransform(AffinePoints0, AffinePoints1);//transformation matrix from the 4 pairs of mapping points
    Mat dst_perspective;//destination image of the perspective transform
    warpPerspective(img, dst_perspective, Trans, Size(img.cols, img.rows));//apply the perspective transformation

    imshow("Source", img);
    imshow("Perspective", dst_perspective);
    waitKey();
    return 0;
}

        The execution results are as follows:

        Originally the computer screen is viewed head-on; after the perspective transformation it looks as if we are looking up at the screen (or as if the screen is tilted backward).

4. Marking the mapping points

        To see the mapping points before and after the transformation more clearly, we can mark them on the images using OpenCV's circle() function.

① Function prototype

void circle(
	InputOutputArray img, 
	Point center, 
	int radius, 
	const Scalar &color, 
	int thickness = 1, 
	int lineType = LINE_8, 
	int shift = 0)

② Function description

        Draws a hollow or filled circle with the given center and radius on the image.

③ Function parameters

        Parameter img: the image on which the circle is drawn.

        Parameter center: the coordinates of the circle's center, of type Point(x, y).

        Parameter radius: the radius of the circle.

        Parameter color: the color of the circle, in (B,G,R) order, of type Scalar(B,G,R).

        Parameter thickness: a positive value gives the thickness of the circle outline; a negative value (such as FILLED) means the circle is drawn filled.

        Parameter lineType: the type of line; the default is LINE_8.

        Parameter shift: the number of fractional bits in the center coordinates and radius value; the default is 0.

④ Code

    for (int i = 0; i < 4; i++)//show the positions of the 4 pairs of mapping points
    {
        //draw a red circle centered on the mapping point (before the transformation), radius 10, line width 3
        circle(img, AffinePoints0[i], 10, Scalar(59, 23, 232), 3);
        //draw a blue circle centered on the mapping point (after the transformation), radius 10, line width 3
        circle(dst_perspective, AffinePoints1[i], 10, Scalar(139, 0, 0), 3);
    }

        Execution effect:

        All 4 pairs of points before and after the transformation are drawn. Because I placed the mapping points at the corners, only part of each circle is visible; a point set in the middle of the picture would show the whole circle.

5. Implementing the library functions

Calling the functions above is normally enough to perform a perspective transformation. Here, to understand the process better, I implemented the perspective-warp part of the two functions myself (solving for the transformation matrix is a purely mathematical problem and is not repeated here); the implementation may differ from the official library's.

        The idea behind warpPerspective is:

        Given the transformation matrix and the source image, we must use the matrix to perspective-transform the source image into the target image. This is not done by the forward process described above, i.e., multiplying T by every source coordinate \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] to obtain target coordinates, because some target coordinates \left[ \begin{array}{c} x''\\ y''\\ 1\\ \end{array} \right ] would then have no source pixel mapped to them. Instead we use the reverse process: map each \left[ \begin{array}{c} x''\\ y''\\ 1\\ \end{array} \right ] back into the source image and read the pixel value there. The derivation of the reverse mapping is as follows:

        We know:

\begin{equation} \left[ \begin{array}{c} x'\\ y'\\ z'\\ \end{array} \right ] = T \left[ \begin{array}{c} x\\ y\\ 1\\ \end{array} \right ] \end{equation}

        Divide both sides by z':

\begin{equation} \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ] = T \left[ \begin{array}{c} \frac{x}{z'}\\ \frac{y}{z'}\\ \frac{1}{z'}\\ \end{array} \right ] \end{equation}

        Multiply both sides by T^{-1}:

\begin{equation} T^{-1} \left[ \begin{array}{c} \frac{x'}{z'}\\ \frac{y'}{z'}\\ 1\\ \end{array} \right ] = \left[ \begin{array}{c} \frac{x}{z'}\\ \frac{y}{z'}\\ \frac{1}{z'}\\ \end{array} \right ] \end{equation}

        That is:

\begin{equation} T^{-1} \left[ \begin{array}{c} x''\\ y''\\ 1\\ \end{array} \right ] = \left[ \begin{array}{c} \frac{x}{z'}\\ \frac{y}{z'}\\ \frac{1}{z'}\\ \end{array} \right ] \end{equation}

        Write the inverse T^{-1} of the transformation matrix T as:

T^{-1}= \left[ \begin{array}{ccc} c_{11}& c_{12} &c_{13}\\ c_{21}& c_{22} &c_{23}\\ c_{31}& c_{32} &c_{33}\\ \end{array} \right ]

        Expanding gives:

\begin{equation} \left\{ \begin{array}{l} x=(c_{11}x''+c_{12}y''+c_{13})z' \\ y=(c_{21}x''+c_{22}y''+c_{23})z' \\ z'=\frac{1}{c_{31}x''+c_{32}y''+c_{33}} \\ \end{array} \right. \end{equation}

        This shows that once we have the inverse T^{-1} of the transformation matrix T, for any coordinate (x'',y'') in the target image we can compute the corresponding position (x,y) in the source image and then read the pixel value at that position. Of course the position need not be integral, so interpolation may be required.

        The code implementation process is as follows:

//my own implementation of warpPerspective
void _warpPerspective(const Mat& src, const Mat& T, Mat& dst)//src: source image, T: transformation matrix, dst: destination image
{
    dst.create(src.size(), src.type());//create a Mat the same size as the source image
    Mat T_inverse;//inverse of the transformation matrix
    invert(T, T_inverse);//invert T, storing the result in T_inverse
    //read out the matrix entries
    double c11 = T_inverse.ptr<double>(0)[0];
    double c12 = T_inverse.ptr<double>(0)[1];
    double c13 = T_inverse.ptr<double>(0)[2];
    double c21 = T_inverse.ptr<double>(1)[0];
    double c22 = T_inverse.ptr<double>(1)[1];
    double c23 = T_inverse.ptr<double>(1)[2];
    double c31 = T_inverse.ptr<double>(2)[0];
    double c32 = T_inverse.ptr<double>(2)[1];
    double c33 = T_inverse.ptr<double>(2)[2];
    //traverse every position of the destination image and fetch the corresponding source pixel

    for (int y = 0; y < dst.rows; y++)
    {
        for (int x = 0; x < dst.cols; x++)
        {
            double xp = c11 * x + c12 * y + c13;
            double yp = c21 * x + c22 * y + c23;
            double z = c31 * x + c32 * y + c33;//z'
            z = z ? 1.0 / z : 0;//take the reciprocal when z' is nonzero, otherwise use 0
            xp *= z;
            yp *= z;
            //clamp the double coordinates to the range representable by int
            double fx = max((double)INT_MIN, min((double)INT_MAX, xp));
            double fy = max((double)INT_MIN, min((double)INT_MAX, yp));
            //convert to int; this is simple nearest-neighbor sampling
            int X = saturate_cast<int>(fx);
            int Y = saturate_cast<int>(fy);
            //inside the source image?
            if (X >= 0 && X < src.cols && Y >= 0 && Y < src.rows)
            {
                dst.at<Vec3b>(y, x)[0] = src.at<Vec3b>(Y, X)[0];
                dst.at<Vec3b>(y, x)[1] = src.at<Vec3b>(Y, X)[1];
                dst.at<Vec3b>(y, x)[2] = src.at<Vec3b>(Y, X)[2];
            }
            else//fill with black
            {
                dst.at<Vec3b>(y, x)[0] = 0;
                dst.at<Vec3b>(y, x)[1] = 0;
                dst.at<Vec3b>(y, x)[2] = 0;
            }
        }
    }
}
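        The nearest-neighbor sampling above can look jagged along edges. As noted earlier, interpolation may be required; the fragment below sketches how the sampling step inside the double loop could be replaced with bilinear interpolation (my own variant, not the official implementation):

            //bilinear sampling at the non-integer source position (xp, yp)
            int x0 = (int)floor(xp), y0 = (int)floor(yp);
            double ax = xp - x0, ay = yp - y0;//fractional parts
            if (x0 >= 0 && x0 + 1 < src.cols && y0 >= 0 && y0 + 1 < src.rows)
            {
                for (int c = 0; c < 3; c++)//blend the 4 neighboring pixels per channel
                {
                    double v = (1 - ax) * (1 - ay) * src.at<Vec3b>(y0, x0)[c]
                             + ax * (1 - ay) * src.at<Vec3b>(y0, x0 + 1)[c]
                             + (1 - ax) * ay * src.at<Vec3b>(y0 + 1, x0)[c]
                             + ax * ay * src.at<Vec3b>(y0 + 1, x0 + 1)[c];
                    dst.at<Vec3b>(y, x)[c] = saturate_cast<uchar>(v);
                }
            }
            else//outside the source image: fill with black as before
                dst.at<Vec3b>(y, x) = Vec3b(0, 0, 0);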

        Running result:

        Compared with the result of the official function, the outputs are basically identical, which indicates the implementation idea is correct.

3. Applications of perspective transformation

1. Interactive program for perspective transformation

① Requirements:

        Design an interactive program in which the vertices of the quadrilateral can be edited, and the deformed image is updated in real time as the vertex positions change.

② Approach:

        With the knowledge above, performing a perspective transformation from 4 known pairs of mapping points is easy. The difficulty lies in designing the interaction, for which OpenCV's mouse events are needed.

void setMouseCallback(const String& winname, MouseCallback onMouse, void* userdata = 0)

Parameter Description:

        winname : The name of the window.

        onMouse: the mouse response function, or callback function; a function pointer called every time a mouse event occurs in the window. Its prototype is void onMouse(int event, int x, int y, int flags, void* userdata).

        userdata: a parameter passed through to the callback function; the default is 0. I personally have not used this parameter.

void MouseCallback(int event, int x, int y, int flags, void* userdata);

Parameter Description:

        event : mouse event.

        x : The x coordinate of the mouse event.

        y : The y coordinate of the mouse event.

        flags : Represents mouse drag events and keyboard and mouse combined events.

        userdata : optional parameter, not used so far.
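        Although I have not used userdata, it is the standard way to avoid global variables: pass a pointer to your own state when registering the callback and cast it back inside. A hypothetical sketch (the DragState struct and its members are my own names, not part of the code below):

struct DragState { Point2f dstPoint[4]; int pickPoint = -1; };//hypothetical state holder

void onMouse(int event, int x, int y, int flags, void* userdata)
{
    DragState* s = (DragState*)userdata;//cast the pointer back to our state type
    if (event == EVENT_LBUTTONDOWN)
        s->pickPoint = 0;//...operate on s instead of on globals
}

//at registration time, pass the address of the state object:
DragState state;
setMouseCallback("window", onMouse, &state);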

        The main mouse events are:

                EVENT_MOUSEMOVE: mouse movement

                EVENT_LBUTTONDOWN: left mouse button pressed

                EVENT_RBUTTONDOWN: right mouse button pressed

                EVENT_MBUTTONDOWN: middle mouse button pressed

                EVENT_LBUTTONUP: left mouse button released

                EVENT_RBUTTONUP: right mouse button released

                EVENT_MBUTTONUP: middle mouse button released

                EVENT_LBUTTONDBLCLK: left button double-click

                EVENT_RBUTTONDBLCLK: right button double-click

                EVENT_MBUTTONDBLCLK: middle button double-click

        The main flags are:

                EVENT_FLAG_LBUTTON: dragging with the left button

                EVENT_FLAG_RBUTTON: dragging with the right button

                EVENT_FLAG_MBUTTON: dragging with the middle button

                EVENT_FLAG_CTRLKEY: Ctrl held down

                EVENT_FLAG_SHIFTKEY: Shift held down

                EVENT_FLAG_ALTKEY: Alt held down

General idea:

        For a better look, we use a canvas slightly larger than the source image to hold the target image. The initial 4 mapping points sit on the 4 corners of the source image. On this canvas, when the mouse moves into the circular region of one of the 4 mapping points and the left button is pressed and held, that mapping point can be dragged. When the left button is released, the mouse position becomes the new position of the mapping point; the transformation matrix is then recomputed, and the perspective transformation is applied to the source image and displayed.

        Real-time updating could also mean recomputing the matrix and re-warping whenever the point position changes during the drag. That makes it clearer where the point is being moved, but it performs many unnecessary perspective transformations along the way.

③ Implementation

        The focus is on writing the mouse handler; the rest is similar to the perspective transformation above.

        When the left button is pressed, we check whether the click falls inside the region of a mapping point; if so, we record which point was selected.

        If real-time updating during the drag were desired, the mapping point's position would be recorded and the warp performed continuously while dragging. In my implementation I perform the perspective transformation only after the left button is released, so the point position does not need to be recorded during the drag for warping; for better interactive feedback, however, I still track the position and draw some lines and circles as visual hints.

        When the left button is released, perform perspective transformation.

The specific code is as follows:

void mouseHander(int event, int x, int y, int flags, void* p)
{
    if (event == EVENT_LBUTTONDOWN)//left button pressed
    {
        for (int i = 0; i < 4; i++)
        {
            //check whether a mapping point was selected
            if (abs(x - dstPoint[i].x) <= radius && abs(y - dstPoint[i].y) <= radius)
            {
                pickPoint = i;
                beforePlace = dstPoint[pickPoint];//remember the original position
                break;
            }
        }
    }
    else if (event == EVENT_MOUSEMOVE && pickPoint >= 0)//dragging after selecting a point with the left button
    {
        //update the mapped coordinates
        dstPoint[pickPoint].x = x, dstPoint[pickPoint].y = y;
        //show the drag feedback on a temporary image in real time;
        //do not draw directly on dstImg, or the strokes would accumulate
        Mat tmp = dstImg.clone();
        //the circle at the original position
        circle(tmp, beforePlace, radius, Scalar(228, 164, 140), -1);
        //draw the quadrilateral edges
        line(tmp, dstPoint[0], dstPoint[1], Scalar(246, 230, 171), 5, 8);
        line(tmp, dstPoint[1], dstPoint[2], Scalar(246, 230, 171), 5, 8);
        line(tmp, dstPoint[2], dstPoint[3], Scalar(246, 230, 171), 5, 8);
        line(tmp, dstPoint[3], dstPoint[0], Scalar(246, 230, 171), 5, 8);
        //redraw the 4 circles
        for (int i = 0; i < 4; i++)
        {
            if (i != pickPoint)
                circle(tmp, dstPoint[i], radius, Scalar(228, 164, 140), -1);
            else
                circle(tmp, dstPoint[i], radius, Scalar(96, 96, 240), -1);
        }
        imshow("Perspective", tmp);
    }
    else if (event == EVENT_LBUTTONUP && pickPoint >= 0)//left button released
    {
        //perform the perspective transformation
        Mat Trans = getPerspectiveTransform(srcPoint, dstPoint);//transformation matrix from the 4 pairs of mapping points
        warpPerspective(srcImg, dstImg, Trans, Size(dstImg.cols, dstImg.rows));//apply the perspective transformation
        for (int i = 0; i < 4; i++)//show the positions of the 4 mapping points
        {
            //draw a yellow circle centered on the mapping point (after the transformation), radius 10, line width 3
            circle(dstImg, dstPoint[i], radius, Scalar(0, 215, 255), 3);
        }
        imshow("Perspective", dstImg);
        pickPoint = -1;//reset the selection state
    }
}

④ Results

        Testing with different pictures:

        After the perspective transformation of the poster across the road, the viewing angle changes from the original oblique view toward a more direct one.

        I also took a casual photo of the library from its left side; after perspective transformation:

         It can be seen that after the adjustment, the viewing angle is closer to the front.

⑤ Source code

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

Mat srcImg, dstImg;//source image, destination image
Point2f srcPoint[4], dstPoint[4];//4 mapping points on the source and destination images
Point2f beforePlace;//position of a mapping point before it is moved
int radius;//hit-test radius of a mapping point
int pickPoint;//which point was clicked

void mouseHander(int event, int x, int y, int flags, void* p)
{
    if (event == EVENT_LBUTTONDOWN)//left button pressed
    {
        for (int i = 0; i < 4; i++)
        {
            //check whether a mapping point was selected
            if (abs(x - dstPoint[i].x) <= radius && abs(y - dstPoint[i].y) <= radius)
            {
                pickPoint = i;
                beforePlace = dstPoint[pickPoint];//remember the original position
                break;
            }
        }
    }
    else if (event == EVENT_MOUSEMOVE && pickPoint >= 0)//dragging after selecting a point with the left button
    {
        //update the mapped coordinates
        dstPoint[pickPoint].x = x, dstPoint[pickPoint].y = y;
        //show the drag feedback on a temporary image in real time;
        //do not draw directly on dstImg, or the strokes would accumulate
        Mat tmp = dstImg.clone();
        //the circle at the original position
        circle(tmp, beforePlace, radius, Scalar(228, 164, 140), -1);
        //draw the quadrilateral edges
        line(tmp, dstPoint[0], dstPoint[1], Scalar(246, 230, 171), 5, 8);
        line(tmp, dstPoint[1], dstPoint[2], Scalar(246, 230, 171), 5, 8);
        line(tmp, dstPoint[2], dstPoint[3], Scalar(246, 230, 171), 5, 8);
        line(tmp, dstPoint[3], dstPoint[0], Scalar(246, 230, 171), 5, 8);
        //redraw the 4 circles
        for (int i = 0; i < 4; i++)
        {
            if (i != pickPoint)
                circle(tmp, dstPoint[i], radius, Scalar(228, 164, 140), -1);
            else
                circle(tmp, dstPoint[i], radius, Scalar(96, 96, 240), -1);
        }
        imshow("Perspective", tmp);
    }
    else if (event == EVENT_LBUTTONUP && pickPoint >= 0)//left button released
    {
        //perform the perspective transformation
        Mat Trans = getPerspectiveTransform(srcPoint, dstPoint);//transformation matrix from the 4 pairs of mapping points
        warpPerspective(srcImg, dstImg, Trans, Size(dstImg.cols, dstImg.rows));//apply the perspective transformation
        for (int i = 0; i < 4; i++)//show the positions of the 4 mapping points
        {
            //draw a yellow circle centered on the mapping point (after the transformation), radius 10, line width 3
            circle(dstImg, dstPoint[i], radius, Scalar(0, 215, 255), 3);
        }
        imshow("Perspective", dstImg);
        pickPoint = -1;//reset the selection state
    }
}

int main()
{
    srcImg = imread("library.jpg");
    radius = 10;//radius of the circles at the four points
    pickPoint = -1;
    //4 points before the mapping
    srcPoint[0] = Point2f(0, 0);
    srcPoint[1] = Point2f(srcImg.cols, 0);
    srcPoint[2] = Point2f(srcImg.cols, srcImg.rows);
    srcPoint[3] = Point2f(0, srcImg.rows);
    //create a canvas slightly larger than the source image
    dstImg = Mat::zeros(Size(2 * radius + 100 + srcImg.cols, 2 * radius + 100 + srcImg.rows), srcImg.type());
    //initial 4 points after the mapping
    dstPoint[0] = Point2f(radius + 50, radius + 50);
    dstPoint[1] = Point2f(radius + 50 + srcImg.cols, radius + 50);
    dstPoint[2] = Point2f(radius + 50 + srcImg.cols, radius + 50 + srcImg.rows);
    dstPoint[3] = Point2f(radius + 50, radius + 50 + srcImg.rows);
    Mat Trans = getPerspectiveTransform(srcPoint, dstPoint);//transformation matrix from the 4 pairs of mapping points
    warpPerspective(srcImg, dstImg, Trans, Size(dstImg.cols, dstImg.rows));//apply the perspective transformation

    for (int i = 0; i < 4; i++)//show the initial positions of the 4 mapping points
    {
        //draw a circle centered on the mapping point (after the transformation), radius 10, line width 3
        circle(dstImg, dstPoint[i], radius, Scalar(95, 180, 243), 3);
    }
    imshow("Source", srcImg);
    imshow("Perspective", dstImg);
    //mouse events
    setMouseCallback("Perspective", mouseHander);
    waitKey();
    return 0;
}

2. Virtual billboard

        Given the perspective transformation process above, an application comes to mind: perspective-transform picture A, as a billboard, onto a specific position in picture B.

        The implementation is also easy. Set the initial mapping points of picture A at its four corners, select the 4 post-transformation mapping points on the background picture B, and perspective-transform A to the specified position (covering that part of B). Note that the mapping points must be selected in a fixed order; otherwise they cannot be matched one-to-one with the original points. My convention is to select clockwise starting from the top-left corner.

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

Mat srcImg, dstImg;//source image, destination (background) image
Mat resultImg;//result image
vector<Point2f> srcPoints, dstPoints;//mapping points on the source and destination images
int pickNums;//number of points selected so far

void mouseHander(int event, int x, int y, int flags, void* p)
{
    if (event == EVENT_LBUTTONDOWN)//left button pressed
    {
        Mat tmp = dstImg.clone();
        if (pickNums == 4)//once 4 points have been selected, the next click performs the transformation
        {
            //perform the perspective transformation
            Mat Trans = getPerspectiveTransform(srcPoints, dstPoints);//transformation matrix from the 4 pairs of mapping points
            warpPerspective(srcImg, tmp, Trans, Size(tmp.cols, tmp.rows));//apply the perspective transformation
            resultImg = dstImg.clone();
            for (int y = 0; y < dstImg.rows; y++)
            {
                for (int x = 0; x < dstImg.cols; x++)
                {
                    if ((int)tmp.at<Vec3b>(y, x)[0] == 0 && (int)tmp.at<Vec3b>(y, x)[1] == 0 && (int)tmp.at<Vec3b>(y, x)[2] == 0)//pixel is all zeros
                        continue;
                    else//not all zeros
                    {
                        resultImg.at<Vec3b>(y, x)[0] = tmp.at<Vec3b>(y, x)[0];
                        resultImg.at<Vec3b>(y, x)[1] = tmp.at<Vec3b>(y, x)[1];
                        resultImg.at<Vec3b>(y, x)[2] = tmp.at<Vec3b>(y, x)[2];
                    }
                }
            }
            imshow("Virtual Billboard", resultImg);
            dstPoints.clear();
            pickNums = 0;
        }
        else//fewer than 4 points selected so far
        {
            dstPoints.push_back(Point2f(x, y));
            pickNums++;
            for (int i = 0; i < (int)dstPoints.size(); i++)
            {
                circle(tmp, dstPoints[i], 5, Scalar(0, 215, 255), 3);
            }
            imshow("Virtual Billboard", tmp);
        }
    }

}

int main()
{
    srcImg = imread("test.png");//the image to be perspective-transformed, i.e. the billboard image

    //4 mapping points of the source image
    srcPoints.push_back(Point2f(0, 0));
    srcPoints.push_back(Point2f(srcImg.cols, 0));
    srcPoints.push_back(Point2f(srcImg.cols, srcImg.rows));
    srcPoints.push_back(Point2f(0, srcImg.rows));

    dstImg = imread("library.jpg");//background image

    imshow("Virtual Billboard", dstImg);
    //mouse events
    setMouseCallback("Virtual Billboard", mouseHander);
    waitKey(0);
    return 0;
}

        The idea is simple: select 4 points each time, and after the selection, perspective-map the billboard picture to the corresponding position.

        Because warpPerspective fills the area outside the warped image with black (all zeros), the background image is kept wherever the warped image is (0,0,0), and the warped image is used wherever it is not. This has a flaw: any pixel inside the billboard that happens to be (0,0,0) will not cover the background. In practice ordinary images contain few pixels that are exactly (0,0,0); alternatively, we could add a small offset such as (0,0,1) to every (0,0,0) pixel of the source image beforehand, which has little visible effect.
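        A more robust alternative, which I suggest here rather than something from the original approach, is to skip the black test entirely: fill the selected quadrilateral with white on a single-channel mask using fillConvexPoly, then copy through the mask, so billboard pixels that happen to be (0,0,0) survive. A sketch reusing the variable names from the code above:

//composite through a mask of the destination quadrilateral instead of testing for black
Mat mask = Mat::zeros(dstImg.size(), CV_8UC1);
vector<Point> quad;
for (int i = 0; i < 4; i++)
    quad.push_back(Point(dstPoints[i]));//the 4 selected corners, as integer points (convex, in order)
fillConvexPoly(mask, quad, Scalar(255));//white inside the billboard region
resultImg = dstImg.clone();
tmp.copyTo(resultImg, mask);//copy the warped image only where the mask is set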

        The effect:

         It can also be used to turn a classmate's computer screen into any picture you want:

        To make a dynamic billboard, the only modification needed is to replace the billboard picture with a video file, i.e., read each frame from the video and perspective-transform it.

        Here is the changed code:

#include <iostream>
#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;

Mat dstImg, frame;//destination (background) image, video frame
Mat resultImg;//result image
vector<Point2f> srcPoints, dstPoints;//mapping points on the source and destination images
int pickNums;//number of points selected so far

void mouseHander(int event, int x, int y, int flags, void* p)
{
    if (event == EVENT_LBUTTONDOWN)//left button pressed
    {
        Mat tmp = dstImg.clone();
        if (pickNums == 4)//once 4 points have been selected, the next click performs the transformation
        {
            //open the video file
            VideoCapture capture;
            capture.open("sdu_cut.mp4");
            if (!capture.isOpened())
            {
                cout << "Cannot open the video file!" << endl;
            }
            int num = 0;
            while (capture.read(frame))
            {
                num++;
                if (num == 1)//first frame
                {
                    //4 mapping points of the source frame
                    srcPoints.push_back(Point2f(0, 0));
                    srcPoints.push_back(Point2f(frame.cols, 0));
                    srcPoints.push_back(Point2f(frame.cols, frame.rows));
                    srcPoints.push_back(Point2f(0, frame.rows));
                }
                //perform the perspective transformation
                Mat Trans = getPerspectiveTransform(srcPoints, dstPoints);//transformation matrix from the 4 pairs of mapping points
                warpPerspective(frame, tmp, Trans, Size(tmp.cols, tmp.rows));//apply the perspective transformation
                resultImg = dstImg.clone();
                for (int y = 0; y < dstImg.rows; y++)
                {
                    for (int x = 0; x < dstImg.cols; x++)
                    {
                        if ((int)tmp.at<Vec3b>(y, x)[0] == 0 && (int)tmp.at<Vec3b>(y, x)[1] == 0 && (int)tmp.at<Vec3b>(y, x)[2] == 0)//pixel is all zeros
                            continue;
                        else//not all zeros
                        {
                            resultImg.at<Vec3b>(y, x)[0] = tmp.at<Vec3b>(y, x)[0];
                            resultImg.at<Vec3b>(y, x)[1] = tmp.at<Vec3b>(y, x)[1];
                            resultImg.at<Vec3b>(y, x)[2] = tmp.at<Vec3b>(y, x)[2];
                        }
                    }
                }
                imshow("Virtual Billboard", resultImg);

                //allow exiting midway with Esc
                char c = waitKey(1);
                if (c == 27)
                {
                    break;
                }

            }
            dstPoints.clear();
            srcPoints.clear();
            pickNums = 0;
        }
        else//fewer than 4 points selected so far
        {
            dstPoints.push_back(Point2f(x, y));
            pickNums++;
            for (int i = 0; i < (int)dstPoints.size(); i++)
            {
                circle(tmp, dstPoints[i], 5, Scalar(0, 215, 255), 3);
            }
            imshow("Virtual Billboard", tmp);
        }
    }
}

int main()
{
    dstImg = imread("b.jpg");//background image
    imshow("Virtual Billboard", dstImg);
    //mouse events
    setMouseCallback("Virtual Billboard", mouseHander);
    waitKey(0);
    return 0;
}

        The idea is basically the same as the static billboard; the main change is reading a video file frame by frame instead of a single picture. To obtain the initial mapping points, the first frame is read to determine the positions of the four corners.

        With the code above, after selecting 4 points you must wait until the video finishes playing before starting a new round of selection; otherwise an error occurs.

        Below are some experimental results. Because the video file is too large to upload, only part of it was captured and converted to a GIF:
