OpenCV2: Geometric transformation of images, translation, mirroring, scaling, rotation (2)

This article is reproduced from https://www.cnblogs.com/wangguchangqing/p/4045150.html (author: wangguchangqing). Please retain this notice when reprinting.

The previous article, OpenCV2: Geometric transformation of images, translation, mirroring, scaling, rotation (1), introduced forward mapping, backward mapping, and the interpolation of pixel values at floating-point coordinates during image transformation, and implemented two simple geometric transformations based on OpenCV2: translation and mirroring. This article covers two slightly more complicated geometric transformations: scaling and rotation.

1. Image scaling

Image scaling changes the size of an image: its width and height change after scaling. The horizontal scaling factor controls the width: if it is 1, the width stays unchanged. The vertical scaling factor controls the height: if it is 1, the height stays unchanged. If the two factors are not equal, the aspect ratio of the image changes after scaling and the image is distorted. To keep the aspect ratio unchanged, the horizontal and vertical scaling factors must be equal.

[Figure: two scaled versions of the same image]

For the image on the left, both the horizontal and vertical scaling factors are 0.5. For the image on the right, the horizontal scaling factor is 1 and the vertical scaling factor is 0.5; after scaling, its aspect ratio has changed and the image is deformed.

1.1 The principle of scaling

Let the horizontal scaling factor be $s_x$ and the vertical scaling factor $s_y$, let $(x_0, y_0)$ be the coordinates before scaling and $(x, y)$ the coordinates after scaling. The scaling coordinate mapping is:

$$x = s_x \cdot x_0, \qquad y = s_y \cdot y_0$$

In matrix form:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}$$

This is the forward mapping. Since the size of the image changes during scaling, forward mapping suffers from overlapping and incomplete mappings, so what we care about more is the backward mapping: for each pixel of the output image, the backward mapping finds its corresponding pixel in the original image.

The backward mapping relationship is:

$$x_0 = \frac{x}{s_x}, \qquad y_0 = \frac{y}{s_y}$$
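For example, with $s_x = s_y = 0.5$, the pixel $(x, y) = (3, 7)$ of the output image maps back to $(x_0, y_0) = (3 / 0.5,\ 7 / 0.5) = (6, 14)$ in the original image. When the division does not produce integers, interpolation is needed, as discussed below.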

1.2 Scaling implementation based on OpenCV

When scaling an image, first compute the size of the scaled image. Let newWidth and newHeight be the width and height of the scaled image, and width and height those of the original image; then:

$$\text{newWidth} = \text{width} \cdot s_x, \qquad \text{newHeight} = \text{height} \cdot s_y$$

Then traverse the scaled image and, for each pixel, use the backward mapping to compute its position in the original image; if a floating-point coordinate results, an interpolation algorithm is needed to obtain an approximate pixel value.

According to the formula above, the width and height of the scaled image are the original width and height multiplied by the scaling factors:

int rows = static_cast<int>(src.rows * xRatio + 0.5); // height of the scaled image
int cols = static_cast<int>(src.cols * yRatio + 0.5); // width of the scaled image

Backward mapping can produce floating-point coordinates; here they are handled with nearest-neighbor interpolation and with bilinear interpolation.

Nearest-neighbor interpolation

for (int i = 0; i < rows; i++) {
    int row = static_cast<int>(i / xRatio + 0.5);
    if (row >= src.rows)
        row--;
    const uchar *origin = src.ptr<uchar>(row); // source row (single-channel image)
    uchar *p = dst.ptr<uchar>(i);              // destination row
    for (int j = 0; j < cols; j++) {
        int col = static_cast<int>(j / yRatio + 0.5);
        if (col >= src.cols)
            col--;
        p[j] = origin[col];
    }
}

Nearest-neighbor interpolation only requires rounding the floating-point coordinates. However, the rounded result may exceed the boundary of the original image (by at most 1), so it has to be clamped.
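The snippet above assumes src, dst, xRatio, and yRatio already exist. As a minimal self-contained sketch (the wrapper name nearestScale and the single-channel assumption are mine, not the original author's), the loop might be wrapped like this:

#include <opencv2/opencv.hpp>
using namespace cv;

// Nearest-neighbor scaling of a single-channel (CV_8UC1) image.
// xRatio scales the rows and yRatio the columns, matching the naming above.
Mat nearestScale(const Mat &src, double xRatio, double yRatio)
{
    int rows = static_cast<int>(src.rows * xRatio + 0.5);
    int cols = static_cast<int>(src.cols * yRatio + 0.5);
    Mat dst(rows, cols, src.type());
    for (int i = 0; i < rows; i++) {
        int row = static_cast<int>(i / xRatio + 0.5);
        if (row >= src.rows) row--;      // clamp the rounded coordinate
        const uchar *origin = src.ptr<uchar>(row);
        uchar *p = dst.ptr<uchar>(i);
        for (int j = 0; j < cols; j++) {
            int col = static_cast<int>(j / yRatio + 0.5);
            if (col >= src.cols) col--;  // clamp the rounded coordinate
            p[j] = origin[col];
        }
    }
    return dst;
}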

Bilinear interpolation

Bilinear interpolation is considerably more accurate than nearest-neighbor interpolation, and also considerably more expensive. It approximates the pixel value at a floating-point coordinate by blending the values of the four surrounding pixels with appropriate weights.

Let F be the floating-point coordinate and T1, T2, T3, T4 the four surrounding integer coordinates. Let u be the absolute difference between F and its upper-left integer coordinate along the vertical axis, and v the absolute difference along the horizontal axis. Following the analysis in the previous article, the pixel value T at the floating-point coordinate F is obtained as follows:

$$F_1 = (1 - v) \cdot T_1 + v \cdot T_2, \qquad F_2 = (1 - v) \cdot T_3 + v \cdot T_4, \qquad T = (1 - u) \cdot F_1 + u \cdot F_2$$

Here F1 is the point ([Fy], Fx) and F2 is ([Fy]+1, Fx); for details see OpenCV2: Geometric transformation of images, translation, mirroring, scaling, rotation (1).
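For example, if the backward mapping yields $F = (\text{row}, \text{col}) = (2.3, 5.8)$, then $u = 0.3$, $v = 0.8$, and the four neighbors lie on rows 2, 3 and columns 5, 6. If their values are 10, 20 (row 2) and 30, 40 (row 3), then

$$F_1 = 0.2 \cdot 10 + 0.8 \cdot 20 = 18, \qquad F_2 = 0.2 \cdot 30 + 0.8 \cdot 40 = 38,$$

$$T = 0.7 \cdot 18 + 0.3 \cdot 38 = 24.$$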

 

In the implementation, first compute the four surrounding integer coordinates from the floating-point coordinate:

double row = i / xRatio; // floating-point source coordinates
double col = j / yRatio;

int lRow = static_cast<int>(row); // upper row
int nRow = lRow + 1;              // lower row
int lCol = static_cast<int>(col); // left column
int rCol = lCol + 1;              // right column

double u = row - lRow; // vertical fractional offset
double v = col - lCol; // horizontal fractional offset

For a coordinate (i, j) in the scaled image, the backward mapping gives its corresponding coordinate (i / xRatio, j / yRatio) in the original image. Then find the four integer coordinates around it: (lCol, lRow), (lCol, nRow), (rCol, lRow), (rCol, nRow). The pixel value at the floating-point coordinate is then obtained from the bilinear interpolation formula (here p = dst.ptr<Vec3b>(i) points at row i of the three-channel destination image):

// The coordinate falls at the bottom-right corner of the image
if ((row >= src.rows - 1) && (col >= src.cols - 1)) {
    const Vec3b *lastRow = src.ptr<Vec3b>(lRow);
    p[j] = lastRow[lCol];
}
// Last row
else if (row >= src.rows - 1) {
    const Vec3b *lastRow = src.ptr<Vec3b>(lRow);
    p[j] = (1 - v) * lastRow[lCol] + v * lastRow[rCol];
}
// Last column
else if (col >= src.cols - 1) {
    const Vec3b *lastRow = src.ptr<Vec3b>(lRow);
    const Vec3b *nextRow = src.ptr<Vec3b>(nRow);
    p[j] = (1 - u) * lastRow[lCol] + u * nextRow[lCol];
}
else {
    const Vec3b *lastRow = src.ptr<Vec3b>(lRow);
    const Vec3b *nextRow = src.ptr<Vec3b>(nRow);
    Vec3b f1 = (1 - v) * lastRow[lCol] + v * lastRow[rCol];
    Vec3b f2 = (1 - v) * nextRow[lCol] + v * nextRow[rCol];
    p[j] = (1 - u) * f1 + u * f2;
}

Because four pixels are used in the computation, some of them do not exist near the border. The three special cases, the bottom-right corner, the last row, and the last column, are therefore handled separately.
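As a sanity check (my addition, not part of the original post), the hand-written result can be compared against OpenCV's built-in resize, whose INTER_NEAREST and INTER_LINEAR modes correspond to the two interpolation methods implemented above:

Mat ref;
// fx scales the width of src and fy its height
resize(src, ref, Size(), 0.5, 0.5, INTER_LINEAR);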

2. Image rotation

2.1 The principle of rotation

Rotating an image means turning it around some point by a specified angle. The image is not deformed by rotation, but its vertical and horizontal symmetry axes change, and the relationship between the coordinates of the rotated image and those of the original can no longer be obtained by simple additions, subtractions, and multiplications; a series of more involved operations is needed. Moreover, the width and height of the image change after rotation, and so does the position of the coordinate origin.

The coordinate system used for images is not the usual Cartesian one: the origin is at the top-left corner, the X axis points right along the horizontal direction, and the Y axis points down along the vertical direction. The rotation itself, however, is usually carried out in a Cartesian coordinate system whose origin is the rotation center, so the first step of image rotation is a change of coordinate system. Let the rotation center be (x0, y0), let (x', y') be the coordinates after the change of coordinate system and (x, y) the coordinates before it; then the coordinate transformation is:

$$x' = x - x_0, \qquad y' = -y + y_0$$

In matrix form:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -x_0 \\ 0 & -1 & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

In the final implementation, what is actually needed is, for each coordinate of the rotated image, to find the corresponding position in the original image through the mapping, which requires the inverse of the above transformation:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & x_0 \\ 0 & -1 & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}$$

Once the coordinate system has been changed so that the rotation center is the origin, the next step is to transform the image coordinates themselves.

[Figure: the point (x0, y0) rotated clockwise by angle a to (x1, y1)]

As shown in the figure above, rotating the coordinate (x0, y0) clockwise by the angle a yields (x1, y1).

Before the rotation, writing the point in polar form with radius r and angle b:

$$x_0 = r \cos b, \qquad y_0 = r \sin b$$

After rotating clockwise by a:

$$x_1 = r \cos(b - a) = x_0 \cos a + y_0 \sin a, \qquad y_1 = r \sin(b - a) = -x_0 \sin a + y_0 \cos a$$

In matrix form:

$$\begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix} = \begin{bmatrix} \cos a & \sin a & 0 \\ -\sin a & \cos a & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}$$

Its inverse transformation:

$$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} = \begin{bmatrix} \cos a & -\sin a & 0 \\ \sin a & \cos a & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}$$
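For instance, rotating the point $(x_0, y_0) = (1, 0)$ clockwise by $a = 90^\circ$ gives $x_1 = 1 \cdot \cos 90^\circ + 0 \cdot \sin 90^\circ = 0$ and $y_1 = -1 \cdot \sin 90^\circ + 0 \cdot \cos 90^\circ = -1$, i.e. $(0, -1)$, exactly a clockwise quarter turn.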

Since the rotation is performed with the rotation center as the coordinate origin, after the rotation the origin still has to be moved to the top-left corner of the image, i.e. one more transformation is needed. Note that the coordinates of the rotation center (x0, y0) were obtained in the coordinate system whose origin is the top-left corner of the original image, while the width and height of the image change after rotation, so the coordinate origin of the rotated image is different from that of the original.

[Figures: the image before and after rotation, showing that the top-left corner, i.e. the coordinate origin, changes]

The two figures above clearly show that the top-left corner of the image, i.e. the coordinate origin, changes after rotation.

Before computing the coordinates of the rotated image's top-left corner, first consider the rotated image's width and height. As the figures show, they are determined by the positions of the original image's four corners after rotation.

Let top be the vertical coordinate of the highest point after rotation, down the vertical coordinate of the lowest point, left the horizontal coordinate of the leftmost point, and right the horizontal coordinate of the rightmost point.

Let newWidth and newHeight be the width and height after rotation; then the following relations hold:

$$\text{newWidth} = \text{right} - \text{left}, \qquad \text{newHeight} = \text{top} - \text{down}$$

It is then easy to see that the top-left corner of the rotated image has coordinates (left, top) (in the coordinate system whose origin is the rotation center).

So after the rotation is complete, the coordinate system has to be converted back to one whose origin is the top-left corner of the rotated image, which is done by the following transformation:

$$x' = x - \text{left}, \qquad y' = -y + \text{top}$$

In matrix form:

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -\text{left} \\ 0 & -1 & \text{top} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Its inverse transformation:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & \text{left} \\ 0 & -1 & \text{top} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}$$

Putting all of the above together: a pixel coordinate of the original image goes through three coordinate transformations:

  1. Move the coordinate origin from the top-left corner of the image to the rotation center
  2. Rotate by the angle a around the rotation center
  3. After the rotation, move the coordinate origin to the top-left corner of the rotated image

This yields the rotation formula below, where (x', y') are the rotated coordinates, (x, y) the original coordinates, (x0, y0) the rotation center, and a the rotation angle (clockwise):

$$\begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -\text{left} \\ 0 & -1 & \text{top} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos a & \sin a & 0 \\ -\sin a & \cos a & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -x_0 \\ 0 & -1 & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

Mapping coordinates of the input image to coordinates of the output image in this way is forward mapping. The commonly used backward mapping is its inverse:

$$\begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & x_0 \\ 0 & -1 & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos a & -\sin a & 0 \\ \sin a & \cos a & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & \text{left} \\ 0 & -1 & \text{top} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix}$$
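Multiplying the three matrices out gives the scalar form

$$x = x' \cos a + y' \sin a + (\text{left} \cdot \cos a - \text{top} \cdot \sin a + x_0)$$

$$y = -x' \sin a + y' \cos a + (-\text{left} \cdot \sin a - \text{top} \cdot \cos a + y_0)$$

Since left $\le 0$ and top $\ge 0$ here, $\text{left} = -|\text{left}|$ and $\text{top} = |\text{top}|$, so the two constant terms are exactly the values num1 and num2 precomputed in the implementation below.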

2.2 Implementation based on OpenCV

With the rotation formula derived above, the implementation is not difficult.

First compute the rotated coordinates of the four corners (with the rotation center as the coordinate origin):

const double cosAngle = cos(angle);
const double sinAngle = sin(angle);

// Coordinates of the four corners of the original image, converted to the
// coordinate system whose origin is the rotation center
Point2d leftTop(-center.x, center.y);                           // (0, 0)
Point2d rightTop(src.cols - center.x, center.y);                // (width, 0)
Point2d leftBottom(-center.x, -src.rows + center.y);            // (0, height)
Point2d rightBottom(src.cols - center.x, -src.rows + center.y); // (width, height)

// Coordinates of the four corners after rotating around center
Point2d transLeftTop, transRightTop, transLeftBottom, transRightBottom;
transLeftTop = coordinates(leftTop, angle);
transRightTop = coordinates(rightTop, angle);
transLeftBottom = coordinates(leftBottom, angle);
transRightBottom = coordinates(rightBottom, angle);

Note that the coordinates of the four corners of the original image must first be converted into the coordinate system whose origin is the rotation center. The rotation transformation formula

$$x_1 = x_0 \cos a + y_0 \sin a, \qquad y_1 = -x_0 \sin a + y_0 \cos a$$

then gives the coordinates of the four corners after rotation.
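The helper coordinates used above is not shown in this excerpt; a minimal sketch consistent with the rotation formula (the signature is my assumption):

#include <cmath>

// Rotate a point, already expressed in the center-origin Cartesian system,
// clockwise by angle (in radians).
static Point2d coordinates(const Point2d &pt, double angle)
{
    return Point2d(pt.x * cos(angle) + pt.y * sin(angle),
                   -pt.x * sin(angle) + pt.y * cos(angle));
}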

Depending on the rotation angle, the positions of the four corners after rotation differ from their positions in the original image; that is, the top-left corner of the original image is not necessarily the top-left corner of the rotated image, and may even end up as the bottom-right corner. So the width of the rotated image cannot be computed as the rotated x coordinate of the original top-right corner minus the rotated x coordinate of the original bottom-left corner, and likewise for the height. (While looking through other material, I found that most of it computes the image width and height exactly that way.)

// Compute the width and height of the rotated image
double left = min({ transLeftTop.x, transRightTop.x, transLeftBottom.x, transRightBottom.x });
double right = max({ transLeftTop.x, transRightTop.x, transLeftBottom.x, transRightBottom.x });
double top = max({ transLeftTop.y, transRightTop.y, transLeftBottom.y, transRightBottom.y });
double down = min({ transLeftTop.y, transRightTop.y, transLeftBottom.y, transRightBottom.y });

int width = static_cast<int>(abs(left - right) + 0.5);
int height = static_cast<int>(abs(top - down) + 0.5);

To compute the width of the rotated image, subtract the horizontal coordinate of the leftmost rotated corner from that of the rightmost; the height is the vertical coordinate of the topmost point minus that of the bottommost.

Then every pixel of the image can be processed using the final rotation formula:

const double num1 = -abs(left) * cosAngle - abs(top) * sinAngle + center.x;
const double num2 = abs(left) * sinAngle - abs(top) * cosAngle + center.y;

Vec3b *p;
for (int i = 0; i < height; i++)
{
    p = dst.ptr<Vec3b>(i);
    for (int j = 0; j < width; j++)
    {
        // Coordinate transformation: backward mapping into the source image
        int x = static_cast<int>(j * cosAngle + i * sinAngle + num1 + 0.5);
        int y = static_cast<int>(-j * sinAngle + i * cosAngle + num2 + 0.5);

        if (x >= 0 && y >= 0 && x < src.cols && y < src.rows)
            p[j] = src.ptr<Vec3b>(y)[x];
    }
}
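Putting the pieces together, here is a self-contained sketch of mine (the function name rotateImage is not from the original post; it reuses the coordinates helper sketched above and assumes a 3-channel CV_8UC3 image):

#include <opencv2/opencv.hpp>
#include <algorithm>
#include <cmath>
using namespace cv;
using namespace std;

// Rotate src clockwise by angle (radians) around center; unmapped pixels stay black.
Mat rotateImage(const Mat &src, Point2d center, double angle)
{
    const double cosAngle = cos(angle);
    const double sinAngle = sin(angle);

    // Four corners in the coordinate system whose origin is the rotation center
    Point2d transLeftTop = coordinates(Point2d(-center.x, center.y), angle);
    Point2d transRightTop = coordinates(Point2d(src.cols - center.x, center.y), angle);
    Point2d transLeftBottom = coordinates(Point2d(-center.x, -src.rows + center.y), angle);
    Point2d transRightBottom = coordinates(Point2d(src.cols - center.x, -src.rows + center.y), angle);

    double left = min({ transLeftTop.x, transRightTop.x, transLeftBottom.x, transRightBottom.x });
    double right = max({ transLeftTop.x, transRightTop.x, transLeftBottom.x, transRightBottom.x });
    double top = max({ transLeftTop.y, transRightTop.y, transLeftBottom.y, transRightBottom.y });
    double down = min({ transLeftTop.y, transRightTop.y, transLeftBottom.y, transRightBottom.y });

    int width = static_cast<int>(abs(left - right) + 0.5);
    int height = static_cast<int>(abs(top - down) + 0.5);
    Mat dst = Mat::zeros(height, width, src.type());

    const double num1 = -abs(left) * cosAngle - abs(top) * sinAngle + center.x;
    const double num2 = abs(left) * sinAngle - abs(top) * cosAngle + center.y;

    for (int i = 0; i < height; i++) {
        Vec3b *p = dst.ptr<Vec3b>(i);
        for (int j = 0; j < width; j++) {
            int x = static_cast<int>(j * cosAngle + i * sinAngle + num1 + 0.5);
            int y = static_cast<int>(-j * sinAngle + i * cosAngle + num2 + 0.5);
            if (x >= 0 && y >= 0 && x < src.cols && y < src.rows)
                p[j] = src.ptr<Vec3b>(y)[x];
        }
    }
    return dst;
}

For example, rotateImage(src, Point2d(src.cols / 2.0, src.rows / 2.0), CV_PI / 6) rotates around the image center by 30 degrees.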

The interpolation used here is nearest-neighbor; the bilinear variant is implemented the same way as for image scaling and is not repeated here.

 

If you rotate an image with the above algorithm, you will find that no matter which position inside the image is chosen as the rotation center, the final result is the same. The reason is that with different rotation centers the size of the rotated image is always the same; only its position differs. And in the last of the three coordinate transformations, the origin is always moved to the top-left corner of the rotated image, which amounts to translating the image so that the positions coincide as well, so the final rotated images are identical.

What happens if, after the rotation, the coordinate origin is moved not to the top-left corner of the rotated image but to the top-left corner of the original image?

[Figure: rotation keeping the origin at the original image's top-left corner; part of the image is cut off]

As in the figure above, part of the image is cut off. In that case, different rotation centers do produce different final images, since different parts get cut off.

3. Combined transformations

A combined transformation simply applies several geometric transformations together. The transformation formula for rotation has been derived above, so combined transformations are not difficult: they amount to multiplying a few more matrices. A common combination is scaling + rotation + translation; below it is used as an example to derive the combined transformation formula.

  1. Scaling
    Let (x0, y0) be the coordinates after scaling, (x, y) the coordinates before scaling, and sx, sy the scaling factors:
    $$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} = \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
  2. Translation
    Let (x0, y0) be the coordinates after translation, (x, y) the coordinates before translation, and dx, dy the offsets:
    $$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
  3. Rotation
    Let (x0, y0) be the coordinates after rotation, (x, y) the coordinates before rotation, (m, n) the rotation center, a the rotation angle, and (left, top) the top-left corner of the rotated image:
    $$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -\text{left} \\ 0 & -1 & \text{top} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos a & \sin a & 0 \\ -\sin a & \cos a & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -m \\ 0 & -1 & n \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$

These three transformation matrices are combined in the order scaling, then translation, then rotation:

$$\begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & -\text{left} \\ 0 & -1 & \text{top} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos a & \sin a & 0 \\ -\sin a & \cos a & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -m \\ 0 & -1 & n \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & d_x \\ 0 & 1 & d_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}$$
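Since the combined transform is just a product of matrices, it can be evaluated directly with OpenCV's fixed-size matrix type. A sketch of mine (the helper name combined is an assumption; the factor order follows the formula above):

#include <opencv2/opencv.hpp>
#include <cmath>
using namespace cv;

// Build the combined scale -> translate -> rotate mapping as a single 3x3 matrix.
// sx, sy: scale factors; dx, dy: translation offsets; m, n: rotation center;
// a: clockwise rotation angle; left, top: top-left corner of the rotated image.
Matx33d combined(double sx, double sy, double dx, double dy,
                 double m, double n, double a, double left, double top)
{
    Matx33d S(sx, 0, 0, 0, sy, 0, 0, 0, 1);                    // scaling
    Matx33d T(1, 0, dx, 0, 1, dy, 0, 0, 1);                    // translation
    Matx33d C1(1, 0, -m, 0, -1, n, 0, 0, 1);                   // origin -> rotation center
    Matx33d R(cos(a), sin(a), 0, -sin(a), cos(a), 0, 0, 0, 1); // clockwise rotation
    Matx33d C2(1, 0, -left, 0, -1, top, 0, 0, 1);              // origin -> new top-left
    return C2 * R * C1 * T * S; // the rightmost factor is applied first
}

A pixel (x, y) is then mapped via Vec3d out = M * Vec3d(x, y, 1), where M is the returned matrix.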

When combining transformations, pay attention to the order; after all, left multiplication and right multiplication of matrices are not the same.

 

4. Finally

This is the longest article I have written so far; together with the previous one it took almost a week. Apart from a few setbacks while implementing the algorithms, the rest of the transformations went smoothly. Most of the time went into figures and mathematical formulas. I have not found a suitable drawing tool. For the formulas I spent a day learning LaTeX and found some tools to convert TeX to HTML, but the results were not very good, so I pasted images instead. Without pictures, some things are really hard to explain in text alone, so I ought to learn MATLAB plotting.



 

 

 


 
