Visualization of Unity perspective projection matrix transformation

Foreword

The demo for this article has been uploaded to GitHub: CameraProjectionMatix

In the 3D rendering pipeline, a point on an object is mapped from three-dimensional space to the two-dimensional screen, usually through the MVP transformation matrices. These three letters refer to three matrices that transform between different coordinate spaces (a minimal composition sketch follows the list below):

  • M (Model): transforms from local space to world space
  • V (View): transforms from world space to camera space
  • P (Projection): transforms from camera space to the canonical view volume (CVV)
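As a minimal sketch of how these three matrices are composed in Unity, the snippet below builds an MVP matrix for an object and applies it to one of its local-space points. The names target, cam, and localPoint are placeholders for this illustration, not something taken from the original project.

    using UnityEngine;

    public class MvpExample : MonoBehaviour
    {
        public Renderer target;   // object whose point is transformed (placeholder)
        public Camera cam;        // camera supplying the V and P matrices

        void Start()
        {
            Matrix4x4 M = target.transform.localToWorldMatrix; // Model: local -> world
            Matrix4x4 V = cam.worldToCameraMatrix;             // View: world -> camera
            Matrix4x4 P = cam.projectionMatrix;                 // Projection: camera -> clip space (CVV before the divide)
            Matrix4x4 MVP = P * V * M;

            // Transform the object's local origin; w = 1 is needed for the later perspective divide
            Vector4 clipPos = MVP * new Vector4(0f, 0f, 0f, 1f);
            Debug.Log($"Clip-space position: {clipPos}");
        }
    }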

The previous article described the transformation matrices between the camera's different coordinate spaces in detail. The discussion there of transforming an object's local coordinates to world coordinates applies equally to the transformation from world space to camera space. If you are interested, you can check it through this link:

Building on the previous article, this article introduces the most important transformation in the rendering process, the P transformation, that is, the conversion of camera space into the canonical view volume (it can also be understood as converting the camera's perspective space into an orthogonal space). The process is shown in the picture below (the picture comes from the Internet):

Camera space frustum to CVV

There are many theoretically detailed derivations of the projection matrix, but they are usually graphics-oriented formula analyses that are hard to follow and rarely connected to practical engineering work. To make the projection transformation easy to understand and apply in practice, this article visualizes the transformation in detail inside the Unity engine and breaks the whole process down as simply as possible.

The meaning of the camera projection matrix

1. The concept of camera perspective projection:

In painting theory, perspective refers to the methods and techniques for depicting the spatial relationships of objects on a plane or curved surface. Generally, the sense of space and depth on a flat surface is conveyed through three attributes:

  • The perspective shape (contour lines) of objects, i.e. how shapes change and shrink at different distances up, down, left, right, near and far
  • The color changes caused by distance, i.e. color perspective and aerial perspective
  • The degree of blurring of objects at different distances, i.e. vanishing perspective

The above comes from Baidu Encyclopedia's explanation of perspective. A simple way to understand it: because of the viewing angle of the eye, objects always appear larger when near and smaller when far away. Over time, as this experience accumulates, the phenomenon indirectly helps humans build up an understanding of three-dimensional space.

Back in the game engine, nothing works better than simulating the imaging of the human eye. To let the computer correctly mimic human perception and render a perspective picture, the camera in perspective mode marks its viewing range with a cone that has an opening angle, and the tip of the cone is usually cut off, giving the view frustum.

Although the frustum concept solves the perspective problem very well, it creates problems for the calculations that follow. Simply put, a box can easily describe its spatial range through a center point and the length of each side, and compressing one axis achieves the projection from three dimensions to two. But the view frustum is a truncated cone with different extents along different axes, so its range is hard to define.

Since the frustum's space is hard to work with directly, some excellent programmer or mathematician came up with a mathematical transformation that regularizes the frustum and participates in the calculation in matrix form. This is the camera's perspective projection matrix.

2. Visualize the camera frustum space through Gizmos:

Unlike the camera's orthographic projection mode, perspective projection scales each planar cross-section of the picture according to its depth so that near objects appear large and far objects small; stacking these cross-sections continuously forms the camera's frustum space. This rendering range can be drawn with Unity's drawing tool Gizmos. The drawing code is:

    public void OnDrawGizmos()
    {
        // Draw the camera projection frustum (assumes this component has a `cam` field referencing the Camera)
        Matrix4x4 start = Gizmos.matrix;
        Gizmos.matrix = Matrix4x4.TRS(transform.position, transform.rotation, Vector3.one);
        Gizmos.color = Color.yellow;
        Gizmos.DrawFrustum(Vector3.zero, cam.fieldOfView, cam.farClipPlane, 0, cam.aspect);
        Gizmos.color = Color.red;
        Gizmos.DrawFrustum(Vector3.zero, cam.fieldOfView, cam.farClipPlane, cam.nearClipPlane, cam.aspect);
        Gizmos.matrix = start;

        // Draw coordinate axis helper lines
        Gizmos.color = Color.red;
        Gizmos.DrawLine(cam.transform.position, cam.transform.position + cam.transform.right * 10);
        Gizmos.color = Color.green;
        Gizmos.DrawLine(cam.transform.position, cam.transform.position + cam.transform.up * 10);
        Gizmos.color = Color.blue;
        Gizmos.DrawLine(cam.transform.position, cam.transform.position + cam.transform.forward * 10);
    }

After adding the above code, you can see the helper lines shown in the figure below in the Unity Scene window, where the three-dimensional shape outlined by the red frame is the camera's projection frustum:

[Figure: camera frustum and axis helper lines drawn with Gizmos in the Scene view]

Projection Matrix Transformation Process

The camera's rendering space can be outlined with the frustum, but how can it be expressed mathematically so a machine can understand and compute it? Before starting the derivation, we need to clarify what information is available. Checking the official Unity documentation gives the parameters related to the camera's field of view (a short sketch reading them from a Camera follows the list):

  • Near Clip Plane: the near clipping plane, the closest position at which objects will be rendered
  • Far Clip Plane: the far clipping plane, the farthest position at which objects will be rendered
  • Camera FOV: the camera's opening angle, i.e. the field-of-view angle (vertical by default in Unity)
  • Screen aspect ratio: from this ratio, the camera's FOV in the other direction can be derived
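As a small sketch, assuming `cam` references the Camera being inspected, these parameters can be read and the horizontal FOV derived from the vertical one like this:

    float near   = cam.nearClipPlane;   // near clipping plane distance
    float far    = cam.farClipPlane;    // far clipping plane distance
    float vFov   = cam.fieldOfView;     // vertical field of view, in degrees
    float aspect = cam.aspect;          // width / height
    // Horizontal FOV derived from the vertical FOV and the aspect ratio
    float hFov = Camera.VerticalToHorizontalFieldOfView(vFov, aspect);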

1. Identify the eight vertices of the projection frustum:

To mark out the extent of a space, line segments are usually used to form a wireframe of the corresponding shape, just as a model's mesh does. To determine a line segment's length and position, you first need its vertices. So our first step is to calculate the eight vertices of the camera's view frustum from the basic camera parameters above, laying the groundwork for marking out and obtaining the CVV space later.

To simplify the derivation, the three-dimensional problem is reduced to two dimensions. Taking the Y axis as the normal, the camera's frustum looks like the figure below: the X and Z axes form a two-dimensional cross-section of the frustum's boundary, with some key points labeled. From the camera parameters above, the following information is known:

  • The length of AB: the camera's near clipping plane distance, nearClipPlane
  • The length of AC: the camera's far clipping plane distance, farClipPlane
  • The angle BAD: half of the camera's FOV (in the horizontal or vertical direction), converted to radians

[Figure: two-dimensional cross-section of the frustum with points A, B, C, D labeled]

Taking point D as an example: from the triangle in the figure above, the length of AB is the distance from the near clipping plane to the camera and the angle BAD is half of the camera's FOV, so the length of BD can be computed with a trigonometric function. This gives the coordinates of point D along the Z axis (AB) and the X axis (BD).

For the other axis, the same is done using the near clipping plane distance and the camera's FOV in the other direction, which can be obtained with the static Camera method VerticalToHorizontalFieldOfView. The offsets are summarized below.
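In formula form (a sketch assuming the camera looks straight down its Z axis), with $\theta_v$ the vertical FOV, $\theta_h$ the horizontal FOV, and $d$ the distance to either clip plane:

$$x_{offset} = d \cdot \tan\left(\frac{\theta_h}{2}\right), \qquad y_{offset} = d \cdot \tan\left(\frac{\theta_v}{2}\right)$$

so each frustum vertex sits at $(\pm x_{offset}, \pm y_{offset}, d)$ relative to the camera.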

Looping this calculation over both clip planes and all sign combinations gives the coordinates of the eight vertices of the camera's view frustum, and an object can be placed at each position to mark these points. The code is as follows:

    List<Vector3> GetPosLocation()
    {
        // `cam` is assumed to be a Camera field on this component
        List<Vector3> backList = new List<Vector3>();
        for (int z = 0; z < 2; z++)              // 0 = near plane, 1 = far plane
        {
            for (int i = -1; i < 2; i += 2)      // left / right
            {
                for (int j = -1; j < 2; j += 2)  // bottom / top
                {
                    Vector3 pos = GetPos(z, new Vector2Int(i, j), cam);
                    backList.Add(pos);
                }
            }
        }
        return backList;
    }

    Vector3 GetPos(int lenType, Vector2Int dirType, Camera cam)
    {
        Vector3 cPos = cam.transform.position;
        // Half of the vertical and horizontal FOV, converted to radians
        float vecAngle = (cam.fieldOfView * Mathf.PI) / 360;
        float horAngle = Camera.VerticalToHorizontalFieldOfView(cam.fieldOfView, cam.aspect) * Mathf.PI / 360;

        // Distance along Z to the chosen clip plane, then the X/Y offsets via tan
        float zoffset = lenType == 0 ? cam.nearClipPlane : cam.farClipPlane;
        float vecOffset = zoffset * Mathf.Tan(vecAngle);
        float horOffset = zoffset * Mathf.Tan(horAngle);
        // Offsets are applied along world axes, so this assumes an unrotated camera
        Vector3 offsetV3 = new Vector3(horOffset * dirType.x, vecOffset * dirType.y, zoffset);
        return cPos + offsetV3;
    }

After obtaining the eight vertex positions through trigonometry and instantiating a Sphere at each one to mark the points, the result looks like this:

[Figure: spheres instantiated at the eight frustum vertices]

2. Convert the camera space to CVV through the projection matrix:

To mark out the range of the frustum after the projection transformation, the eight frustum vertices obtained above are used as reference points for the calculation. Since this article starts the MVP conversion directly from points given in world coordinates, there is no local-coordinate stage, so the step of transforming the object from local space to world space can be skipped.

Skipping the M step, the eight points computed above via trigonometry are transformed with the camera's space transformation to obtain their coordinates in camera space, as shown in the figure:
[Figure: the eight vertices transformed into camera space]

As the figure above shows, when the camera sits at the origin, the X and Y coordinates of the eight transformed vertices match the original coordinates, but the Z axis is reversed. This is because Unity's world space and local space both use a left-handed coordinate system, while camera space uses the opposite, right-handed system.
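A minimal sketch of this step, assuming `cam` is the Camera and `worldPoints` holds the eight vertices computed above (both names are illustrative):

    // World space -> camera (view) space; note the Z sign flip in the result
    foreach (Vector3 worldPos in worldPoints)
    {
        Vector3 viewPos = cam.worldToCameraMatrix.MultiplyPoint(worldPos);
        Debug.Log($"world {worldPos} -> camera space {viewPos}");
    }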

After converting the vertices from world space to camera space, the camera's projection matrix can be used to convert them from camera space to the CVV. Note, however, that the projection matrix operates in homogeneous coordinates, so to obtain the CVV coordinates the resulting Vector4 must be divided by its W component. After the calculation, the result looks like this:
[Figure: the eight vertices after the projection transformation, lying inside the regular view volume]
Of course, the lengths are hard to judge in three-dimensional space, so mapping the result into two dimensions shows the lengths more clearly, as in the figure:

[Figure: two-dimensional view of the CVV edge lengths]
After the corresponding CVV coordinates are obtained through the projection transformation, the three-dimensional space can be reduced to a two-dimensional plane simply by discarding one axis, yielding the camera's framed picture. At the same time, to obtain a picture with the correct occlusion relationships, drawing is usually ordered by the Z axis, which is the idea behind the depth buffer.
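Putting the two steps together, a rough sketch of taking one world-space point all the way into the CVV (including the divide by W) might look like this; `cam` and `worldPos` are placeholders:

    // World -> camera space -> clip space (homogeneous coordinates)
    Vector4 viewPos = cam.worldToCameraMatrix * new Vector4(worldPos.x, worldPos.y, worldPos.z, 1f);
    Vector4 clipPos = cam.projectionMatrix * viewPos;

    // Perspective divide: dividing by W yields the CVV coordinates in [-1, 1]
    Vector3 cvvPos = new Vector3(clipPos.x / clipPos.w, clipPos.y / clipPos.w, clipPos.z / clipPos.w);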

Camera projection matrix:

From the visualization above we can see the meaning of the camera projection matrix: it converts the camera-space frustum into the CVV. Conversely, the projection matrix that supports this process is simply the collection of mathematical formulas behind that transformation.

According to the Unity documentation, the camera's projection matrix can be obtained directly, but note that the FOV parameter does not appear in the matrix itself; it is converted into the distances from the center of the camera's near clipping plane to its four edges, which makes the matrix a little easier to read. The matrix is (a sketch reconstructing it from the camera parameters follows the parameter list below):

$$
\begin{bmatrix}
2 \cdot near/(right-left) & 0 & (right+left)/(right-left) & 0 \\
0 & 2 \cdot near/(top-bottom) & (top+bottom)/(top-bottom) & 0 \\
0 & 0 & -(far+near)/(far-near) & -(2 \cdot far \cdot near)/(far-near) \\
0 & 0 & -1 & 0
\end{bmatrix}
$$

The meaning of each parameter:

  • near: the distance from the camera's near clipping plane to the camera
  • far: the distance from the camera's far clipping plane to the camera
  • right: the distance from the right edge of the near clipping plane to its center
  • left: the distance from the left edge of the near clipping plane to its center
  • top: the distance from the top edge of the near clipping plane to its center
  • bottom: the distance from the bottom edge of the near clipping plane to its center
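As a sketch under the assumption of a symmetric frustum (so left = -right and bottom = -top), these values can be derived from the camera parameters and assembled into a Matrix4x4 for comparison against Camera.projectionMatrix:

    // Assumes `cam` references the Camera; symmetric frustum, so only top and right are needed
    float near  = cam.nearClipPlane;
    float far   = cam.farClipPlane;
    float top   = near * Mathf.Tan(cam.fieldOfView * Mathf.Deg2Rad * 0.5f);
    float right = top * cam.aspect;

    Matrix4x4 proj = Matrix4x4.zero;
    proj[0, 0] = near / right;                      // 2*near/(right-left) with left = -right
    proj[1, 1] = near / top;                        // 2*near/(top-bottom) with bottom = -top
    proj[2, 2] = -(far + near) / (far - near);
    proj[2, 3] = -(2f * far * near) / (far - near);
    proj[3, 2] = -1f;

    Debug.Log(proj + "\n" + cam.projectionMatrix);  // the two should match closely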

Although the principle is relatively easy to grasp, the concrete derivation and the resulting formulas are more involved. If you are interested, you can read this article:

Explore perspective projection transformations in depth

Projection matrix extension

1. Use the camera projection matrix to determine whether a point is within the camera's field of view

Because the camera's frustum is an irregular shape, determining directly from world coordinates whether a point lies inside the camera's field of view is fairly complicated. From the earlier discussion of the projection matrix, once the camera frustum is converted from world coordinates into the CVV, its boundary is a regular cube of known size, and checking whether a point lies inside that range becomes very easy:

    public static bool CheckPointIsInCamera(Vector3 worldPoint, Camera camera)
    {
        // World -> camera -> clip space; inside the CVV means every component lies within [-w, w]
        Vector4 projectionPos = camera.projectionMatrix * camera.worldToCameraMatrix * new Vector4(worldPoint.x, worldPoint.y, worldPoint.z, 1);
        if (projectionPos.x < -projectionPos.w) return false;
        if (projectionPos.x > projectionPos.w) return false;
        if (projectionPos.y < -projectionPos.w) return false;
        if (projectionPos.y > projectionPos.w) return false;
        if (projectionPos.z < -projectionPos.w) return false;
        if (projectionPos.z > projectionPos.w) return false;
        return true;
    }

However, checking a single point is relatively rare; more often we need to handle a whole object in the scene. To keep the computation as small as possible, the object's bounding box is used for the test. Below is a method (written by another developer) that uses an object's Bounds to check whether it intersects the camera's frustum:

    public static bool CheckBoundIsInCamera(this Bounds bound, Camera camera)
    {
        // Outcode per corner: one bit for each clip plane the corner lies outside of
        System.Func<Vector4, int> ComputeOutCode = (projectionPos) =>
        {
            int _code = 0;
            if (projectionPos.x < -projectionPos.w) _code |= 1;
            if (projectionPos.x > projectionPos.w) _code |= 2;
            if (projectionPos.y < -projectionPos.w) _code |= 4;
            if (projectionPos.y > projectionPos.w) _code |= 8;
            if (projectionPos.z < -projectionPos.w) _code |= 16;
            if (projectionPos.z > projectionPos.w) _code |= 32;
            return _code;
        };

        Vector4 worldPos = Vector4.one;
        int code = 63;
        for (int i = -1; i <= 1; i += 2)
        {
            for (int j = -1; j <= 1; j += 2)
            {
                for (int k = -1; k <= 1; k += 2)
                {
                    // Each of the eight corners of the bounding box
                    worldPos.x = bound.center.x + i * bound.extents.x;
                    worldPos.y = bound.center.y + j * bound.extents.y;
                    worldPos.z = bound.center.z + k * bound.extents.z;

                    // AND the outcodes: the result stays non-zero only if every corner is outside the same plane
                    code &= ComputeOutCode(camera.projectionMatrix * camera.worldToCameraMatrix * worldPos);
                }
            }
        }
        return code == 0;
    }

2. A note on the depth buffer

The depth buffer records a depth value for each pixel. With it, a depth test can be performed to determine the occlusion relationship between pixels and ensure correct rendering. When an object in the scene is transformed into the CVV by the MVP matrices, its distance from the camera is recorded along the Z axis, and that value becomes the per-pixel depth.

Because of the perspective projection, the precision of the depth buffer is non-linear: the closer to the camera, the higher the precision. The example in the figure below shows that as a point in world space (marked by the yellow sphere on the right) moves away from the camera at a constant speed, the change of its Z value in CVV space becomes smaller and smaller:
[Figure: CVV Z value changing non-linearly as the point moves away from the camera]
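This non-linearity can be read directly from the third row of the projection matrix above: after the perspective divide, a point at distance $z$ in front of the camera ends up at the CVV depth

$$z_{cvv} = \frac{(far+near)\,z - 2 \cdot far \cdot near}{(far-near)\,z} = \frac{far+near}{far-near} - \frac{2 \cdot far \cdot near}{(far-near)\,z}$$

which varies with $1/z$, so equal steps in $z$ produce ever smaller changes in $z_{cvv}$ as the point moves farther away.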


Origin blog.csdn.net/xinzhilinger/article/details/123812387