Comparison of PnP algorithms for 2D-3D matching problems: DLT, P3P, EPnP

1. Problem definition

First, what is PnP (Perspective-n-Point), and what problem does it solve?
Known information:

  • The coordinates of n 3D points in coordinate system A (which can be taken as the world coordinate system), $\{p_1, p_2, ..., p_n\}$, and the coordinates of their 2D projections on the image, in image coordinates, $\{u_1, u_2, ..., u_n\}$

  • The matching relationship between the n 3D reference points and their 2D projections on the image (the 3D positions are usually obtained by triangulation or from an RGB-D depth map; for stereo or RGB-D odometry, PnP can be used directly to estimate camera motion, while monocular visual odometry must be initialized first)

  • The camera intrinsic matrix K and the distortion coefficients

Unknowns to solve for:

  • The pose transformation from coordinate system A (the world coordinate system, or another camera's coordinate system) to the camera coordinate system, i.e. the rotation and translation

[Figure: geometry of the PnP problem, from the official OpenCV documentation.] In the camera coordinate system used there, the x-axis points right, the y-axis points down, and the z-axis points forward.

Why use a 3D-2D method for pose estimation?
It does not require the epipolar constraint and can obtain a good motion estimate from few matched points.

Available methods and prerequisites:
There are many solutions to this type of problem, such as P3P, the direct linear transform (DLT), EPnP, UPnP, etc. One can also set up a least-squares problem with nonlinear optimization and solve it iteratively, i.e. bundle adjustment (BA).

| Method | Solution idea | Overview | Drawbacks |
| --- | --- | --- | --- |
| DLT | Construct the augmented matrix (R\|t) and build a 12-unknown linear system from the projection matrix | At least 6 point pairs are needed (12 unknowns, each pair gives two equations); with more than 6, use SVD etc. to find the least-squares solution of the overdetermined system | Ignores the constraints among the entries of (R\|t); the computed R does not necessarily lie in SO(3), so a best-approximating rotation matrix must be found afterwards |
| P3P | Simultaneous equations from three point pairs via similar triangles and the law of cosines | 4 point pairs are needed: 3 to solve, 1 to verify and return a unique solution | Hard to exploit additional matched points; fails under noise or mismatches |
| EPnP | Express all points through 4 control points; complexity is O(n) | Behaves well with noisy feature points; a closed-form solution that needs neither iteration nor an initial estimate | Requires 4 point pairs, which must not be coplanar |
| Nonlinear optimization (BA) | Treat both the camera pose and the spatial point positions as optimization variables and minimize the reprojection error | — | May fall into a local minimum |

2. Direct Linear Transformation DLT

Given the homogeneous coordinates of 3D points in the world coordinate system, the homogeneous coordinates of the corresponding 2D projections, and the camera intrinsics (if the intrinsics are unknown, PnP can also estimate K, R, t jointly, at the cost of more unknowns and slightly worse results), the projection from a 3D point to a 2D point can be written as:
$$\lambda \begin{pmatrix} u \\ v \\ 1 \end{pmatrix} = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{pmatrix} \left( R \mid t \right) \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}$$
This method ignores the orthogonality constraint on the rotation matrix itself; R and t together are treated as 12 unknowns, with $(R \mid t)$ a 3×4 matrix.
Expanding this formula fully and eliminating $\lambda$ yields, for each point pair:
$$\begin{pmatrix} f_x X & f_x Y & f_x Z & f_x & 0 & 0 & 0 & 0 & (c_x - u)X & (c_x - u)Y & (c_x - u)Z & c_x - u \\ 0 & 0 & 0 & 0 & f_y X & f_y Y & f_y Z & f_y & (c_y - v)X & (c_y - v)Y & (c_y - v)Z & c_y - v \end{pmatrix} \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_{12} \end{pmatrix} = 0$$
Each point pair thus provides 2 equations, so the 12 unknowns require at least 6 point pairs.

When there are more than 6 point pairs, the system is overdetermined; solve it in the least-squares sense with SVD, taking the right singular vector associated with the smallest singular value.

There is a problem here: when setting up the unknowns, no correlation among the 12 unknowns was enforced, so the recovered rotation matrix may not be orthogonal. We therefore need to find the best rotation matrix approximating it, e.g. via QR decomposition; this amounts to projecting the result from the general matrix space back onto the SE(3) manifold, splitting it into rotation and translation parts.
$$R \leftarrow (RR^T)^{-\frac{1}{2}} R$$
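As a concrete illustration, the whole DLT pipeline described above (build the 2n×12 system, take the SVD null vector, re-orthogonalize R) can be sketched with NumPy. This is a minimal sketch under noise-free assumptions; the function name and interface are illustrative, not from any library:

```python
import numpy as np

def dlt_pnp(points_3d, points_2d, K):
    """DLT PnP sketch: estimate (R, t) from n >= 6 point pairs.

    points_3d: (n, 3) world coordinates; points_2d: (n, 2) pixels;
    K: 3x3 intrinsic matrix.
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # Each pair contributes two linear equations in the 12 entries of (R|t).
        rows.append([fx*X, fx*Y, fx*Z, fx, 0, 0, 0, 0,
                     (cx-u)*X, (cx-u)*Y, (cx-u)*Z, cx-u])
        rows.append([0, 0, 0, 0, fy*X, fy*Y, fy*Z, fy,
                     (cy-v)*X, (cy-v)*Y, (cy-v)*Z, cy-v])
    A = np.asarray(rows)
    # Least-squares null vector: right singular vector of the smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    P = Vt[-1].reshape(3, 4)
    R_raw, t = P[:, :3], P[:, 3]
    # Fix the overall sign so det(R) > 0, then project R_raw onto SO(3) via SVD.
    if np.linalg.det(R_raw) < 0:
        R_raw, t = -R_raw, -t
    U, S, Vt2 = np.linalg.svd(R_raw)
    R = U @ Vt2
    scale = S.mean()  # undo the unknown homogeneous scale on t
    return R, t / scale
```

With exact (noise-free) correspondences this recovers the true pose up to numerical precision; with noise, the SVD gives the least-squares solution.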


3. P3P

[Figure: the P3P geometry, with optical center O and the three world points A, B, C]
Known information:

  • The coordinates of the three points A, B, C in the world coordinate system, from which the lengths of AB, BC, and AC can be computed

  • $\angle BOC\,(\alpha)$, $\angle AOC\,(\beta)$, $\angle AOB\,(\gamma)$: these three angles are known quantities. Intuitively: one cannot apply the law of cosines to OA, OB, OC directly, because the position of the optical center O in the world coordinate system is unknown. But given the matches between the world-frame 3D points and the 2D points, the figure above can be drawn, and the 2D points together with the corresponding camera-frame points subtend the same angles at O. Since the intrinsics and the pixel coordinates are known, these angles can be computed.

    To do this, first convert the pixel coordinates to normalized camera coordinates, $(\frac{x}{z}, \frac{y}{z}, 1) = (\frac{u - c_x}{f_x}, \frac{v - c_y}{f_y}, 1)$. Then compute the length of the vector from the optical center O to this normalized point and divide by it, so that ${}^k X_1^s, {}^k X_2^s, {}^k X_3^s$ are all unit vectors pointing from the optical center to the camera-frame points. With unit vectors, the inner products directly give the cosines of the three angles.
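The back-projection step above can be sketched as follows (a minimal NumPy sketch; `ray_angles` is a hypothetical helper, with points ordered A, B, C):

```python
import numpy as np

def ray_angles(uv, K):
    """Unit rays from the optical center through three pixels, plus the
    cosines of the pairwise angles between them."""
    Kinv = np.linalg.inv(K)
    # Back-project each pixel to normalized coordinates ((u-cx)/fx, (v-cy)/fy, 1).
    rays = (Kinv @ np.column_stack([uv[:, 0], uv[:, 1], np.ones(len(uv))]).T).T
    rays /= np.linalg.norm(rays, axis=1, keepdims=True)  # make them unit vectors
    cos_alpha = rays[1] @ rays[2]  # angle BOC
    cos_beta = rays[0] @ rays[2]   # angle AOC
    cos_gamma = rays[0] @ rays[1]  # angle AOB
    return rays, (cos_alpha, cos_beta, cos_gamma)
```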

The ultimate goal:
Through the law of cosines we can obtain the lengths of OA, OB, and OC, and hence the coordinates of A, B, C in the camera coordinate system. The problem then becomes a 3D (camera frame) to 3D (world frame) point registration problem, from which the camera pose R, t in the world coordinate system is computed. Proceed as follows:

From the known quantities above, what we want to compute are the distances $s_1 = OA$, $s_2 = OB$, $s_3 = OC$. With $a = BC$, $b = AC$, $c = AB$, the law of cosines gives:
$$\begin{aligned} a^2 &= s_2^2 + s_3^2 - 2 s_2 s_3 \cos\alpha \\ b^2 &= s_1^2 + s_3^2 - 2 s_1 s_3 \cos\beta \\ c^2 &= s_1^2 + s_2^2 - 2 s_1 s_2 \cos\gamma \end{aligned}$$
Starting from the first equation, define $u = \frac{s_2}{s_1}$, $v = \frac{s_3}{s_1}$ and substitute:
$$a^2 = s_1^2 (u^2 + v^2 - 2uv\cos\alpha), \quad \text{so} \quad s_1^2 = \frac{a^2}{u^2 + v^2 - 2uv\cos\alpha}.$$
Similarly, substituting u and v into the second and third equations gives two more expressions for $s_1^2$; equating them eliminates $s_1$ and yields a quartic polynomial in v:
[Figure: the quartic polynomial in v obtained after elimination]
Solving for v gives $s_1, s_2, s_3$. There is a problem here: there can be up to 4 sets of solutions, so an extra step is needed to remove this ambiguity:

  • Use known approximate solutions from GPS
  • Verification using the fourth point pair yields a unique solution

An example where four sets of solutions arise: the three included angles are all equal, and the pairwise distances between the 3D points are also equal.
[Figure: a symmetric configuration with four valid solutions]
At this point the coordinates of A, B, C are known in both the camera and the world coordinate systems, and the camera pose can be solved from this 3D-3D correspondence as an ICP problem.
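The final 3D-3D step has a closed-form SVD solution (the Kabsch alignment used in ICP). A minimal NumPy sketch, with illustrative names:

```python
import numpy as np

def align_3d_3d(pts_world, pts_cam):
    """Closed-form 3D-3D alignment: find R, t with pts_cam ≈ R @ p + t
    for corresponding rows of pts_world and pts_cam."""
    cw = pts_world.mean(axis=0)
    cc = pts_cam.mean(axis=0)
    # 3x3 cross-covariance of the centered point sets.
    H = (pts_world - cw).T @ (pts_cam - cc)
    U, _, Vt = np.linalg.svd(H)
    # Guard against a reflection (det = -1) solution.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cc - R @ cw
    return R, t
```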

4. EPnP

Corresponding paper "EPnP: Accurate Non-Iterative O(n) Solution to the PnP Problem"

Convert the 2D image points into 3D points in the camera coordinate system using the intrinsics, then solve the 3D-3D transformation with ICP to obtain the pose. The core is: solve for the coordinates of the 3D reference points in the camera coordinate system.

The approach: use 4 control points as a reference basis, so that every 3D point in the world coordinate system can be expressed as a linear combination of these 4 reference points, with weights summing to 1. The same 4 control points, expressed in the world (or camera) coordinate system, then represent all 3D points in that coordinate system. The core becomes: solve for the coordinates of the 4 control points in the camera coordinate system.

  • Selection of control points
    The control points could in principle be chosen arbitrarily. For stable results the paper recommends the following fixed choice: one control point is the centroid of all points in the world coordinate system; subtract the centroid from all 3D point coordinates, form $A^T A$ from the centered coordinate matrix $A$, use SVD to find the three principal directions, and add these to the centroid to obtain the other three control points.
    Denote the coordinates of the 4 control points in the world coordinate system by $c_j^w, j = 1, ..., 4$; these follow directly from the 3D points. Denote their coordinates in the camera coordinate system by $c_j^c, j = 1, ..., 4$. Note that each 3D point has its own set of weights.

  • The weights with respect to the 4 control points are the same in the camera and world coordinate systems
    For the same 3D point the weights $\alpha_{ij}$ are identical in both frames; the proof uses $\sum_{j=1}^4 \alpha_{ij} = 1$:
    $$\mathbf{p}_i^c = R_{cw}\mathbf{p}_i^w + t = R_{cw}\sum^4_{j=1} \alpha_{ij} \mathbf{c}_j^w + \sum_{j=1}^4 \alpha_{ij}t = \sum^4_{j=1} \alpha_{ij}(R_{cw}\mathbf{c}_j^w + t) = \sum^4_{j=1} \alpha_{ij} \mathbf{c}_j^c$$
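Putting the two bullets above together, the control-point selection and the barycentric weights can be sketched like this (illustrative code; the paper scales the principal directions by the per-point standard deviation, which we follow here, but any non-degenerate choice of control points works for the weights):

```python
import numpy as np

def epnp_control_points(pts_w):
    """Pick 4 control points (centroid + three principal directions) and
    compute barycentric weights alpha with each row summing to 1."""
    n = len(pts_w)
    c0 = pts_w.mean(axis=0)
    centered = pts_w - c0
    # Principal directions from the SVD of the centered point matrix.
    _, s, Vt = np.linalg.svd(centered, full_matrices=False)
    ctrl = np.vstack([c0] + [c0 + (s[k] / np.sqrt(n)) * Vt[k] for k in range(3)])
    # Barycentric weights: solve [C^T; 1 1 1 1] alpha = [p; 1] for each point.
    Msys = np.vstack([ctrl.T, np.ones(4)])   # 4x4 system matrix
    rhs = np.vstack([pts_w.T, np.ones(n)])   # 4xn right-hand side
    alphas = np.linalg.solve(Msys, rhs).T    # shape (n, 4)
    return ctrl, alphas
```

Each point is then exactly reconstructed as $p_i = \sum_j \alpha_{ij} c_j$, with the weights summing to 1.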

In order to find the coordinates of the control points in the camera coordinate system, the following equation can be written according to the camera model:
$$\forall i, \quad w_i\begin{pmatrix} u_i \\ v_i \\ 1 \end{pmatrix} = K \mathbf{p}^c_i = \begin{pmatrix} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1\end{pmatrix} \sum_{j=1}^4 \alpha_{ij} \begin{pmatrix} x_j^c \\ y_j^c \\ z_j^c \end{pmatrix}$$

Here $K$ is the intrinsic matrix, and the camera-frame coordinates of each control point are $c_j^c = [x_j^c, y_j^c, z_j^c]^T, j = 1, ..., 4$. The factor $w_i$ is the depth, with $w_i = \sum_{j=1}^4 \alpha_{ij} z_j^c$. Substituting the last row into the first two rows gives:
$$\begin{array}{l} \sum_{j=1}^4 \alpha_{ij} f_x x_j^c + \alpha_{ij}(c_x - u_i)z_j^c = 0 \\ \sum_{j=1}^4 \alpha_{ij} f_y y_j^c + \alpha_{ij}(c_y - v_i)z_j^c = 0 \end{array}$$

In this system the unknowns are the camera-frame coordinates of the four control points, i.e. 12 unknowns. Stacking them into a column vector $\mathbf{x}$, the system can be written in the matrix form below, where $M$ has dimension $2n \times 12$ and $n$ is the number of point pairs:
$$M\mathbf{x} = 0$$
For the specific solution, refer to the second EPnP reference link below.
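As a sketch of how $M$ is assembled (the full EPnP solution then examines the null space of $M^TM$, with several cases depending on its dimension; here we only build $M$, with illustrative names):

```python
import numpy as np

def build_M(alphas, uv, K):
    """Assemble the 2n x 12 matrix M of the system M x = 0, where x stacks
    the camera-frame control points as [x_1, y_1, z_1, ..., x_4, y_4, z_4]."""
    fx, fy, cx, cy = K[0, 0], K[1, 1], K[0, 2], K[1, 2]
    n = len(uv)
    M = np.zeros((2 * n, 12))
    for i, ((u, v), a) in enumerate(zip(uv, alphas)):
        for j in range(4):
            # Two rows per point pair, three columns per control point.
            M[2*i,     3*j:3*j+3] = [a[j]*fx, 0.0,      a[j]*(cx - u)]
            M[2*i + 1, 3*j:3*j+3] = [0.0,     a[j]*fy,  a[j]*(cy - v)]
    return M
```

With noise-free data, the true camera-frame control-point coordinates lie exactly in the null space of $M$.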

At this point the weights are known and the camera-frame coordinates of the control points are known, so the positions of the 3D points in the camera coordinate system can be recovered. Since the world coordinates of the 3D points were known from the start, the pose can then be solved with ICP.

Why is the complexity of the algorithm $O(n)$?
When solving $M\mathbf{x} = 0$ for $\mathbf{x}$, one first forms the $12 \times 12$ matrix $M^T M$; building it costs $O(n)$, and everything afterwards operates on a fixed-size matrix, so it takes constant time.


References:
- DLT: https://zhuanlan.zhihu.com/p/58648937
- P3P: https://zhuanlan.zhihu.com/p/140077137
- University of Bonn P3P slides: https://www.ipb.uni-bonn.de/html/teaching/msr2-2020/sse2-15-p3p.pdf
- EPnP: https://blog.csdn.net/zkk9527/article/details/107939991
- EPnP: https://zhuanlan.zhihu.com/p/59070440

Origin: blog.csdn.net/catpico/article/details/121911865