[Pitfall log] The camera-pose coordinate-system convention in COLMAP and the implicit conversion in its visualization

  This problem arose when I wanted to use COLMAP's sparse reconstruction results and found that the definition of the camera coordinate system was not clearly spelled out, which led to wrong results.
  

1 How the problem arises

  Let's approach this from the perspective of someone who does not yet know the coordinate-system convention, and see how the problem arises. The dataset is a set of UAV images, 59 in total:
[figure: the 59 UAV input images]
  The figure below shows the sparse reconstruction result output by COLMAP. It looks fine and is consistent with the actual flight pattern:
[figure: sparse reconstruction result displayed in COLMAP]
  The next figure shows the images.txt file from the reconstruction result exported in txt format. According to the format description, the third-to-last value on each IMAGE_ID line is the Z coordinate of the camera; for example, it is 0.365289 for image 1 and -0.0520487 for image 2.
[figure: contents of the exported images.txt file]
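For reference, a pose line in images.txt can be parsed like this. This is a minimal sketch following the COLMAP text-format field order (IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME); the example values are made up for illustration:

```python
# Parse the first (pose) line of an entry in COLMAP's images.txt.
def parse_image_line(line):
    parts = line.split()
    image_id = int(parts[0])
    q = tuple(map(float, parts[1:5]))   # rotation quaternion (QW, QX, QY, QZ)
    t = tuple(map(float, parts[5:8]))   # translation vector (TX, TY, TZ)
    camera_id = int(parts[8])
    name = parts[9]
    return image_id, q, t, camera_id, name

# Example line (values invented for illustration only):
line = "1 0.851 0.011 0.524 -0.007 -0.106 0.329 0.365289 1 DJI_0001.JPG"
img_id, q, t, cam_id, name = parse_image_line(line)
print(img_id, t[2])  # -> 1 0.365289
```

Exporting TX, TY, TZ directly from these lines is exactly what was done before visualizing in CloudCompare.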
  Now export the X, Y, and Z of all images and visualize them in the CloudCompare software, as shown below. The distribution of these cameras is different from the one in COLMAP: here the cameras are spread across very different heights. That is obviously wrong; a drone would never fly like that while collecting data.
[figure: camera positions plotted in CloudCompare, incorrectly spread in height]
  This is very strange: the data was exported directly from COLMAP without any modification, so how can the display differ? Stranger still, if you double-click a camera in the COLMAP reconstruction, you can see that image's information; for image 1, as shown below, its tx, ty, tz are exactly the same as in the images file. So why does the same data produce what looks like the correct result in COLMAP, but something different in other software? (It can hardly be a bug in CloudCompare either.)
[figure: per-image pose information shown when double-clicking a camera in COLMAP]
  So I modified the Z coordinate of every camera in the images file, setting them all to 5.0. In that case, the visualized cameras should all lie in one plane. But that is not what happens; the result displayed in COLMAP is shown in the following figure:
[figure: COLMAP display after setting all Z values to 5.0, not coplanar]
  Very strangely, this result looks a lot like the arrangement seen earlier in CloudCompare. Why does this happen? This problem troubled me for a long time; I suspected the ordering of the cameras or something similar. But in fact the cameras are not arranged randomly: there is a definite transformation relating the two arrangements. After applying that transformation (explained in the next section), I visualized the resulting poses in CloudCompare again, as shown below, and this time the result was finally correct.
[figure: camera positions in CloudCompare after the transformation, now correct]
  

2 Coordinate-system definition and conversion

  First, let me introduce the world coordinate system and the camera coordinate system; there is a transformation between them, and the two transformations are inverses of each other. The world coordinate system fixes a world origin, and the positions of all cameras are defined relative to it. The camera coordinate system takes each camera itself as the origin ([0,0,0] in three-dimensional space). Now take a point X in space: its coordinates in a given camera's frame are X_camera, and its coordinates in the world frame are X_world. Suppose the camera frame converts to the world frame via R and t:

X_world = R · X_camera + t

  From this formula we can derive the conversion from the world frame to the camera frame (R is a rotation matrix, so R^T = R^(-1), and in practice R^T is used directly):

X_camera = R^T · X_world - R^T · t
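The two formulas above are inverses of each other, which can be checked numerically. A minimal sketch using numpy, with an arbitrary rotation about the Z axis and an arbitrary translation:

```python
import numpy as np

# Check that X_world = R @ X_cam + t  and  X_cam = R.T @ X_world - R.T @ t
# are inverse transforms (R is orthogonal, so R^-1 = R^T).
theta = np.deg2rad(30.0)
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0,            0.0,           1.0]])  # rotation about Z
t = np.array([1.0, -2.0, 0.5])

X_cam = np.array([0.3, 0.7, 1.2])
X_world = R @ X_cam + t
X_cam_back = R.T @ X_world - R.T @ t
print(np.allclose(X_cam, X_cam_back))  # -> True
```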

  However, COLMAP defines its coordinate systems the other way around:

X_camera = R · X_world + t

  Correspondingly, the formula to transform back into the world frame is:

X_world = R^T · X_camera - R^T · t

  That is to say, the quaternion Q and translation vector T in the images file output by COLMAP are the R and t of the world-to-camera transform as COLMAP defines it. If we want to put all the cameras together for visualization, we first need to transform them into the shared world coordinate system, namely:

R' = R^T
t' = -R^T · t
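Putting it together: to place the cameras in the world frame, compute the camera center C = -R^T · t for each image from its quaternion and translation. A minimal sketch (quaternion order QW QX QY QZ, as in images.txt):

```python
import numpy as np

def quat_to_rot(qw, qx, qy, qz):
    """Rotation matrix from a COLMAP quaternion (QW, QX, QY, QZ)."""
    return np.array([
        [1 - 2*(qy*qy + qz*qz), 2*(qx*qy - qw*qz),     2*(qx*qz + qw*qy)],
        [2*(qx*qy + qw*qz),     1 - 2*(qx*qx + qz*qz), 2*(qy*qz - qw*qx)],
        [2*(qx*qz - qw*qy),     2*(qy*qz + qw*qx),     1 - 2*(qx*qx + qy*qy)],
    ])

def camera_center(q, t):
    """World-frame camera position: C = -R^T @ t."""
    R = quat_to_rot(*q)
    return -R.T @ np.asarray(t)

# With the identity rotation, the center is simply -t:
print(camera_center((1.0, 0.0, 0.0, 0.0), (1.0, 2.0, 3.0)))  # -> [-1. -2. -3.]
```

These centers, not the raw TX, TY, TZ values, are what should be exported for visualization in CloudCompare.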

  In fact, COLMAP's own visualization already applies this transformation implicitly. But if you click on a visualized camera, the values displayed are still the Q and t of the world-to-camera transform, which is very confusing (this is exactly the example explained in the first section).

Original: blog.csdn.net/weixin_44120025/article/details/124604229