[Learning Summary] VIO Initialization Learning 1: Monocular Visual–Inertial State Estimation With Online Initialization and Camera–IMU Extrinsic Calibration

I recently read a paper that was quite confusing; I think I have finally worked out what it is doing, so I am recording my notes here.
Paper: Monocular Visual–Inertial State Estimation With Online Initialization and Camera–IMU Extrinsic Calibration
Authors: Zhenfei Yang, Shaojie Shen.

Background knowledge

VIO initialization: https://blog.csdn.net/shyjhyp11/article/details/115403769


A brief summary of the paper:

  1. First, use the idea of hand-eye calibration to estimate the rotation part of the camera–IMU extrinsics;
  2. With the rotation known, ignore changes in the biases and jointly calibrate the velocities, the gravity direction, the feature-point depths, and the extrinsic translation;
  3. Further refine the result with optimization;
  4. Adopt a marginalization strategy to keep the sliding-window size bounded while retaining information.

Detailed explanation of the paper

IMU integration, pre-integration, and related content

Formula (1): Direct integration of IMU
Formula (2): the position change $\alpha$ caused by the acceleration term, the velocity change $\beta$, and the rotation change $R_{b_{k+1}}^{b_k}$ caused by the angular velocity
Formula (3): Rewriting of (1)
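For reference, the standard pre-integration terms look roughly like this (my own reconstruction in the usual VINS-style notation; the paper's exact symbols may differ slightly):

$$
\alpha_{b_{k+1}}^{b_k} = \iint_{t\in[t_k,\,t_{k+1}]} R_t^{b_k}\,(\hat{a}_t - b_a)\,dt^2,\qquad
\beta_{b_{k+1}}^{b_k} = \int_{t\in[t_k,\,t_{k+1}]} R_t^{b_k}\,(\hat{a}_t - b_a)\,dt,
$$

with $R_{b_{k+1}}^{b_k}$ the rotation accumulated from the gyroscope readings $(\hat{\omega}_t - b_g)$ over $[t_k,\,t_{k+1}]$. Here $\hat{a}_t$ and $\hat{\omega}_t$ are the raw accelerometer and gyroscope measurements and $b_a$, $b_g$ their biases. The point of pre-integration is that these quantities depend only on the IMU readings inside $[t_k,\,t_{k+1}]$, not on the global pose or velocity, so they do not have to be recomputed when the states change.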

Target

The state quantities to be estimated: the per-frame states $x_k$ (translation, velocity, and gravity direction at every IMU instant), the camera–IMU extrinsic parameter $p_c^b$, and the depth $\lambda_l$ of each feature point.
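Written out, my reading of the state is roughly the following (an illustration of the description above, not the paper's exact formula):

$$
\mathcal{X} = \left[\, x_0,\; x_1,\; \dots,\; x_n,\; p_c^b,\; \lambda_0,\; \lambda_1,\; \dots,\; \lambda_m \,\right],
$$

where each $x_k$ collects the translation, velocity, and gravity direction associated with frame $k$, $p_c^b$ is the camera–IMU extrinsic translation, and $\lambda_l$ is the depth of feature $l$. Note that no rotations appear here: they are fixed beforehand by the hand-eye step and the IMU integration.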

Rotation estimation

Before estimation, a preliminary map is first built purely visually, so the relative camera rotation between two frames can be obtained from the feature-matching results; the corresponding IMU rotation is obtained by integrating the gyroscope (because the time interval is short, effects such as bias are ignored).

At this point formula (4) can be solved. It is the basic hand-eye calibration equation and is relatively simple to solve; see the paper for the details. In addition, each rotation pair is given a Huber-norm weight based on its rotation-estimation error.
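To make this step concrete, here is a minimal sketch of how a hand-eye constraint of the form $q_\text{imu} \otimes q_c^b = q_c^b \otimes q_\text{cam}$ (one per consecutive frame pair) can be stacked and solved with an SVD. All names are my own, quaternions are assumed in $[w, x, y, z]$ order, and the optional weights stand in for the Huber-based weighting mentioned above; this is an illustration, not the paper's exact implementation.

```python
# Minimal sketch: extrinsic rotation q_c^b from stacked hand-eye constraints
#   q_imu ⊗ q_c^b = q_c^b ⊗ q_cam   (one constraint per consecutive frame pair)
# Quaternions are [w, x, y, z]; names and weighting scheme are illustrative.
import numpy as np

def skew(v):
    return np.array([[0, -v[2], v[1]],
                     [v[2], 0, -v[0]],
                     [-v[1], v[0], 0]])

def quat_left(q):
    # Matrix L(q) such that q ⊗ p = L(q) p
    w, v = q[0], np.asarray(q[1:])
    L = np.zeros((4, 4))
    L[0, 0], L[0, 1:] = w, -v
    L[1:, 0], L[1:, 1:] = v, w * np.eye(3) + skew(v)
    return L

def quat_right(q):
    # Matrix R(q) such that p ⊗ q = R(q) p
    w, v = q[0], np.asarray(q[1:])
    R = np.zeros((4, 4))
    R[0, 0], R[0, 1:] = w, -v
    R[1:, 0], R[1:, 1:] = v, w * np.eye(3) - skew(v)
    return R

def calibrate_extrinsic_rotation(imu_quats, cam_quats, weights=None):
    """Stack (L(q_imu) - R(q_cam)) q = 0 for all pairs and solve by SVD."""
    if weights is None:
        weights = np.ones(len(imu_quats))
    A = np.vstack([w * (quat_left(qb) - quat_right(qc))
                   for w, qb, qc in zip(weights, imu_quats, cam_quats)])
    # Homogeneous least squares: the right singular vector belonging to the
    # smallest singular value is the best unit-quaternion estimate of q_c^b.
    _, _, Vt = np.linalg.svd(A)
    q = Vt[-1]
    return q / np.linalg.norm(q)
```

In practice the weight for each pair would come from how well the camera and IMU relative rotations agree, which is where the Huber-style weighting in the paper enters.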

Velocity, gravity direction, feature-point depth, and camera–IMU translation estimation

Core formula (11): the first term is the prior constraint, the second term comes from the IMU, and the third term comes from the camera. Each is introduced below.

First of all, from here on the author assumes that all of the rotations involved are known: the rotation between the camera and the IMU (computed above) and the relative IMU rotations between frames (provided by formulas (2) and (3)).

This part is actually a bit messy, and it took me a long time to understand what was going on. First look at the camera error term: formulas (17) and (18).

The 0 in (17) looks strange at first, but reading further makes it clear what the author wants to express.
Consider a feature point $l$ that is first observed in frame $c_i$ and observed again in frame $c_j$; a constraint is established between these two observations. Given an assigned depth $\lambda_l$, the point is transformed from the camera frame to the body (IMU) frame, propagated to the IMU pose at time $j$ using the rotation increments obtained from IMU integration, and then transformed back into the camera frame. The result should be consistent with the observation in $c_j$, and the $M$ here is used to construct the visual error from this consistency requirement.
In theory, if the extrinsic translation and the depth $\lambda$ are correct, formula (17) should equal 0. That is what (17) expresses. So by forcing this expression to equal 0, we can solve for the correct translation and depth!
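The way I read it, the transform chain behind (17) is roughly the following (my own rewriting to make the geometry explicit; the paper groups the terms differently):

$$
\begin{aligned}
P^{c_i} &= \lambda_l\,\bar{u}_l^{\,c_i} && \text{back-project with the assigned depth} \\
P^{b_i} &= R_c^b\,P^{c_i} + p_c^b && \text{camera to body via the extrinsics} \\
P^{b_j} &= R_{b_i}^{b_j}\,P^{b_i} + p_{b_i}^{b_j} && \text{propagate to frame } j \text{ (rotation known from the IMU)} \\
P^{c_j} &= (R_c^b)^{\top}\bigl(P^{b_j} - p_c^b\bigr) && \text{body back to camera}
\end{aligned}
$$

The residual then asks $P^{c_j}$ to agree with the observed direction $\bar{u}_l^{\,c_j}$. Since every rotation in the chain is already fixed, this expression is linear in the unknowns $\lambda_l$, $p_c^b$, and the relative translation, which is what allows everything to be stacked into one linear system.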

Based on this idea, let’s look at the IMU term: Equation (12)
Equation (12) constrains the velocity terms and the direction of gravity.
In the third line of (12), after the old gravity direction is rotated, it should be consistent with the new one, and the error is 0, thus constraining the gravity attitude.
The second line and the first line both constrain the velocity term.
Therefore, minimizing (12) provides the velocity and gravity direction.
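For my own understanding, the three rows of (12) should be roughly of the following form (a reconstruction from basic kinematics; the paper's exact signs and reference frames may differ):

$$
\begin{aligned}
\hat{\alpha}_{b_{k+1}}^{b_k} &\approx p_{b_{k+1}}^{b_k} - v^{b_k}\,\Delta t_k - \tfrac{1}{2}\, g^{b_k}\,\Delta t_k^2 \\
\hat{\beta}_{b_{k+1}}^{b_k} &\approx R_{b_{k+1}}^{b_k}\, v^{b_{k+1}} - v^{b_k} - g^{b_k}\,\Delta t_k \\
0 &\approx R_{b_{k+1}}^{b_k}\, g^{b_{k+1}} - g^{b_k}
\end{aligned}
$$

The first two rows tie the pre-integrated $\hat{\alpha}$ and $\hat{\beta}$ to the unknown velocities (and gravity); the third row says that gravity expressed in frame $b_{k+1}$, once rotated back into frame $b_k$, must coincide with gravity expressed in $b_k$, which is exactly the "rotate the old gravity direction and it should agree with the new one" idea above.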

But two things in (11) have not been mentioned yet: the matrix $H$ in front of the state quantity $X$, and the matrix $P$. The paper does not give the explicit form of $H$, but you can write it out yourself from equation (10). The $P$ in the lower right corner is the covariance matrix, obtained from the IMU noise and the camera imaging noise; see the paper for details.

Solution to equation (12)

The author points out that equation (12) can be written in the form of equation (21) and solved analytically, without any iteration. This is where the $H$ mentioned above comes in.
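A minimal sketch of what "solved analytically" means in practice, assuming every prior / IMU / visual term contributes a linear block $H_i x = z_i$ with covariance $P_i$ (all names here are illustrative, not the paper's notation):

```python
# Minimal sketch: closed-form solution of a stacked weighted linear system,
# in the spirit of equations (11)/(21).  Each term (prior, IMU, camera)
# contributes H_i x = z_i with measurement covariance P_i.
import numpy as np

def solve_linear_initialization(blocks):
    """blocks: list of (H_i, z_i, P_i) with shapes (m_i, n), (m_i,), (m_i, m_i)."""
    n = blocks[0][0].shape[1]
    A = np.zeros((n, n))
    b = np.zeros(n)
    for H, z, P in blocks:
        W = np.linalg.inv(P)      # information (inverse covariance) of this term
        A += H.T @ W @ H          # accumulate the normal equations
        b += H.T @ W @ z
    # One linear solve replaces any iterative optimization: the cost is
    # quadratic in x because all rotations were fixed beforehand.
    return np.linalg.solve(A, b)
```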

Further refinement

The solutions above are only rough calculations; for example, error propagation is ignored. What needs to be optimized at this stage is equation (28).
I will just skip it here, otherwise it gets too hard to follow...

Postscript

Well, I finally had to bite the bullet on this IMU material. I avoided digging into it for years, but in the end there is no way around it.


Original post: blog.csdn.net/tfb760/article/details/129818506