VINS-Mono Algorithm Learning


Introduction

VINS-Mono: A Robust and Versatile Monocular Visual-Inertial State Estimator

1. Main contributions of VINS-mono

  1. Provides an online method for camera-IMU extrinsic calibration and IMU gyroscope bias correction;
  2. Loop closure detection on top of VIO, based on the DBoW2 bag-of-words library;
  3. Relocalization based on tight coupling of IMU and visual features;
  4. Map saving and loading;
  5. Failure detection: real-time monitoring of tracking status, timely re-initialization of the system, and pose correction.

2. Introduction to the main process of VINS-mono

The algorithm framework is divided into four modules: data preprocessing, system initialization, sliding-window-based VIO, and loop closure detection.

  • The data preprocessing module mainly includes image histogram equalization, KLT feature point extraction, de-distortion, and optical flow tracking. The IMU part mainly performs the pre-integration operation.

  • System initialization first performs pure visual SfM: it searches the current sliding window for two frames with sufficient parallax (more than 20 pixels) and enough matched points (more than 30), recovers an up-to-scale R and t from the epipolar constraint, triangulates a set of 3D points, and then solves the poses of the remaining frames by PnP.

  • After visual initialization, an online, loosely coupled calibration is performed for the camera-IMU rotation extrinsic and the gyroscope bias; then the velocity of each image frame, the scale, and the gravity vector are solved by visual-inertial alignment. Finally, gravity is decomposed onto two orthogonal unit basis vectors in its tangent plane and substituted back into the residual function for refined optimization.

  • The sliding-window-based VIO is a standard VIO process. The residual function consists of three parts: the prior residual obtained from marginalization, the visual reprojection residual, and the IMU residual.

  • The loop closure detection module is implemented with the DBoW2 bag-of-words library. For each keyframe, 500 feature points are extracted, descriptors are computed and converted into a bag-of-words vector, and loop keyframes are found through bag-of-words similarity. After feature matches are established, the essential matrix and PnP are solved sequentially with RANSAC in order to screen out outliers. If the inlier count meets the requirement, a loop is detected, and the loop residual is added to the residual function for a 4-degree-of-freedom pose graph optimization.
    insert image description here

3. Visual front-end for sensor information preprocessing

The visual front-end uses Shi-Tomasi corner detection and KLT optical flow tracking. To keep feature tracking stable, when the number of tracked feature points drops, additional feature points are extracted from the current frame to maintain tracking quality.

The detected feature points are homogenized (spread evenly over the image) and de-distorted; finally, the essential matrix is computed iteratively with RANSAC through the epipolar constraint, and outliers are excluded.

The code entry function is FeatureTracker::readImage() in feature_tracker.cpp; it mainly includes the following steps (a hedged code sketch follows the list):

  • Histogram equalization preprocessing: when the image is too bright or too dark, improve the contrast to make corner extraction easier
  • LK optical flow tracking
  • Extraction of new feature points (if feature point information needs to be sent to the backend)
  • De-distortion of all feature points
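The sketch below strings these four steps together with plain OpenCV calls. It is an illustrative approximation and not the actual VINS-Mono FeatureTracker code; the function name trackFrame and all parameter values (CLAHE clip limit, window size, pyramid depth, feature count, minimum distance) are assumptions.

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

// Illustrative front-end step sequence (8-bit grayscale input assumed).
void trackFrame(const cv::Mat& curImgRaw, cv::Mat& prevImg,
                std::vector<cv::Point2f>& prevPts, int maxFeatures = 150)
{
    // 1. Histogram equalization (CLAHE) for over-bright / over-dark images.
    cv::Mat curImg;
    cv::Ptr<cv::CLAHE> clahe = cv::createCLAHE(3.0, cv::Size(8, 8));
    clahe->apply(curImgRaw, curImg);

    // 2. LK pyramid optical flow from the previous frame's feature points.
    std::vector<cv::Point2f> curPts;
    std::vector<uchar> status;
    std::vector<float> err;
    if (!prevPts.empty())
        cv::calcOpticalFlowPyrLK(prevImg, curImg, prevPts, curPts,
                                 status, err, cv::Size(21, 21), 3);

    // Drop points that failed to track or left the image (see Section 3.4).
    std::vector<cv::Point2f> tracked;
    for (size_t i = 0; i < curPts.size(); ++i)
        if (status[i] && curPts[i].inside(
                cv::Rect2f(0.f, 0.f, (float)curImg.cols, (float)curImg.rows)))
            tracked.push_back(curPts[i]);

    // 3. Top up with new Shi-Tomasi corners when too few points survive.
    if ((int)tracked.size() < maxFeatures) {
        std::vector<cv::Point2f> newPts;
        cv::goodFeaturesToTrack(curImg, newPts,
                                maxFeatures - (int)tracked.size(),
                                0.01 /*quality*/, 30 /*min distance*/);
        tracked.insert(tracked.end(), newPts.begin(), newPts.end());
    }

    // 4. De-distortion of all feature points would follow here (Section 3.6).
    prevImg = curImg.clone();
    prevPts = tracked;
}
```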

3.1 Principle of KLT Optical Flow Tracking Algorithm

The LK optical flow method is an algorithm for tracking image pixels between frames. Using its three assumptions, equations are set up over a pixel window and the problem reduces to a linear least-squares problem of the form Ax = b. The solution is reliable only when both eigenvalues of the resulting 2x2 Hessian-like matrix are large; this is exactly the situation at corner points, which is why corners are chosen as key points.
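As a brief reconstruction of the standard LK formulation referenced above (the notation here is mine, not from the original figures):

```latex
% Linearized brightness constancy at each pixel p_i in the window:
I_x(p_i)\,u + I_y(p_i)\,v = -I_t(p_i), \qquad i = 1,\dots,n
% Stacking all n pixels gives A d = b with d = (u, v)^T,
% solved via the normal equations:
A^{T}\!A\,d = A^{T} b, \qquad
A^{T}\!A = \sum_i \begin{bmatrix} I_x^2 & I_x I_y \\ I_x I_y & I_y^2 \end{bmatrix}
```

Tracking is well-posed when both eigenvalues λ1 ≥ λ2 of AᵀA are large; Shi-Tomasi selects points whose smaller eigenvalue λ2 exceeds a threshold.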

In practice, on the one hand pixel motion is not that simple, and on the other hand the pixels in a window do not all move in the same way, so this approximation inevitably introduces error. The question then becomes how to select a suitable window or feature point to obtain the most accurate tracking. The KLT corner detector selects feature points that are suitable for tracking: its view is that a good feature point is, by definition, one that can be tracked well.

The goodFeaturesToTrack function built into OpenCV implements Shi-Tomasi corner detection.

3.2 Visual front-end KLT optical flow tracking process

The LK pyramid optical flow algorithm is a classic sparse optical flow algorithm with three assumptions: constant brightness, small motion, and spatial consistency.

  • Constant brightness (gray-level consistency): the gray value of the same spatial point is constant across the images
  • Small motion (temporal continuity): the motion between adjacent frames is small, so time changes do not cause drastic pixel changes
  • Spatial consistency: a feature point and the pixels in its surrounding neighborhood all undergo the same motion

LK optical flow tracking process:

  1. Build the image pyramid
  2. Based on the gray-level consistency and spatial consistency assumptions, set up and iteratively solve the equations to minimize the pixel difference between the patch around the feature in the original image and the patch in the matching image
  3. After the feature coordinates in the topmost pyramid image are obtained, use the resulting velocity and gradient as the initial value for optical flow tracking on the next pyramid level, and repeat until the optical flow at the original image resolution is computed

3.3 What is Pyramid Optical Flow

The image pyramid is a technique used to prevent the optical flow tracking algorithm from falling into a local optimum when the motion between image frames is too large.

When tracking at a large resolution, the difference between the estimate and the true value may be large, and the iterative solver easily gets stuck in a local minimum near the true value. The principle of pyramid optical flow is to resize the image to a smaller resolution and track on that image first, where the motion is small enough to reach the true global optimum.

In the tracking process, the LK optical flow algorithm first tracks based on the topmost pyramid image, and then continues to track the next layer of pyramid images in turn until the pixel coordinates of the feature points on the original image are obtained.

In this way, for an n-level pyramid, tracking on the original image changes from one optical flow solve into n optical flow solves, which costs more computation while improving the tracking accuracy (VINS-Fusion additionally introduces some prediction mechanisms).

3.4 Screening of outliers during tracking

  • Feature points that fail to be tracked by optical flow are deleted
  • Points that fall outside the image boundary after optical flow tracking are removed
  • After the feature points are tracked into the next frame, an essential matrix is computed with RANSAC and outliers are screened out through the epipolar constraint

3.5 Homogenization of vins-mono features

The feature points are sorted by tracking length: the longer a feature has been tracked, the more stable and higher quality it is considered to be. A neighborhood is then drawn around each retained feature point, and any other feature points falling inside that neighborhood are ignored. A sketch of this follows.
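A minimal sketch of this mask-based homogenization, under stated assumptions: the struct, the function name, and the 30-pixel radius are illustrative choices, not the exact setMask() code from VINS-Mono.

```cpp
#include <opencv2/opencv.hpp>
#include <algorithm>
#include <vector>

struct TrackedPoint { cv::Point2f pt; int trackCount; };

std::vector<TrackedPoint> homogenize(std::vector<TrackedPoint> pts,
                                     cv::Size imgSize, int minDist = 30)
{
    // Longer-tracked features are considered more stable, so keep them first.
    std::sort(pts.begin(), pts.end(),
              [](const TrackedPoint& a, const TrackedPoint& b) {
                  return a.trackCount > b.trackCount;
              });

    cv::Mat mask(imgSize, CV_8UC1, cv::Scalar(255));
    std::vector<TrackedPoint> kept;
    for (const auto& p : pts) {
        cv::Point pix(cvRound(p.pt.x), cvRound(p.pt.y));
        if (mask.at<uchar>(pix) == 255) {   // neighborhood still free
            kept.push_back(p);
            // Block a circle around the kept point so nearby,
            // shorter-tracked features are ignored.
            cv::circle(mask, pix, minDist, cv::Scalar(0), -1);
        }
    }
    return kept;
}
```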

3.6 De-distortion method in vins-mono

Reference link: https://blog.csdn.net/qq_33733356/article/details/124356158
Distortion grows stronger toward the image edge; this is the main basis for de-distortion in VINS. VINS uses a gradual-approximation (iterative) de-distortion method.
insert image description here

Since distortion strength grows toward the image edge, AA' > BB' > CC'.
  • The distorted coordinate A' is known, and the true coordinate A before distortion is what we need to find
  • According to the distortion formula, it is easy to compute the distorted coordinate from the true coordinate, but not the reverse
  • Therefore, in VINS, the known coordinate A' is first treated as if it were a true coordinate B, and the distortion formula gives its distorted position B'. The offset BB' approximates the distortion strength, so moving A' by BB' gives a closer point C. The process is then repeated from C to obtain an even closer point D. By repeated iteration, a point arbitrarily close to A within the acceptable tolerance is obtained (a code sketch follows)
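The fixed-point iteration below illustrates this idea under an assumed simple radial model x_d = x_u(1 + k1·r² + k2·r⁴) on the normalized plane; VINS-Mono's actual camera-model code differs in detail.

```cpp
#include <cmath>

struct Pt { double x, y; };

// Forward distortion: the easy direction, true -> distorted.
Pt distort(const Pt& p, double k1, double k2) {
    double r2 = p.x * p.x + p.y * p.y;
    double s = 1.0 + k1 * r2 + k2 * r2 * r2;
    return {p.x * s, p.y * s};
}

// Inverse by fixed-point iteration: start from the distorted point, apply
// the forward model to estimate the local distortion offset, and step back.
Pt undistort(const Pt& distorted, double k1, double k2, int iters = 8) {
    Pt u = distorted;                      // initial guess: A' itself
    for (int i = 0; i < iters; ++i) {
        Pt d = distort(u, k1, k2);         // where the guess would distort to
        u.x += distorted.x - d.x;          // step back by the residual offset
        u.y += distorted.y - d.y;
    }
    return u;
}
```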

3.7 Keyframe creation method in vins-mono

Keyframe selection is mainly based on the parallax relative to the previous keyframe and on the current tracking quality:

  • Parallax: the current frame must have enough rotation and translation parallax relative to the previous keyframe. The rotation parallax is obtained by integrating the gyroscope angular velocity, and the translation parallax is obtained from the average displacement of all matched feature points.
  • Tracking quality: if the number of tracked features drops below a certain threshold, the current frame is set as a new keyframe.

3.8 Comparison of Optical Flow Tracking and Descriptor Matching

1. Runtime comparison
For optical flow tracking, suppose each frame needs 100 feature points. Since optical flow directly tracks most of the previous frame's feature points (say N of them), inter-frame tracking only costs the time to extract the remaining (100 - N) feature points plus the optical flow tracking time.
For descriptor matching, every frame pays the time to extract all 100 features plus the descriptor matching time.

2. Performance comparison
LK optical flow is more lightweight than descriptor matching, while descriptor-based methods perform better under occlusion and faster motion.

Optical flow tracking is more suitable for inter-frame matching, and descriptor matching is more suitable for relocation and loop closure detection.

4 IMU pre-integration for sensor preprocessing

In this section, the discrete-time pre-integration of PVQ is obtained by starting from the continuous-time IMU integral. We then analyze the error-state formulation used in vins (the error cost function is the difference between the estimated relative motion between adjacent instants and the corresponding IMU pre-integration), derive the discrete-time error propagation of the IMU state based on mid-point integration, model the IMU bias inside the pre-integration, and finally derive the propagation of the pre-integration covariance matrix between adjacent instants.

4.1 Continuous-time PVQ integration

The integral formula for pvq in continuous time is as follows:
insert image description here

To integrate the acceleration at time t: first subtract the current accelerometer bias and the Gaussian measurement noise from the accelerometer reading; then rotate the result into the world coordinate system (in a VIO system the world frame is aligned with the gravity direction, so gravity can be subtracted directly once the acceleration is in the world frame); finally subtract gravity to obtain the acceleration at time t.
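Since the formula image is not reproduced here, the following is a sketch of the standard continuous-time propagation from the VINS-Mono paper (noise terms omitted for brevity):

```latex
\begin{aligned}
p^{w}_{b_{k+1}} &= p^{w}_{b_k} + v^{w}_{b_k}\,\Delta t_k
  + \iint_{t\in[t_k,t_{k+1}]}\!\big(R^{w}_{t}(\hat a_t - b_{a_t}) - g^{w}\big)\,dt^2 \\
v^{w}_{b_{k+1}} &= v^{w}_{b_k}
  + \int_{t\in[t_k,t_{k+1}]}\!\big(R^{w}_{t}(\hat a_t - b_{a_t}) - g^{w}\big)\,dt \\
q^{w}_{b_{k+1}} &= q^{w}_{b_k} \otimes q^{b_k}_{b_{k+1}}, \qquad
\dot q^{b_k}_{t} = \tfrac12\, q^{b_k}_{t} \otimes
  \begin{bmatrix}0\\ \hat\omega_t - b_{w_t}\end{bmatrix}
\end{aligned}
```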

In the IMU pre-integration formulas, quaternions represent rotation. First, recall quaternion multiplication: multiplying two quaternions p and q yields a new quaternion, as follows:
insert image description here

According to the above formula, the multiplication of two quaternions can be written as a multiplication of a [4x4] matrix and a [4x1] vector.
insert image description here
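For reference, a sketch of these matrices in the scalar-first Hamilton convention (an assumption; the original figure may use a different ordering):

```latex
p \otimes q = [p]_L\, q = [q]_R\, p, \qquad
[p]_L = \begin{bmatrix} p_w & -p_v^{T} \\ p_v & p_w I_3 + [p_v]_\times \end{bmatrix},
\quad
[q]_R = \begin{bmatrix} q_w & -q_v^{T} \\ q_v & q_w I_3 - [q_v]_\times \end{bmatrix}
```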

Based on the above formula, the quaternion time-derivative can be derived:
insert image description here

The rotation q_k+1 at time k+1 is obtained by multiplying the rotation q_k at time k by the integrated incremental rotation over Δt, where ω is the angular velocity measured by the gyroscope:
insert image description here
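A sketch of the standard result (Hamilton convention, body-frame angular velocity; the incremental form uses the small-angle approximation and is renormalized afterwards):

```latex
\dot q = \tfrac12\, q \otimes \begin{bmatrix}0\\ \omega\end{bmatrix},
\qquad
q_{k+1} \approx q_k \otimes \begin{bmatrix}1\\ \tfrac12\,\omega\,\Delta t\end{bmatrix}
```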

4.2 Calculation of pre-integration volume at continuous time

Unlike filtering, optimization problems typically optimize the current pose together with the poses at several previous instants. When an earlier pose changes, the motion states after it would have to be re-integrated, which is expensive. In addition, in a VIO system the motion between two adjacent image frames is optimized, but since the IMU rate is much higher than the image frame rate, the IMU data between two adjacent frames must be integrated in advance to obtain the corresponding relative displacement, velocity, and rotation for use in the optimization.

For the continuous-time integration above, changing the reference frame from the world coordinate system to the local coordinate system of the first pose yields the pre-integration representation in the local frame:
insert image description here

insert image description here

Here α, β, γ are the pre-integrated quantities of displacement, velocity, and rotation from time k to k+1, respectively (standard definitions sketched below).
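A sketch of the standard definitions (noise omitted; all quantities are expressed in the b_k frame):

```latex
\alpha^{b_k}_{b_{k+1}} = \iint_{t\in[t_k,t_{k+1}]} R^{b_k}_{t}(\hat a_t - b_{a_t})\,dt^2,
\qquad
\beta^{b_k}_{b_{k+1}} = \int_{t\in[t_k,t_{k+1}]} R^{b_k}_{t}(\hat a_t - b_{a_t})\,dt,
\qquad
\dot\gamma^{b_k}_{t} = \tfrac12\,\gamma^{b_k}_{t} \otimes
  \begin{bmatrix}0\\ \hat\omega_t - b_{w_t}\end{bmatrix}
```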

4.3 Pre-integration at discrete time

In reality, however, the measurements we obtain arrive at discrete times, so when computing the pre-integration between adjacent instants, the continuous-time pre-integration must be converted into discrete-time pre-integration. The vins-mono algorithm discretizes with mid-point integration.
insert image description here

Regarding the subscripts in the formula: since pre-integration computes the displacement, velocity, and rotation between two adjacent instants, all pre-integrated quantities are expressed relative to the frame at time k.
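A sketch of the mid-point update between IMU samples i and i+1 (matching the scheme used in the VINS-Mono code; noise omitted):

```latex
\begin{aligned}
\bar\omega &= \tfrac12(\hat\omega_i + \hat\omega_{i+1}) - b_{w_i}, &
\hat\gamma_{i+1} &= \hat\gamma_i \otimes
  \begin{bmatrix}1\\ \tfrac12\,\bar\omega\,\delta t\end{bmatrix} \\
\bar a &= \tfrac12\big(R(\hat\gamma_i)(\hat a_i - b_{a_i})
  + R(\hat\gamma_{i+1})(\hat a_{i+1} - b_{a_i})\big), &
\hat\beta_{i+1} &= \hat\beta_i + \bar a\,\delta t \\
& & \hat\alpha_{i+1} &= \hat\alpha_i + \hat\beta_i\,\delta t
  + \tfrac12\,\bar a\,\delta t^2
\end{aligned}
```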

4.4 Error Kalman filter

In the vins family, the error function used during optimization is defined slightly differently from the conventional (ORB-SLAM-style) definition. For the interval from time k to k+1, the difference between the estimated states at k+1 and k (relative displacement, velocity, rotation, and biases) is taken as the estimated relative motion; IMU pre-integration provides a measured value of the relative motion over the same interval. The error function optimized in vins solves for the state at time k+1 that minimizes the difference between the estimated relative motion and the measured (pre-integrated) relative motion.

Why use an error-state (error Kalman) formulation?
(1) Describing a 3-degree-of-freedom rotation with a 4-parameter quaternion is an over-parameterization, so the rotation error is described directly with a rotation vector. A rotation vector is periodic (the rotation angle wraps between -pi and pi); with the error state, the rotation error stays around 0, which avoids this periodicity
(2) For rotation, the error is guaranteed to stay near 0, avoiding problems such as gimbal lock
(3) The error quantities are small, so their second-order terms can be ignored and only the first-order Jacobian needs to be computed

4.5 Continuous time IMU state error state derivation

The derivative of the IMU error with respect to each state:
insert image description here

The following derivation reference link: https://blog.csdn.net/weixin_45626706/article/details/118074274

Derivative of the error function with respect to the displacement component:
insert image description here

Derivative of the error function with respect to velocity:
insert image description here

Derivative of the error function with respect to rotation:
insert image description here

insert image description here

Derivative of the error function with respect to the bias:
insert image description here

4.6 Error transfer of discrete-time pre-integration based on median integration

insert image description here
insert image description here

Specifically:
insert image description here

Derivation of Discrete Time Angle Error
insert image description here

Derivation of Discrete Time Velocity Error
insert image description here

insert image description here

Derivation of Discrete Time Displacement Error
insert image description here

Derivation of Discrete Time Zero Bias Error
insert image description here

4.7 Modeling of IMU bias in pre-integration

The IMU integration involves the biases. Because VINS-MONO is a tightly coupled VIO, the biases are also estimated quantities, which means that during the solution of the optimization problem, each optimization step may change the bias values to satisfy the minimum of the overall cost. This poses a problem: every optimization would require recomputing the integral.

In vins, a first-order approximation in the bias is used to update the pre-integrated quantities:
f(x+Δx) ≈ f(x) + J(x)Δx = original pre-integration result + first-order Jacobian with respect to the bias × bias correction
insert image description here
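A sketch of the standard first-order bias correction from the VINS-Mono paper, in the notation above:

```latex
\begin{aligned}
\alpha^{b_k}_{b_{k+1}} &\approx \hat\alpha^{b_k}_{b_{k+1}}
  + J^{\alpha}_{b_a}\,\delta b_a + J^{\alpha}_{b_w}\,\delta b_w \\
\beta^{b_k}_{b_{k+1}}  &\approx \hat\beta^{b_k}_{b_{k+1}}
  + J^{\beta}_{b_a}\,\delta b_a + J^{\beta}_{b_w}\,\delta b_w \\
\gamma^{b_k}_{b_{k+1}} &\approx \hat\gamma^{b_k}_{b_{k+1}} \otimes
  \begin{bmatrix}1\\ \tfrac12\,J^{\gamma}_{b_w}\,\delta b_w\end{bmatrix}
\end{aligned}
```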

Solving the Jacobian of the pre-integrated quantities with respect to the bias:
insert image description here

4.8 Transfer of the preintegrated covariance matrix

Assume a follows a Gaussian distribution with variance A. Multiplying a by a coefficient k, the variance of ka is k^2 A.
If a is a vector, its covariance is the matrix A.
If a matrix F multiplies the vector a, the covariance of Fa is F A F^T.
If vector a has covariance matrix A and vector b has covariance matrix B (independent), then a + b has covariance matrix A + B.
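Applying these rules to the discrete error transfer of Section 4.6 gives the standard covariance propagation (a sketch; F and G are the error-state transition and noise Jacobians from the mid-point error transfer, and Q is the IMU noise covariance):

```latex
P^{b_k}_{i+1} = F\,P^{b_k}_{i}\,F^{T} + G\,Q\,G^{T}, \qquad P^{b_k}_{k} = 0
```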

insert image description here

5. VIO initialization process

5.1 State quantities that need to be initialized

  • x_k is the IMU state when the k-th image is captured. It contains the position, velocity, and orientation (PVQ) of the IMU in the world coordinate system, as well as the accelerometer bias and gyroscope bias in the IMU body coordinate system.
  • x_bc is the extrinsic parameter between the IMU and the camera coordinate system. The rotation extrinsic is initialized by loosely coupled calibration, and the translation extrinsic can be obtained by direct measurement. A very accurate extrinsic is not required, because it continues to be refined in the nonlinear optimization.
  • λ_l is the inverse depth of the l-th feature in the frame where it is first observed.
  • n is the number of IMU states (keyframes) in the sliding window, m is the number of feature points
    insert image description here

5.2 State quantities that are not initialized

  1. Translational extrinsic parameters
    Estimating the translational extrinsic would require double-integrating acceleration, which is noisy; an extrinsic obtained from such a noisy translation estimate is less accurate than a direct measurement. By measuring the translation between the IMU and the camera, the extrinsic error can be kept within a few millimeters. In addition, the extrinsic continues to be optimized in the subsequent nonlinear optimization.
  2. Accelerometer bias
    The accelerometer bias is difficult to separate from the gravity estimate, and ignoring it does not significantly affect the results. In the algorithm, the initial bias is set to 0 and then continuously updated during the nonlinear optimization.

5.3 Calibration of rotation extrinsic parameters between IMU and camera

The rotation extrinsic between the IMU and the camera is calibrated with a loosely coupled algorithm. For the interval from time k to k+1, the inter-frame rotation R_c can be recovered by pure visual SfM, and the pre-integrated rotation R_b over the same interval can be obtained from the IMU measurements. The following relation holds for any interval:
insert image description here

After several frames, a set of adjacent-frame visual rotations and the corresponding IMU pre-integrated rotations is available, and this constraint can be used to build a least-squares problem that solves for the rotation extrinsic between the camera and the IMU (sketched below).
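A sketch of the standard construction (scalar-first quaternions; [·]_L and [·]_R as in Section 4.1):

```latex
q^{b_k}_{b_{k+1}} \otimes q^{b}_{c} = q^{b}_{c} \otimes q^{c_k}_{c_{k+1}}
\;\Longrightarrow\;
\big([q^{b_k}_{b_{k+1}}]_L - [q^{c_k}_{c_{k+1}}]_R\big)\, q^{b}_{c} = 0
```

Stacking this 4x4 block for many frame pairs gives an over-determined homogeneous system; q_b_c is taken as the right singular vector associated with the smallest singular value.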

5.4 Pure monocular vision initialization

Pure monocular SfM is performed on the image frames in the current sliding window to recover the camera poses and 3D points without scale information.

1. Find two frames in the sliding window with enough matched feature points (more than 30) and large enough parallax (more than 20 pixels); assume they are the 4th and 10th frames
insert image description here

2. Using the epipolar constraint with RANSAC iteration, compute the relative motion (R, t) between the 4th and 10th frames, and recover 3D space points without scale information
insert image description here

3. Using the obtained 3D points, solve the (scale-free) poses of the other frames in the sliding window through PnP, and recover more (scale-free) 3D points at the same time
insert image description here

4. Minimize the reprojection error with Ceres automatic differentiation to obtain the optimized, scale-free poses

5.5 Gyroscope bias initialization

First, the rotation and translation of the IMU to the c0 reference frame at time k can be defined:
insert image description here

The specific derivation is as follows:
insert image description here

For two consecutive frames k and k+1, the pure visual initialization above gives estimates of the IMU pose relative to the c0 reference frame, q_c0_bk and q_c0_bk+1.

The IMU itself provides a measurement of the rotation between the two frames, γ_bk+1_bk. Because of the gyroscope bias, the relative rotation from the estimated poses and the relative rotation from pre-integration will not be exactly equal, so an error model can be defined for each frame pair.
insert image description here

In vins, a first-order linear approximation is applied to the accelerometer and gyroscope bias errors. The biases enter the PVQ pre-integration, so when a bias changes, the corresponding pre-integration would have to be recomputed. To reduce computation, the IMU error is linearized to first order in the biases, so that when a bias is updated, only the first-order bias compensation term needs to be added to the original pre-integration result.
insert image description here

The solution process is as follows:
insert image description here
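A sketch of the resulting least-squares problem (as in the VINS-Mono paper; vec(·) takes the quaternion vector part):

```latex
\min_{\delta b_w} \sum_{k}
\left\| 2\,\mathrm{vec}\!\left( \big(q^{c_0}_{b_{k+1}}\big)^{-1}
  \otimes q^{c_0}_{b_k} \otimes \gamma^{b_k}_{b_{k+1}} \right) \right\|^2,
\qquad
\gamma^{b_k}_{b_{k+1}} \approx \hat\gamma^{b_k}_{b_{k+1}} \otimes
  \begin{bmatrix}1\\ \tfrac12\,J^{\gamma}_{b_w}\,\delta b_w\end{bmatrix}
```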

5.6 Visual-inertial alignment

After pure visual initialization, the camera poses and 3D points are known only up to scale, and the poses are not aligned with the gravity direction; the pre-integration between adjacent IMU frames is also available. The next step is visual-inertial alignment: combine the visual estimates with the IMU pre-integration measurements to recover the metric scale of the poses and 3D points, align the poses with the gravity direction, and estimate the velocity corresponding to each image frame.

First define the IMU coordinate system velocity, scale and gravitational acceleration corresponding to each frame:
insert image description here

Then, from the displacement and velocity pre-integration between two consecutive frames k and k+1, the following formulas are obtained:
insert image description here

insert image description here

According to the above two formulas, the following linear constraint equation can be obtained
insert image description here

Then construct the least squares problem to solve
insert image description here

Finally, the velocity of each frame in the IMU coordinate system, the scale, and an initial value of the gravity vector are obtained.

Gravity adjustment (gravity refinement)
In most cases the magnitude of gravity is already known: the local gravitational acceleration can be found by table lookup or similar means. If the gravity vector we solve for differs in magnitude from the known value, something must be off. Therefore, VINS also refines it during initialization: the direction of the computed gravity is kept, and its magnitude is adjusted to the known value 9.81.

Because the magnitude of gravity is known, the gravity vector drops from 3 degrees of freedom to 2 (gravity originally consists of components along x, y, z; with the modulus known, the z component follows from the x and y components). Gravity can then be decomposed, on the tangent plane of the current estimate, into two orthogonal unit vectors b1 and b2 multiplied by their respective weights.
insert image description here

The acceleration of gravity can be expressed by the following formula:
insert image description here

This parameterization then replaces the corresponding g_c0 in the earlier system that solves the per-frame IMU velocity, scale, and gravity, and iterative optimization makes the gravity direction converge. Here b = [b1, b2] and w = [w1, w2].
insert image description here
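A sketch of this 2-degree-of-freedom parameterization (ĝ is the unit vector of the current gravity estimate, and b1, b2 span its tangent plane):

```latex
g^{c_0} \approx \|g\|\,\hat{\bar g} + w_1 b_1 + w_2 b_2
       = \|g\|\,\hat{\bar g} + B\,w,
\qquad
B = [\,b_1 \;\; b_2\,] \in \mathbb{R}^{3\times 2},\;\; w \in \mathbb{R}^{2}
```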

At this point, the rotation q_c0_w between the world coordinate system and the camera reference frame c0 can be obtained. All variables are then rotated from the reference frame (·)^c0 into the world frame (·)^w; the body-frame velocities are also rotated into the world frame, and the visual SfM transformations are scaled to metric units. The initialization process is now complete, and all these quantities are fed into the tightly coupled monocular VIO.

6. Tightly coupled monocular VIO

6.1 Composition of the optimized state variables

  • x_k is the IMU state when the k-th image is captured. It contains the position, velocity, and orientation (PVQ) of the IMU in the world coordinate system, as well as the accelerometer bias and gyroscope bias in the IMU body coordinate system.
  • x_bc is the extrinsic parameter between the IMU and the camera coordinate system. The rotation extrinsic is initialized by loosely coupled calibration, and the translation extrinsic can be obtained by direct measurement. A very accurate extrinsic is not required, because it continues to be refined in the nonlinear optimization.
  • λ_l is the inverse depth of the l-th feature in the frame where it is first observed.
  • n is the number of IMU states (keyframes) in the sliding window, m is the number of feature points
    insert image description here

6.2 Composition of the residual function

The residual function combines the prior information obtained from marginalization, the IMU measurement residuals, and the visual measurement residuals; the state of each frame in the sliding window is obtained by minimizing the sum of all measurement residuals.
insert image description here

6.3 Meaning of IMU residuals

Given the IMU position, velocity, and rotation from the IMU frame to the world frame estimated at time k, and the pre-integration from time k to k+1, the residual constrains the solution for the position, velocity, and rotation at time k+1.

In the residual function, the state PVQ and biases at time k act as the prior, and the state at time k+1 is the quantity to be estimated. The physical meaning of the residual is to minimize the difference between the relative motion of the two frames' estimated states and the IMU pre-integrated measurement.
insert image description here

The residual mainly consists of 5 parts (a sketch follows the list):

  • Displacement residual: subtracting the pose estimate at time k from the pose estimate at time k+1 gives one relative displacement, and the IMU pre-integration measurement gives another; the residual is the difference between the two relative quantities
  • Velocity and rotation residuals: defined analogously
  • IMU bias residuals: the bias at time k is directly subtracted from the bias at time k+1 (within the integration interval, the accelerometer and gyroscope biases of the IMU data between the two frames are treated as constant)
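A sketch of the full 15-dimensional residual as given in the VINS-Mono paper:

```latex
r_B = \begin{bmatrix}
R^{b_k}_{w}\big(p^{w}_{b_{k+1}} - p^{w}_{b_k} - v^{w}_{b_k}\Delta t_k
  + \tfrac12 g^{w} \Delta t_k^2\big) - \hat\alpha^{b_k}_{b_{k+1}} \\
R^{b_k}_{w}\big(v^{w}_{b_{k+1}} - v^{w}_{b_k} + g^{w}\Delta t_k\big)
  - \hat\beta^{b_k}_{b_{k+1}} \\
2\,\mathrm{vec}\!\big( (\hat\gamma^{b_k}_{b_{k+1}})^{-1}
  \otimes (q^{w}_{b_k})^{-1} \otimes q^{w}_{b_{k+1}} \big) \\
b_{a_{k+1}} - b_{a_k} \\
b_{w_{k+1}} - b_{w_k}
\end{bmatrix}
```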

6.4 Definition of visual residual

Vins adopts a new description of the projection error, defining the camera measurement residual on the unit sphere.
insert image description here

The residual is defined on the tangent plane of the unit sphere rather than on the normalized image plane. A ray runs from the camera optical center toward the feature point. Through the motion estimate, the feature observed in frame i is projected into frame j; the feature actually measured in frame j gives a second ray from the optical center. Both are normalized to unit vectors, their difference is projected onto the tangent-plane coordinate system, and the projected vector difference is the visual residual.

In short, compared with the usual reprojection residual (e.g. in ORB-SLAM), there is one extra step: after projecting the feature points, the coordinates are normalized into unit vectors. The purpose is to reduce the impact of noise, mismatches, and similar effects on pose estimation.
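A sketch of the unit-sphere residual from the paper (P is the feature position predicted in frame c_j from the motion estimate and inverse depth, P̂ the unit-length measured direction, and b1, b2 two orthonormal bases of the tangent plane):

```latex
r_C = \begin{bmatrix} b_1 & b_2 \end{bmatrix}^{T}
\left( \hat{\bar P}^{c_j}_{l} - \frac{P^{c_j}_{l}}{\|P^{c_j}_{l}\|} \right)
```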

6.5 The difference between vins-mono visual residual and ORB-SLAM

ORB-SLAM3 uses a similar operation when solving pose from the epipolar constraint. Due to noise, rounding of values, mismatches, and so on, a fundamental matrix F computed directly from the raw pixel coordinates would deviate significantly. Therefore, the feature pixel coordinates are normalized before solving for the fundamental matrix, yielding a set of zero-mean, de-scaled normalized coordinates together with a transformation matrix T from the feature pixel coordinates to the normalized coordinates.

6.6 Marginalized Residuals

  • What is marginalized is the IMU state (i.e. PVQ and the two biases) and the inverse depth information of feature points
  • Converting the measurements of the corresponding states into a prior means that, in subsequent state estimation, these values participate in the pose estimation of new states as fixed prior values
  • The marginalization is computed and applied via the Schur complement

Marginalization rules:

(1) If the second-newest frame is a keyframe, it stays in the sliding window, and the oldest frame together with its measurements is marginalized out
(2) If the second-newest frame is not a keyframe, its visual measurements (map point observations) are discarded while the IMU measurements connected to it are retained, in order to keep the Hessian matrix from becoming too dense

The problem of early-fixed linearization points caused by marginalization

Marginalization fixes the linearization points early, which can lead to suboptimal estimation results. However, the authors did not use the FEJ (first-estimate Jacobian) strategy in vins: since small drift is acceptable for VIO, the negative effects of marginalization are not considered important.
insert image description here

7 Lightweight processing of algorithms for mobile phones and other devices

The core idea is to reduce the number of optimization variables and optimization objects. Only the attitudes and velocities of a fixed number of the latest IMU states are optimized, while the feature depths, extrinsic parameters, biases, and old IMU states that are not to be optimized are treated as constants. All visual and inertial measurements are then used for a motion-only BA.

8 Fault detection

1. The number of features tracked in the latest frame is less than a certain threshold;
2. There is a large discontinuity in position or rotation between the two most recent estimator outputs;
3. There is a large change in the bias or extrinsic parameter estimates.
Once a failure is detected, the system switches back to the initialization phase. Once the monocular VIO is successfully re-initialized, a separate pose graph is created.

9 Relocalization module

The relocalization concept here differs slightly from relocalization in ORB-SLAM; it corresponds to the loop correction step in the ORB-SLAM pipeline.

9.1 Loop detection

The DBoW2 library is used to detect candidate loop keyframes based on bag-of-words vectors. For loop closure detection, vins extracts 500 additional feature points beyond the optical-flow-tracked corners, computes BRIEF descriptors for them, and then abstracts the image into a bag-of-words vector for storage. For each new keyframe, bag-of-words similarity is computed to determine loop candidate frames.

In addition, the algorithm provides map saving and loading functions, and the loaded map will also exist as a candidate matching library for loop closure detection.

9.2 Feature Recovery

Descriptor-based feature matching is first performed between keyframes with a loop relationship. Based on RANSAC, 2D-2D essential matrix estimation and 3D-2D PnP are used to filter outliers and compute the relative pose. When the number of inliers meets the requirement, the frame is confirmed as a loop frame.

9.3 Tightly coupled relocalization

Constraints are established from the poses in the current sliding window and their co-visibility relationship with the loop frame. The optimization variables are still the IMU states in the current sliding window, the camera-IMU extrinsics, and the inverse depths of feature points.

The difference is that the pose of the loop frame is added to the visual residual part as a prior, along with the matches and constraints between the loop frame's features and the keyframes in the current sliding window.
insert image description here

10. Global Pose Graph Optimization

  • The optimization variables are camera position xyz and yaw angle
  • Conditions in adding keyframes in the pose graph: When a keyframe is marginalized from the sliding window, it is added to the pose graph.
  • Vertices and edges in the pose graph: the vertices are the poses of the keyframes, and the edges are divided into two types: sequential edges and loop edges
  • The residual of a sequential edge is computed in the same way as that of a loop edge. Loop-edge residuals use a robust kernel function; sequential edges are extracted from the VIO sliding window, which already contains sufficient outlier rejection mechanisms, so no robust kernel function is used for them (a sketch of the residual follows the list).
  • Keyframe retention and removal mechanism: all keyframes with loop constraints are retained, while other frames that are too close to their adjacent frames or have very similar orientations are deleted.
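A sketch of the 4-degree-of-freedom relative-pose residual between keyframes i and j, as in the paper (ψ is yaw; roll φ̂ and pitch θ̂ are taken as fixed from the VIO estimate, since gravity makes them observable):

```latex
r_{ij} = \begin{bmatrix}
R(\hat\phi_i, \hat\theta_i, \psi_i)^{-1}\,(p_j - p_i) - \hat p^{\,i}_{ij} \\
\psi_j - \psi_i - \hat\psi_{ij}
\end{bmatrix}
```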

11. Delay Estimation of Sensors

The core idea is to interpolate between two adjacent frame times based on the delay time.

The author handles this by shifting feature points. The assumption is that over a short period the camera motion is uniform (constant velocity). Based on this assumption, the motion of feature points on the image plane is computed from their projections at different times, and the coordinates of a feature point at any intermediate moment can then be easily computed (assuming the feature points also move at constant velocity on the image plane).

If the feature points are represented by 3D coordinates, the coordinates of a feature point on the image at the corresponding time can be computed directly from its projected coordinates, its image-plane velocity, and the compensation time.
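A sketch of the time-offset-compensated observation used in the temporal-calibration formulation of the VINS line of work (z is the observed 2D feature in frame k, V its image-plane velocity, and t_d the time offset to estimate):

```latex
z^{k}_{l}(t_d) = z^{k}_{l} + t_d\, V^{k}_{l}
```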

In vins, pose estimation is carried out around the IMU, so the delay is handled on the image side. As with ORB-SLAM, there is no guarantee that the timestamp of each image frame corresponds strictly to the IMU data. vins handles this phenomenon by constructing a virtual constraint from the moving speed of the feature points and the time difference.


Origin blog.csdn.net/guanjing_dream/article/details/129270442