Notes on "14 Lectures on Visual SLAM" (super easy to understand~)

Preface

I have finally reached the part I was looking forward to most: learning SLAM. I suspect that you, reading this, are as nervous and excited as I am, so let's start scrolling together!
In this article I briefly summarize the most basic ideas from each lecture of "14 Lectures on Visual SLAM", mainly to give you an overview of what the fourteen lectures cover.
If you want to learn more, read the book!
Let's go for it!

Preliminary knowledge

SLAM: Simultaneous Localization and Mapping
Chinese name: Simultaneous localization and map construction
Basic definition: an agent carrying specific sensors, with no prior information about the environment, builds a model of the environment while it moves and at the same time estimates its own motion.
When the sensor is mainly a camera, this is called "visual SLAM".

Chapter division

The whole book is divided into two parts:

  • Basics: Lectures 1-6
  • Practical applications: Lectures 7-14

Lecture 1: Preliminary knowledge
Lecture 2: Overview of the SLAM system: the components of SLAM and what each module does; setting up the programming environment and using an IDE
Lecture 3: Three-dimensional rigid body motion: rotation matrices, Euler angles, quaternions; practice with Eigen
Lecture 4: Lie groups and Lie algebras: definitions and usage; practice with Sophus
Lecture 5: The pinhole camera model and how images are represented in a computer; use OpenCV to work with the camera's intrinsic and extrinsic parameters
Lecture 6: Nonlinear optimization: the theoretical basis of state estimation, least squares problems, gradient descent; curve fitting experiments with Ceres and g2o
Lecture 7: Feature-based visual odometry: feature extraction and matching, epipolar geometry constraints, PnP, ICP, and so on; use these methods to estimate the motion between two images
Lecture 8: Direct visual odometry: the principles of optical flow and the direct method; use them to implement a simple direct motion estimation
Lecture 9: Back-end optimization: an in-depth discussion of Bundle Adjustment (BA), exploiting sparsity to speed up the solution, and writing BA programs with both Ceres and g2o
Lecture 10: Pose graphs in back-end optimization: introduces SE(3) and Sim(3) pose graphs; use g2o to optimize a pose sphere
Lecture 11: Loop closure detection: introduces the bag-of-words model; use DBoW3 to write a dictionary training program and a loop detection program
Lecture 12: Mapping: estimate a dense depth map with a monocular camera; discuss how an RGB-D dense map is built
Lecture 13: Engineering practice: build a stereo visual odometry framework that combines the previous material, and test its performance on the KITTI dataset
Lecture 14: Introduces current open-source SLAM solutions and future directions

Basic knowledge required

  • Advanced mathematics, linear algebra, probability theory
  • C++ language foundation
  • Linux system basics

Lecture 2 First understanding of SLAM

SLAM answers two key questions:

  1. Where am I? (Localization)
  2. What is the surrounding environment like? (Mapping)

Sensors

Two types of sensors:

  • Installed in the environment: QR codes, GPS, and so on
  • Carried on the robot body: for example lasers and cameras

Comparing the two on-board sensors, laser and camera: laser-based mapping has largely been worked out, while visual SLAM is not yet stable and reliable.
Camera advantages: lightweight, cheap, rich in information
Camera disadvantages: occlusion, sensitivity to lighting, heavy computation
Camera types: monocular, stereo, depth camera (ToF / structured light)

Classic visual SLAM framework

  • Sensor information reading: in visual SLAM this is mainly reading and preprocessing camera images. On a robot it may also include reading and synchronizing wheel encoders, inertial sensors, and so on
  • Front-end visual odometry (Visual Odometry, VO): estimates the camera motion between adjacent images and what the local map looks like. It is also called the front end
  • Back-end nonlinear optimization (Optimization): the back end receives the camera poses measured by the visual odometry at different times, together with the loop closure information, and optimizes them to obtain a globally consistent trajectory and map. Because it comes after VO, it is also called the back end
  • Loop closure detection: detects whether the robot has returned to a place it visited before. If a loop is detected, the information is passed to the back end for processing
  • Mapping: builds a map that suits the task requirements, based on the estimated trajectory

The mathematical expression of SLAM problem

The mathematical description of SLAM rests on two equations, the motion equation and the observation equation:
[Formula image: the motion equation and the observation equation]
x_k is the position of the robot at time k, u_k is the reading or input of the motion sensor, and z_{k,j} is the observation data produced when the robot at x_k observes the environment (landmark j).
Both the motion equation and the observation equation include noise, the w and v in the formulas. That is why the back end has to run an optimization, that is, state estimation, to find the most likely poses and map.
The figure below may give a clearer picture.
[Figure]
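
For reference, the two equations are commonly written in the general form below. This is only a sketch in standard notation: the function names f and h and the landmark symbol y_j are assumptions made here, while x_k, u_k, z_{k,j}, w and v follow the text above.

```latex
\begin{aligned}
x_k     &= f(x_{k-1},\, u_k,\, w_k)   && \text{(motion equation)} \\
z_{k,j} &= h(y_j,\, x_k,\, v_{k,j})   && \text{(observation equation)}
\end{aligned}
```

Read this way, localization means estimating the x_k and mapping means estimating the y_j, given the noisy inputs u and observations z.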

Programming Basics: Linux

You can refer to another blog post of mine, which records the most basic Linux commands. For beginners it can lighten the learning load:
Linux basic knowledge study notes (common instructions)

Using CMake

I do not understand this part very well yet. Put simply, a CMakeLists.txt file tells the project how it should be compiled.
On Linux, once the source code is written, a multi-file project has to be compiled and linked into an executable before it can be debugged and run. CMake generates a makefile for the project, and the make command then builds the whole project according to that makefile (C++ code is compiled with g++).
The book briefly shows how to build and run a project from the command line with cmake; the example is quite interesting.

Lecture 3 Rigid body motion in three-dimensional space

  • The goal of this lecture: understand the description of rigid body motion, rotation matrix, transformation matrix, quaternion and Euler angle
  • Master the matrix and geometry modules of the Eigen library

3.1 Point, vector and coordinate system, rotation matrix

  • Point exists in three-dimensional space
  • Two points together define a vector
  • The point itself is described by the vector pointing to it from the origin

Vector: an arrow with a direction; vectors can be added, subtracted, and so on.
Vector coordinates: once a coordinate system is chosen, a vector can be expressed by coordinates in R^3.
Coordinate system: consists of three orthogonal axes, which form a basis of the linear space. Coordinate systems are either left-handed or right-handed.
Vector operations: addition and subtraction, inner product, outer (cross) product.
For the outer product a small trick is introduced: the ^ operator turns a vector into a skew-symmetric matrix, so that a × b = a^ b.
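
To make the "vector becomes a matrix" trick concrete, here is a minimal pure-Python sketch. The helper names `hat`, `mat_vec` and `cross` are made up for this note (they are not Eigen functions); the point is only that multiplying by the matrix a^ gives the same result as the cross product.

```python
def hat(a):
    """Return the 3x3 skew-symmetric matrix a^ of a 3-vector a."""
    a1, a2, a3 = a
    return [[0.0, -a3, a2],
            [a3, 0.0, -a1],
            [-a2, a1, 0.0]]


def mat_vec(m, v):
    """Multiply a 3x3 matrix by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def cross(a, b):
    """Cross (outer) product written out component by component."""
    return [a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0]]


a = [1.0, 2.0, 3.0]
b = [4.0, 5.0, 6.0]
via_matrix = mat_vec(hat(a), b)  # a^ b
direct = cross(a, b)             # a x b
print(via_matrix, direct)
```

Both results are the same vector, which is why the cross product can be treated as a linear operation (a matrix times a vector).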
In SLAM:

  • There is a fixed world coordinate system and a mobile robot coordinate system
  • The robot coordinate system changes with the movement of the robot, and there is a new coordinate system at every moment

How do we transform between the two coordinate systems? With the translation between the two origins plus the rotation of the three axes.
Rotation matrix R, necessary and sufficient conditions:

  • R is an orthogonal matrix: its inverse equals its transpose
  • The determinant of R is 1
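
The two conditions are easy to check numerically. Below is a minimal sketch in pure Python; `is_rotation` and the other helper names are hypothetical and exist only for this illustration. A rotation about the z axis passes the check, while a mirror reflection is orthogonal but fails because its determinant is -1.

```python
import math


def rot_z(theta):
    """Rotation by theta (radians) about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def transpose(m):
    return [list(row) for row in zip(*m)]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def is_rotation(m, tol=1e-9):
    """True if m^T m = I (orthogonal) and det(m) = +1."""
    mtm = mat_mul(transpose(m), m)
    orthogonal = all(abs(mtm[i][j] - (1.0 if i == j else 0.0)) < tol
                     for i in range(3) for j in range(3))
    return orthogonal and abs(det3(m) - 1.0) < tol


R = rot_z(math.radians(30.0))
reflection = [[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, -1.0]]  # orthogonal, but det = -1
print(is_rotation(R), is_rotation(reflection))
```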

Rotations of three-dimensional space: the set of all such rotation matrices is SO(3), also called the special orthogonal group. The concept of a group is discussed in the Lie algebra lecture.
If four numbers are used to describe a three-dimensional point, rotation and translation can be applied together in one matrix multiplication.
This representation is called homogeneous coordinates.
The rotation and the translation can then be put into a single matrix, called the transformation matrix.

The rotation matrix and transformation matrix are as follows (non-homogeneous form)
[Formula image: rotation matrix and transformation matrix]
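
As a reminder of how the two forms relate, here is a short sketch in LaTeX. The point symbols a and a' are assumptions made for this note; R, t and T are the rotation matrix, translation vector and transformation matrix from the text.

```latex
a' = R a + t,
\qquad
\begin{bmatrix} a' \\ 1 \end{bmatrix}
= \begin{bmatrix} R & t \\ 0^T & 1 \end{bmatrix}
  \begin{bmatrix} a \\ 1 \end{bmatrix}
= T \begin{bmatrix} a \\ 1 \end{bmatrix},
\qquad
T^{-1} = \begin{bmatrix} R^T & -R^T t \\ 0^T & 1 \end{bmatrix}
```

The advantage of the homogeneous form is that chaining several motions becomes a plain matrix product such as T_2 T_1, instead of nested expressions like R_2 (R_1 a + t_1) + t_2.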

3.3 Euler angle

Euler angles describe a rotation as the angles by which a rigid body turns about three axes, one after another.
Different rotation orders give different Euler angles. The ZYX order is commonly used, that is, "yaw-pitch-roll".
Euler angles suffer from the gimbal lock problem: in some configurations one degree of freedom is lost.
You can test gimbal lock with a mobile phone (or any object). Once the second rotation (pitch, in ZYX order) is ±90 degrees, the first and third rotations turn about the same axis and have the same effect, so one degree of freedom is lost. Very interesting.
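
If you do not have a phone at hand, gimbal lock can also be seen numerically. The sketch below is pure Python with hypothetical helper names, and assumes the ZYX convention R = Rz(yaw) Ry(pitch) Rx(roll). With pitch fixed at 90 degrees, two different (yaw, roll) pairs with the same difference yaw - roll give the same rotation matrix; with pitch at 20 degrees they do not.

```python
import math


def rot_x(t):
    c, s = math.cos(t), math.sin(t)
    return [[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]]


def rot_y(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]


def rot_z(t):
    c, s = math.cos(t), math.sin(t)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def mat_mul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def euler_zyx(yaw, pitch, roll):
    """Rotation matrix for yaw-pitch-roll angles given in degrees."""
    y, p, r = (math.radians(v) for v in (yaw, pitch, roll))
    return mat_mul(rot_z(y), mat_mul(rot_y(p), rot_x(r)))


def max_abs_diff(a, b):
    """Largest element-wise difference between two 3x3 matrices."""
    return max(abs(a[i][j] - b[i][j]) for i in range(3) for j in range(3))


# Pitch = 90 degrees: only yaw - roll matters (both pairs have difference 20).
locked = max_abs_diff(euler_zyx(30, 90, 10), euler_zyx(50, 90, 30))
# Pitch = 20 degrees: the same two pairs give clearly different rotations.
free = max_abs_diff(euler_zyx(30, 20, 10), euler_zyx(50, 20, 30))
print(locked, free)
```

The first number is zero up to rounding error and the second is not, which is exactly the lost degree of freedom.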

In fact, no three-dimensional vector description of rotation is free of singularities,
so the four-dimensional quaternion is used to represent rotation instead.

3.4 Quaternion

A quaternion is an extended complex number.
It has one real part and three imaginary parts, and can express rotations in three-dimensional space.
The imaginary parts obey specific multiplication rules, which I think of as similar to the cross product in three-dimensional space.
Quaternions come with many specific formulas, including the inverse of a quaternion (important for expressing rotation).

Note: quaternion multiplication is overloaded in the Eigen library. To rotate a vector v you can write q * v; there is no need to write out the inverse of the quaternion, because the overloaded operator takes care of that step.
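
To see what that overloaded operator has to do mathematically, here is a minimal pure-Python sketch of p' = q p q^{-1}. The function names are invented for this note and the quaternion is stored as (w, x, y, z); for a unit quaternion the inverse is just the conjugate.

```python
import math


def q_mul(p, q):
    """Hamilton product of two quaternions given as (w, x, y, z)."""
    pw, px, py, pz = p
    qw, qx, qy, qz = q
    return (pw * qw - px * qx - py * qy - pz * qz,
            pw * qx + px * qw + py * qz - pz * qy,
            pw * qy - px * qz + py * qw + pz * qx,
            pw * qz + px * qy - py * qx + pz * qw)


def q_conj(q):
    """Conjugate; equals the inverse when q has unit length."""
    w, x, y, z = q
    return (w, -x, -y, -z)


def q_from_axis_angle(axis, theta):
    """Unit quaternion for a rotation of theta radians about a unit axis."""
    half = theta / 2.0
    s = math.sin(half)
    return (math.cos(half), s * axis[0], s * axis[1], s * axis[2])


def rotate(q, v):
    """Rotate 3-vector v by unit quaternion q via q * (0, v) * q^{-1}."""
    p = (0.0, v[0], v[1], v[2])
    return q_mul(q_mul(q, p), q_conj(q))[1:]


q = q_from_axis_angle((0.0, 0.0, 1.0), math.pi / 2.0)  # 90 degrees about z
v_rot = rotate(q, (1.0, 0.0, 0.0))                      # x axis -> y axis
print(v_rot)
```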

Lecture 4 Lie Group and Lie Algebra

The fundamental purpose of studying Lie groups and Lie algebras:

  1. Turn the rotation matrix into a form that can be added and subtracted
  2. Eliminate constraint problems in back-end optimization

Through the conversion relationship between Lie groups and Lie algebras, it is hoped to turn the pose estimation into an unconstrained optimization problem.

Lie groups and Lie algebra foundation

We have already met two groups:

  • Special orthogonal group SO(3): the rotation matrices of three-dimensional space. A matrix that is orthogonal and has determinant 1 is a rotation matrix
  • Special Euclidean group SE(3): the 4×4 transformation matrices built from a three-dimensional rotation matrix R, a translation vector t, a row 0^T and a 1. It is the group of three-dimensional Euclidean transformations

Group: an algebraic structure made of a set plus one operation. The operation must satisfy four conditions: closure, associativity, identity, inverse.

  • Closure
  • Associativity
  • Identity element
  • Inverse

Lie group: a group with continuous (smooth) properties.
Skew-symmetric (antisymmetric) matrix: any vector can be turned into a skew-symmetric matrix, and for any skew-symmetric matrix a corresponding vector can be found. The symbols and their meaning are shown in the figure below.
[Figure: the ^ operator and its inverse]
Lie algebra Φ: the tangent space of SO(3) near the origin (the identity). The following approximate formulas hold: differentiating the rotation matrix is equivalent to multiplying it by the skew-symmetric matrix of the Lie algebra element, and the rotation matrix equals e raised to the power of that skew-symmetric matrix.
[Formula images]
A Lie algebra consists of a set V, a number field F, and a binary operation [,]. If they satisfy the following properties, (V, F, [,]) is called a Lie algebra, written g.
[Formula image: properties of a Lie algebra]
It needs to be pointed out clearly that the Lie algebra symbol is used interchangeably for a three-dimensional vector and its skew-symmetric matrix. This can be seen in the following formulas.
[Formula images]
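
For reference, the relations described in this part are usually written as below. This is a sketch in standard notation rather than a copy of the original formula images: the lowercase \phi is the three-dimensional vector (written Φ in the text above) and \phi^{\wedge} is its skew-symmetric matrix.

```latex
a^{\wedge} =
\begin{bmatrix} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{bmatrix},
\qquad
\dot{R}(t) = \phi(t)^{\wedge} R(t),
\qquad
R = \exp(\phi^{\wedge})

\mathfrak{so}(3) = \left\{ \phi \in \mathbb{R}^3,\ \phi^{\wedge} \in \mathbb{R}^{3 \times 3} \right\},
\qquad
[\phi_1, \phi_2] = \left( \phi_1^{\wedge} \phi_2^{\wedge} - \phi_2^{\wedge} \phi_1^{\wedge} \right)^{\vee}
```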

Exponential map: in Lie groups and Lie algebras, computing exp(Φ^) is called the exponential map.
At one fixed moment (this is very important: the influence of time t is set aside), taking SO(3) as an example, the following equation holds:
[Formula image: the exponential map of so(3)]
The left side of the equation is the rotation matrix at that moment (there is a t in the derivation). Here we write the Lie algebra element Φ as θa, where θ is its norm and a is a direction vector of length 1.
In particular, the formula above is identical to Rodrigues' formula. This shows that the Lie algebra so(3) is actually the space of so-called rotation vectors, and the exponential map is simply Rodrigues' formula.
Through it, any vector in the Lie algebra so(3) corresponds to a rotation matrix in the Lie group SO(3). Conversely, we can define the logarithmic map.
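
Rodrigues' formula reads exp(θ a^) = cos θ · I + (1 − cos θ) · a aᵀ + sin θ · a^. A minimal pure-Python sketch of it follows; the function name `exp_so3` is made up for this note (in practice the book uses Sophus for this), and the rotation vector is passed as a plain list.

```python
import math


def exp_so3(phi):
    """Map a rotation vector phi = theta * a to a 3x3 rotation matrix."""
    theta = math.sqrt(sum(x * x for x in phi))
    if theta < 1e-12:
        # Practically no rotation: return the identity matrix.
        return [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
    a = [x / theta for x in phi]          # unit rotation axis
    c, s = math.cos(theta), math.sin(theta)
    a_hat = [[0.0, -a[2], a[1]],
             [a[2], 0.0, -a[0]],
             [-a[1], a[0], 0.0]]          # skew-symmetric matrix a^
    return [[c * (1.0 if i == j else 0.0)   # cos(theta) * I
             + (1.0 - c) * a[i] * a[j]      # (1 - cos(theta)) * a a^T
             + s * a_hat[i][j]              # sin(theta) * a^
             for j in range(3)] for i in range(3)]


# Rotation vector: 90 degrees about the z axis.
R = exp_so3([0.0, 0.0, math.pi / 2.0])
for row in R:
    print([round(x, 6) for x in row])
```

The result is the familiar matrix that sends the x axis to the y axis, so the rotation vector and the rotation matrix really describe the same rotation.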

Logarithmic map: the inverse of the exponential map. The formula is as follows:
[Formula image: the logarithmic map]
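
In practice the logarithm is not computed from a series. One common way, sketched below in pure Python, takes the angle from the trace, θ = arccos((tr(R) − 1) / 2), and the axis from the antisymmetric part of R. The function name `log_so3` is hypothetical, and this simple version deliberately ignores the case θ close to π, where sin θ goes to zero and the axis has to be found another way.

```python
import math


def log_so3(R):
    """Map a 3x3 rotation matrix to its rotation vector phi = theta * a.

    Assumes 0 <= theta < pi; angles near pi are not handled here.
    """
    trace = R[0][0] + R[1][1] + R[2][2]
    cos_theta = max(-1.0, min(1.0, (trace - 1.0) / 2.0))  # clamp rounding noise
    theta = math.acos(cos_theta)
    if theta < 1e-12:
        return [0.0, 0.0, 0.0]
    # R - R^T = 2 sin(theta) a^, so the axis can be read off its entries.
    k = theta / (2.0 * math.sin(theta))
    return [k * (R[2][1] - R[1][2]),
            k * (R[0][2] - R[2][0]),
            k * (R[1][0] - R[0][1])]


# A rotation of 0.5 rad about the z axis, written out directly.
c, s = math.cos(0.5), math.sin(0.5)
R = [[c, -s, 0.0],
     [s, c, 0.0],
     [0.0, 0.0, 1.0]]
phi = log_so3(R)
print(phi)
```

The output is the rotation vector (0, 0, 0.5) up to rounding, which recovers both the axis and the angle.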
The conversion between Lie groups and Lie algebras is summarized in the following diagram:
[Figure: conversion between Lie groups and Lie algebras]
Differentiation with Lie algebras: when solving for the optimal estimate, we often construct a function of the pose and then use its derivative with respect to the pose to adjust the current estimate.
Two approaches to differentiating with Lie algebras:

  • Represent the pose with a Lie algebra element, then differentiate with respect to it using Lie algebra addition
  • Multiply the Lie group element on the left or right by a small perturbation, then differentiate with respect to the perturbation. These are called the left and right perturbation models

Differentiating with respect to the Lie algebra directly gives the result below. Because it contains the rather complicated Jl, we do not like this form.
[Formula image: derivative with respect to the Lie algebra]
The derivative under the perturbation model is as follows. It is important to understand this formula.
[Formula image: derivative under the perturbation model]
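
For a rotated point Rp with R = exp(\phi^{\wedge}), the two results are usually written as below. This is a sketch in standard notation: the point p and the perturbation symbol \varphi are assumptions made for this note, and J_l is the left Jacobian mentioned above.

```latex
\frac{\partial (Rp)}{\partial \phi}
= \lim_{\delta\phi \to 0}
  \frac{\exp\!\big((\phi + \delta\phi)^{\wedge}\big)\, p - \exp(\phi^{\wedge})\, p}{\delta\phi}
= -(Rp)^{\wedge} J_l

\frac{\partial (Rp)}{\partial \varphi}
= \lim_{\varphi \to 0}
  \frac{\exp(\varphi^{\wedge}) \exp(\phi^{\wedge})\, p - \exp(\phi^{\wedge})\, p}{\varphi}
= -(Rp)^{\wedge}
```

The second line is the left perturbation model. It differs from the first only by the missing J_l, which is why it is the more convenient form.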

Practice Sophus

Needs to be added

Lecture 5 Camera and Image

Camera model

The camera maps points in the three-dimensional world (in meters) onto the two-dimensional image plane (in pixels). This process can be described by a geometric model.
The most commonly used one is the pinhole model.
Conversions between several coordinate systems are also involved here.

Four coordinate systems:

World coordinate system → camera coordinate system → image coordinate system → pixel coordinate system

  • World coordinate system: the basic reference coordinate system defined by the user, used to describe the position of every object in the world
  • Camera coordinate system: the camera's own three-dimensional coordinate system, with the optical center as the origin
  • Image coordinate system: a two-dimensional coordinate system set up on the camera's imaging plane to describe how points in the camera coordinate system are imaged
  • Pixel coordinate system: the origin is the upper-left corner of the imaging plane and the unit is one pixel. Converting the image on the image coordinate system to pixel coordinates gives the two-dimensional coordinate system of the final image

Pinhole camera model

The schematic diagram of the pinhole camera model is shown in the figure.
[Figure: the pinhole camera model]
World coordinate system to camera coordinate system: one rotation and one translation from the user-defined world frame.
Camera coordinate system to image coordinate system: use the projection relationship; similar triangles give the projection ratio.
Image coordinate system to pixel coordinate system: because the scale and the origin differ, a scaling and a translation are needed.

What is interesting: when we shoot with a monocular camera, depth information is lost. This shows up as the normalization step in the calculation. Physically, a point can move anywhere along its viewing ray without changing its projection, so a small object nearby and a large object far away can look exactly the same.
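
The last two steps of the chain, and the loss of depth, fit in a few lines. Below is a minimal pure-Python sketch; the function name `project` and the intrinsic values fx, fy, cx, cy are made-up examples, not the parameters of any real camera.

```python
def project(point_cam, fx, fy, cx, cy):
    """Project a 3D point in the camera frame to pixel coordinates (u, v)."""
    X, Y, Z = point_cam
    if Z <= 0.0:
        raise ValueError("the point must be in front of the camera (Z > 0)")
    x, y = X / Z, Y / Z              # normalization: depth is divided out here
    return (fx * x + cx, fy * y + cy)  # scale to pixels and shift the origin


# Hypothetical intrinsics, for illustration only.
fx, fy, cx, cy = 500.0, 500.0, 320.0, 240.0

near = project((0.2, -0.1, 1.0), fx, fy, cx, cy)   # 1 m away
far = project((2.0, -1.0, 10.0), fx, fy, cx, cy)   # same ray, 10 m away
print(near, far)
```

Both points land on the same pixel, because the second one is just the first scaled by 10 along the same viewing ray. That is the depth ambiguity of a monocular camera.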

Distortion model

Distortion classification:

  • Radial distortion: caused by the shape of the lens; it comes as barrel distortion or pincushion distortion
  • Tangential distortion: introduced when the lens and the imaging plane are not strictly parallel
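
Both kinds of distortion are usually modeled as polynomials on the normalized image plane. The sketch below is pure Python and uses the common model with two radial coefficients k1, k2 and two tangential coefficients p1, p2; the function name `distort` and the coefficient values are assumptions for illustration, not calibration results.

```python
def distort(x, y, k1, k2, p1, p2):
    """Apply radial and tangential distortion to a normalized point (x, y)."""
    r2 = x * x + y * y                      # squared distance from the center
    radial = 1.0 + k1 * r2 + k2 * r2 * r2   # radial scaling factor
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x * x)
    y_d = y * radial + p1 * (r2 + 2.0 * y * y) + 2.0 * p2 * x * y
    return x_d, y_d


# With all coefficients zero the point is unchanged.
same = distort(0.5, 0.0, 0.0, 0.0, 0.0, 0.0)
# A negative k1 pulls the point toward the center (barrel-like).
barrel = distort(0.5, 0.0, -0.2, 0.0, 0.0, 0.0)
# A positive k1 pushes it outward (pincushion-like).
pincushion = distort(0.5, 0.0, 0.2, 0.0, 0.0, 0.0)
print(same, barrel, pincushion)
```

The distorted normalized point is then converted to pixels with the intrinsics, as in the pinhole model.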

Practice part

Needs to be added

Origin blog.csdn.net/qq_41883714/article/details/110193979