Excerpts from the notes of "Fourteen Lectures on Visual SLAM"

ch02 First introduction to SLAM

Classic visual SLAM framework

(Figure: the classic visual SLAM framework: sensor data → front-end visual odometry → back-end optimization → mapping, with loop closing.)

The visual SLAM process includes the following steps:

  1. Sensor information reading: in visual SLAM this is mainly the reading and preprocessing of camera images. On a robot it may also include reading and synchronizing information from wheel encoders, inertial sensors, and so on.

  2. Visual Odometry (VO): the task of visual odometry is to estimate the camera motion between adjacent images, as well as what the local map looks like. VO is also called the front end (Front End).

    Visual odometry inevitably suffers from the problem of Accumulating Drift.

  3. Backend optimization (Optimization): the back end receives the camera poses estimated by visual odometry at different times, as well as the loop-detection information, and optimizes them to obtain a globally consistent trajectory and map. Because it sits after the VO, it is also called the back end (Back End).

    In visual SLAM, the front-end is more related to computer vision research fields, such as image feature extraction and matching, while the back-end is mainly filtering and nonlinear optimization algorithms.

  4. Loop Closing: loop closing detects whether the robot has returned to a previously visited position. If a loop is detected, the information is passed to the back end for processing.

  5. Mapping: It builds a map corresponding to the task requirements based on the estimated trajectory.

    There are two types of maps: metric maps (which accurately represent the positional relationships of map objects) and topological maps (which place more emphasis on the relationships between map elements).

Mathematical formulation of SLAM problem

"The radish carries sensors and moves in the environment" is described by the following two things:

  1. What is motion? We have to consider how the position $x$ of Little Carrot changes from time $k-1$ to time $k$.

    Equation of motion:

    $$x_k = f(x_{k-1}, u_k, w_k)$$

    • $x_k$, $x_{k-1}$ represent the position of Little Carrot at times $k$ and $k-1$
    • $u_k$ represents the reading of the motion sensor (sometimes also called the input)
    • $w_k$ represents noise
  2. What is an observation? Suppose Little Carrot detects a certain landmark $y_j$ at position $x_k$ at time $k$. We need to consider how this is described in mathematical language.

    Observation equation:

    $$z_{k,j} = h(y_j, x_k, v_{k,j})$$

    • $z_{k,j}$ represents the observation data generated when Little Carrot sees the landmark $y_j$ at the position $x_k$
    • $y_j$ represents the $j$-th landmark point
    • $v_{k,j}$ represents noise

These two equations describe the most basic SLAM problem: knowing the motion measurements $u$ and the sensor readings $z$, how do we solve the localization problem (estimate $x$) and the mapping problem (estimate $y$)? The SLAM problem is thus modeled as a state estimation problem: how to estimate the internal, hidden state variables from noisy measurement data?
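As a concrete illustration (a deliberately simplified linear special case, not the book's general model): suppose Little Carrot moves along a straight line, so its position $x$ is a scalar, $u_k$ is a displacement reading, and the sensor measures the signed distance to landmark $y_j$. The two equations then reduce to

$$x_k = x_{k-1} + u_k + w_k, \qquad z_{k,j} = y_j - x_k + v_{k,j}$$

In general, $f$ and $h$ are arbitrary (usually nonlinear) functions determined by the robot's kinematics and sensor type.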

ch03 Rigid body motion in three-dimensional space

rotation matrix

Points and vectors, coordinate systems

  1. The coordinates of a vector $a$ under the basis $[e_1, e_2, e_3]$ of a linear space are $[a_1, a_2, a_3]^T$.

    $$a = [e_1, e_2, e_3] \left[\begin{array}{c} a_1 \\ a_2 \\ a_3 \end{array}\right] = a_1 e_1 + a_2 e_2 + a_3 e_3$$

  2. Inner and outer products of vectors

    • Inner product of vectors: describes the projection relationship between vectors
      $$a \cdot b = a^T b = \sum_{i=1}^3 a_i b_i = |a|\,|b| \cos\langle a, b\rangle$$

    • Outer product (cross product) of vectors: the result is perpendicular to both vectors, with magnitude $|a|\,|b|\sin\langle a, b\rangle$
      $$a \times b = \left[\begin{array}{ccc} i & j & k \\ a_1 & a_2 & a_3 \\ b_1 & b_2 & b_3 \end{array}\right] = \left[\begin{array}{c} a_2b_3 - a_3b_2 \\ a_3b_1 - a_1b_3 \\ a_1b_2 - a_2b_1 \end{array}\right] = \left[\begin{array}{ccc} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{array}\right] b \triangleq a^\wedge b$$

      Here $a^\wedge$ denotes the skew-symmetric matrix built from $a$:
      $$a^\wedge = \left[\begin{array}{ccc} 0 & -a_3 & a_2 \\ a_3 & 0 & -a_1 \\ -a_2 & a_1 & 0 \end{array}\right]$$
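As a quick check of these definitions, here is a minimal C++ sketch (assuming the Eigen library, which the book uses throughout) computing the inner product, the outer product, and the same outer product via the $^\wedge$ matrix:

```cpp
#include <Eigen/Dense>
#include <iostream>

// a^ : the skew-symmetric matrix of a 3-vector, so that a x b = a^ b.
Eigen::Matrix3d hat(const Eigen::Vector3d& a) {
  Eigen::Matrix3d m;
  m <<      0, -a.z(),  a.y(),
        a.z(),      0, -a.x(),
       -a.y(),  a.x(),      0;
  return m;
}

int main() {
  Eigen::Vector3d a(1, 2, 3), b(4, 5, 6);
  std::cout << a.dot(b) << std::endl;                  // inner product a . b
  std::cout << a.cross(b).transpose() << std::endl;    // outer product a x b
  std::cout << (hat(a) * b).transpose() << std::endl;  // same result via a^ b
}
```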

Euclidean transformation between coordinate systems

  1. Euclidean transformation:

    Between the two coordinate systems related by a Euclidean transformation, the length and direction of the same vector do not change; such a transformation is called a Euclidean transformation.

    A Euclidean transformation consists of a rotation and a translation.

  2. Rotation matrix $R$:

    • Derivation of the rotation matrix $R$:

      Assume that the unit orthonormal basis $[e_1, e_2, e_3]$ becomes $[e_1', e_2', e_3']$ after a rotation. For the same vector $a$, the coordinates in the two coordinate systems are $[a_1, a_2, a_3]^T$ and $[a_1', a_2', a_3']^T$. According to the definition of coordinates:
      $$[e_1, e_2, e_3] \left[\begin{array}{c} a_1 \\ a_2 \\ a_3 \end{array}\right] = [e_1', e_2', e_3'] \left[\begin{array}{c} a_1' \\ a_2' \\ a_3' \end{array}\right]$$

      Left-multiplying both sides by $\left[\begin{array}{c} e_1^T \\ e_2^T \\ e_3^T \end{array}\right]$ gives
      $$\left[\begin{array}{c} a_1 \\ a_2 \\ a_3 \end{array}\right] = \left[\begin{array}{ccc} e_1^Te_1' & e_1^Te_2' & e_1^Te_3' \\ e_2^Te_1' & e_2^Te_2' & e_2^Te_3' \\ e_3^Te_1' & e_3^Te_2' & e_3^Te_3' \end{array}\right] \left[\begin{array}{c} a_1' \\ a_2' \\ a_3' \end{array}\right] \triangleq R a'$$

      The matrix $R$ describes the rotation and is called the rotation matrix.

    • Properties of the rotation matrix $R$

      1. A rotation matrix is an orthogonal matrix with determinant 1; conversely, any orthogonal matrix with determinant 1 is a rotation matrix. All rotation matrices form the special orthogonal group $SO(n)$:

      $$SO(n) = \{ R \in \mathbb{R}^{n \times n} \mid RR^T = I, \det(R) = 1 \}$$

      2. Since a rotation matrix is orthogonal (its transpose equals its inverse), the inverse of a rotation matrix, $R^{-1}$ (i.e., the transpose $R^T$), describes the opposite rotation.
  3. Vector representation of Euclidean transformation:

    The vector $a$ in the world coordinate system, after one rotation (described by the rotation matrix $R$) and one translation (described by the translation vector $t$), becomes $a'$:
    $$a' = Ra + t$$

Transformation matrix and homogeneous coordinates

  1. Transformation matrix $T$:

    Append a 1 to the end of the three-dimensional vector to form a four-dimensional vector, called homogeneous coordinates. Writing the rotation and translation into a single transformation matrix $T$ gives:

    $$\left[\begin{array}{c} a' \\ 1 \end{array}\right] = \left[\begin{array}{cc} R & t \\ 0^T & 1 \end{array}\right] \left[\begin{array}{c} a \\ 1 \end{array}\right] \triangleq T \left[\begin{array}{c} a \\ 1 \end{array}\right]$$
    The significance of homogeneous coordinates is that they express a Euclidean transformation as a single linear relationship.

  2. Properties of the transformation matrix $T$:

    1. The transformation matrices form the special Euclidean group $SE(3)$
      $$SE(3) = \left\{ T = \left[\begin{array}{cc} R & t \\ 0^T & 1 \end{array}\right] \in \mathbb{R}^{4\times4} \mid R \in SO(3), t \in \mathbb{R}^3 \right\}$$
    2. The inverse of a transformation matrix describes the reverse Euclidean transformation
      $$T^{-1} = \left[\begin{array}{cc} R^T & -R^T t \\ 0^T & 1 \end{array}\right]$$
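The following minimal Eigen sketch (the rotation and translation values are arbitrary examples) assembles $T$ from $R$ and $t$ and verifies the closed-form inverse above:

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <iostream>

int main() {
  Eigen::Matrix3d R =
      Eigen::AngleAxisd(M_PI / 2, Eigen::Vector3d(0, 0, 1)).toRotationMatrix();
  Eigen::Vector3d t(1, 2, 3);

  // Assemble T = [R t; 0^T 1].
  Eigen::Matrix4d T = Eigen::Matrix4d::Identity();
  T.block<3, 3>(0, 0) = R;
  T.block<3, 1>(0, 3) = t;

  // Closed-form inverse T^{-1} = [R^T, -R^T t; 0^T, 1].
  Eigen::Matrix4d T_inv = Eigen::Matrix4d::Identity();
  T_inv.block<3, 3>(0, 0) = R.transpose();
  T_inv.block<3, 1>(0, 3) = -R.transpose() * t;

  std::cout << T * T_inv << std::endl;  // should print the identity

  Eigen::Vector4d a_h(1, 0, 0, 1);      // homogeneous coordinates of a point
  std::cout << (T * a_h).transpose() << std::endl;  // Ra + t, with trailing 1
}
```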

Advantages of Homogeneous Coordinates

Advantage 1: Convenient for judging whether a point is on a line or a plane

If a point $p = (x, y)$ is on the line $l = (a, b, c)$, then:
$$ax + by + c = [a, b, c]\,[x, y, 1]^T = l^T p' = 0$$

where $p' = [x, y, 1]^T$ is the homogeneous coordinate of $p$.

If a point $p = (x, y, z)$ is on the plane $A = (a, b, c, d)$, then:
$$ax + by + cz + d = [a, b, c, d]\,[x, y, z, 1]^T = A^T p' = 0$$

Advantage 2: Convenient for expressing the intersection of two lines and the line through two points

Under homogeneous coordinates:

  1. The cross product of the homogeneous coordinates of two points $p$, $q$ gives the line $l$ through both of them.
  2. The intersection point $x$ of two lines $l$, $m$ is given by the cross product of their homogeneous coordinates.


The proofs use the property of the cross product that its result is perpendicular to both operand vectors:

  • Proof of Property 1 (let $l = p \times q$):
    $$l^T p = (p \times q) \cdot p = 0, \qquad l^T q = (p \times q) \cdot q = 0$$

    so both points lie on the line $l$.
  • Proof of Property 2 (let $x = l \times m$):
    $$l^T x = l^T (l \times m) = 0, \qquad m^T x = m^T (l \times m) = 0$$

    so $x$ lies on both lines. A small numerical check follows below.
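A minimal Eigen sketch of both properties (the points and lines below are made-up examples):

```cpp
#include <Eigen/Dense>
#include <iostream>

int main() {
  // Homogeneous coordinates of two 2D points.
  Eigen::Vector3d p(0, 0, 1), q(1, 1, 1);
  Eigen::Vector3d l = p.cross(q);    // Property 1: the line through p and q

  Eigen::Vector3d m(1, 0, -1);       // the line x = 1
  Eigen::Vector3d x = l.cross(m);    // Property 2: intersection of l and m
  std::cout << (x / x.z()).transpose() << std::endl;  // dehomogenized: (1, 1, 1)
}
```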
Advantage 3: Able to distinguish vectors and points
  • The homogeneous coordinates of a point $(x, y, z)$ are $(x, y, z, 1)$
  • The homogeneous coordinates of a vector $(x, y, z)$ are $(x, y, z, 0)$
Advantage 4: Ability to express points at infinity

For parallel lines $l = (a, b, c)$ and $m = (a, b, d)$, their intersection has homogeneous coordinates $x = l \times m = (kb, -ka, 0)$ (with $k = d - c$). Converting to non-homogeneous coordinates means dividing by the third component, which is 0: $x = (kb/0, -ka/0)$, so the intersection is a point at infinity, where the two parallel lines "meet".

Advantage 5: Ability to express transformations concisely

Using homogeneous coordinates, addition operations can be converted into multiplication operations.

| Transform | Mathematical form | MATLAB function |
| --- | --- | --- |
| Translation | $\left[\begin{array}{c} x' \\ y' \\ 1 \end{array}\right] = \left[\begin{array}{ccc} 1 & 0 & t_x \\ 0 & 1 & t_y \\ 0 & 0 & 1 \end{array}\right] \left[\begin{array}{c} x \\ y \\ 1 \end{array}\right]$ | `imtranslate()` |
| Scale | $\left[\begin{array}{c} x' \\ y' \\ 1 \end{array}\right] = \left[\begin{array}{ccc} s_x & 0 & 0 \\ 0 & s_y & 0 \\ 0 & 0 & 1 \end{array}\right] \left[\begin{array}{c} x \\ y \\ 1 \end{array}\right]$ | `imresize()` |
| Shear | $\left[\begin{array}{c} x' \\ y' \\ 1 \end{array}\right] = \left[\begin{array}{ccc} 1 & h_x & 0 \\ h_y & 1 & 0 \\ 0 & 0 & 1 \end{array}\right] \left[\begin{array}{c} x \\ y \\ 1 \end{array}\right]$ | |
| Rotate | $\left[\begin{array}{c} x' \\ y' \\ 1 \end{array}\right] = \left[\begin{array}{ccc} \cos\theta & \sin\theta & 0 \\ -\sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{array}\right] \left[\begin{array}{c} x \\ y \\ 1 \end{array}\right]$ | `imrotate()` |

Rotation vectors and Euler angles

rotation vector

  • Disadvantages of rotation matrices:

    1. A rotation matrix has 9 quantities, but a rotation has only 3 degrees of freedom, so the representation is redundant.
    2. A rotation matrix must satisfy its own constraints (it must be orthogonal with determinant 1), and these constraints make estimation and optimization more difficult.
  • Rotation vector: any rotation can be characterized by a rotation axis and a rotation angle, so we can use a vector whose direction is the rotation axis and whose length is the rotation angle. This vector is called the rotation vector (or axis-angle, Axis-Angle).

    Suppose a rotation is about the axis $n$ by the angle $\theta$; its corresponding rotation vector is then $\theta n$.

  • Conversion between rotation vector and rotation matrix:

    Consider a rotation about the unit axis $n$ by the angle $\theta$, and let $R$ be its rotation matrix.

    • From rotation vector to rotation matrix (the Rodrigues formula):
      $$R = \cos\theta \, I + (1-\cos\theta)\, n n^T + \sin\theta \, n^\wedge$$

    • From rotation matrix to rotation vector:

      • The angle: $\theta = \arccos\left(\frac{\mathrm{tr}(R)-1}{2}\right)$
      • The axis: $n$ is the eigenvector of $R$ corresponding to the eigenvalue 1 (rotation about the axis leaves it unchanged, $Rn = n$)
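In Eigen these two conversions are provided by the `AngleAxis` type; a minimal sketch (the axis and angle are arbitrary example values):

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <iostream>

int main() {
  // Rotation vector: angle pi/4 about the axis (0, 0, 1).
  Eigen::AngleAxisd rot_vec(M_PI / 4, Eigen::Vector3d(0, 0, 1));
  Eigen::Matrix3d R = rot_vec.toRotationMatrix();  // Rodrigues formula

  // And back: recover the axis-angle from the rotation matrix.
  Eigen::AngleAxisd recovered(R);
  std::cout << "angle = " << recovered.angle()
            << ", axis = " << recovered.axis().transpose() << std::endl;
}
```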

Euler angles

  • Euler angles decompose a rotation into 3 separate angles. The commonly used ZYX convention decomposes an arbitrary rotation into rotations about the following 3 axes:

    1. Rotate about the object's $Z$ axis to obtain the yaw angle
    2. Rotate about the rotated $Y$ axis to obtain the pitch angle
    3. Rotate about the rotated $X$ axis to obtain the roll angle
  • A major shortcoming of Euler angles is the gimbal-lock problem (singularity problem): when the pitch angle is ±90°, the first rotation and the third rotation use the same axis, and the system loses one degree of freedom (from 3 rotations to 2 rotations).

Quaternion

Why quaternions are needed: for three-dimensional rotation there is no singularity-free description using only a three-dimensional vector, so quaternions are introduced. Quaternions are an extension of the complex numbers that is both compact and free of singularities.

Definition of quaternions

  1. Definition of quaternions

    A quaternion $q$ has one real part and three imaginary parts:
    $$q = q_0 + q_1 i + q_2 j + q_3 k$$

    where $i$, $j$, $k$ are the three imaginary units of the quaternion. They satisfy the following relations (each unit with itself behaves like the complex unit; between different units they behave like a cross product):
    $$\left\{ \begin{aligned} & i^2 = j^2 = k^2 = -1 \\ & ij = k, \; ji = -k \\ & jk = i, \; kj = -i \\ & ki = j, \; ik = -j \end{aligned} \right.$$

    A quaternion can also be expressed as a scalar plus a vector:
    $$q = [s, v], \quad s = q_0 \in \mathbb{R}, \quad v = [q_1, q_2, q_3]^T \in \mathbb{R}^3$$

    $s$ is the real part and $v$ is the imaginary part of the quaternion; accordingly one speaks of real quaternions and imaginary (pure) quaternions.

  2. The relationship between quaternion and rotation angle:

    • In the two-dimensional case, any rotation can be described by a unit complex number; multiplying by $i$ corresponds to a rotation of 90°.
    • In the three-dimensional case, any rotation can be described by a unit quaternion; multiplying by $i$ corresponds to a rotation of 180° about the $i$ axis.
  3. Conversion between unit quaternion and rotation vector:

    Let the unit quaternion $q$ represent a rotation about the unit axis $n = [n_x, n_y, n_z]^T$ by the angle $\theta$.

    • From rotation vector to unit quaternion:

    $$q = \left[\cos\frac{\theta}{2}, \; n\sin\frac{\theta}{2}\right]^T = \left[\cos\frac{\theta}{2}, \; n_x\sin\frac{\theta}{2}, \; n_y\sin\frac{\theta}{2}, \; n_z\sin\frac{\theta}{2}\right]^T$$

    • From unit quaternion to rotation vector:
      $$\left\{ \begin{aligned} & \theta = 2\arccos q_0 \\ & [n_x, n_y, n_z]^T = [q_1, q_2, q_3]^T / \sin\frac{\theta}{2} \end{aligned} \right.$$

Represent rotation using unit quaternions

Given a three-dimensional point $p = [x, y, z]^T \in \mathbb{R}^3$ and a rotation specified by the axis-angle $n$, $\theta$, the point $p$ becomes $p'$ after the rotation. How is this rotation expressed using a unit quaternion $q$?

  1. Represent the three-dimensional point as an imaginary (pure) quaternion $p$:
    $$p = [0, x, y, z] = [0, v]$$
  2. Represent the rotation as a unit quaternion $q$:
    $$q = \left[\cos\frac{\theta}{2}, \; n\sin\frac{\theta}{2}\right]$$
  3. The rotated point $p'$ can then be expressed as:
    $$p' = q p q^{-1}$$

The point $p'$ obtained in this way is still a pure imaginary quaternion, and the three components of its imaginary part are the coordinates of the rotated 3D point.

Only unit quaternions can represent rotations, so after creating a quaternion in a program, remember to call `normalize()` to make it unit length.
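A minimal Eigen sketch of this procedure (the axis and angle are arbitrary example values); Eigen's `q * p` overload for a 3-vector computes exactly $qpq^{-1}$:

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <iostream>

int main() {
  // A rotation of 90 degrees about the z axis, built from an axis-angle.
  Eigen::AngleAxisd aa(M_PI / 2, Eigen::Vector3d(0, 0, 1));
  Eigen::Quaterniond q(aa);
  q.normalize();  // only unit quaternions represent rotations

  Eigen::Vector3d p(1, 0, 0);
  Eigen::Vector3d p_rot = q * p;  // q p q^{-1} on the pure quaternion [0, p]
  std::cout << p_rot.transpose() << std::endl;  // approximately (0, 1, 0)
}
```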

ch04 Lie groups and Lie algebras

Basics of Lie groups and Lie algebra

Rotation matrices form the special orthogonal group $SO(3)$, and transformation matrices form the special Euclidean group $SE(3)$:
$$\begin{aligned} SO(3) &= \left\{ R \in \mathbb{R}^{3\times 3} \mid RR^T = I, \det(R) = 1 \right\} \\ SE(3) &= \left\{ T = \left[\begin{array}{cc} R & t \\ 0^T & 1 \end{array}\right] \in \mathbb{R}^{4\times 4} \mid R \in SO(3), t \in \mathbb{R}^3 \right\} \end{aligned}$$

group definition

  • A group is an algebraic structure consisting of a set plus an operation. Denote the set by $A$ and the operation by $\cdot$; the group can then be written $G = (A, \cdot)$. The operation is required to satisfy the following conditions (mnemonic: closure, associativity, identity, inverse):

    1. Closure: $\forall a_1, a_2 \in A, \quad a_1 \cdot a_2 \in A$
    2. Associativity: $\forall a_1, a_2, a_3 \in A, \quad (a_1 \cdot a_2) \cdot a_3 = a_1 \cdot (a_2 \cdot a_3)$
    3. Identity: $\exists a_0 \in A, \quad \mathrm{s.t.} \quad \forall a \in A, \quad a_0 \cdot a = a \cdot a_0 = a$
    4. Inverse: $\forall a \in A, \quad \exists a^{-1} \in A, \quad \mathrm{s.t.} \quad a \cdot a^{-1} = a_0$
  • A Lie group is a group with a continuous (smooth) structure. $SO(3)$ and $SE(3)$ are both Lie groups.

Definition of Lie algebra

Each Lie group has a corresponding Lie algebra, which describes the local properties of the Lie group.

The definition of a general Lie algebra is as follows:
A Lie algebra consists of a set $V$, a number field $F$ and a binary operation $[\cdot, \cdot]$. If the following properties are satisfied, $(V, F, [\cdot, \cdot])$ is called a Lie algebra, denoted $\mathfrak{g}$.

  1. Closure: $\forall X, Y \in V, \quad [X, Y] \in V$
  2. Bilinearity: $\forall X, Y, Z \in V, \; a, b \in F$:
    $$[aX + bY, Z] = a[X, Z] + b[Y, Z], \quad [Z, aX + bY] = a[Z, X] + b[Z, Y]$$
  3. Reflexivity: $\forall X \in V, \quad [X, X] = 0$
  4. Jacobi identity: $\forall X, Y, Z \in V, \quad [X, [Y, Z]] + [Z, [X, Y]] + [Y, [Z, X]] = 0$

The binary operation $[\cdot, \cdot]$ is called the Lie bracket. For example, the cross product $\times$ defined on the three-dimensional vector space $\mathbb{R}^3$ is a kind of Lie bracket.

Lie algebra $\mathfrak{so}(3)$

  • The Lie algebra $\mathfrak{so}(3)$ corresponding to the Lie group $SO(3)$ is a set of vectors $\phi$ defined on $\mathbb{R}^3$:

    $$\mathfrak{so}(3) = \left\{ \phi \in \mathbb{R}^3, \; \Phi = \phi^\wedge = \left[\begin{array}{ccc} 0 & -\phi_3 & \phi_2 \\ \phi_3 & 0 & -\phi_1 \\ -\phi_2 & \phi_1 & 0 \end{array}\right] \in \mathbb{R}^{3 \times 3} \right\}$$

  • The Lie bracket of $\mathfrak{so}(3)$ is
    $$[\phi_1, \phi_2] = (\Phi_1 \Phi_2 - \Phi_2 \Phi_1)^\vee$$
    where $^\vee$ is the inverse operation of $^\wedge$, mapping a skew-symmetric matrix back to a vector.

  • The mapping between $\mathfrak{so}(3)$ and $SO(3)$ is

    $$R = \exp(\phi^\wedge) = \exp(\Phi), \qquad \phi = \ln(R)^\vee$$

Lie algebra $\mathfrak{se}(3)$

  • Similarly, the Lie algebra $\mathfrak{se}(3)$ of the Lie group $SE(3)$ is a set of vectors $\xi$ defined on $\mathbb{R}^6$:

    $$\mathfrak{se}(3) = \left\{ \xi = \left[\begin{array}{c} \rho \\ \phi \end{array}\right] \in \mathbb{R}^6, \; \rho \in \mathbb{R}^3, \; \phi \in \mathfrak{so}(3), \; \xi^\wedge = \left[\begin{array}{cc} \phi^\wedge & \rho \\ 0^T & 0 \end{array}\right] \in \mathbb{R}^{4 \times 4} \right\}$$
    Each element $\xi$ of $\mathfrak{se}(3)$ is a six-dimensional vector: the first three dimensions $\rho$ represent translation, and the last three dimensions $\phi$ represent rotation and are essentially $\mathfrak{so}(3)$ elements.

  • The $^\wedge$ symbol is used here as well, expanding the six-dimensional vector into a four-dimensional matrix, which however is no longer skew-symmetric:

    $$\xi^\wedge = \left[\begin{array}{cc} \phi^\wedge & \rho \\ 0^T & 0 \end{array}\right] \in \mathbb{R}^{4 \times 4}$$

  • The Lie bracket of $\mathfrak{se}(3)$ is analogous to that of $\mathfrak{so}(3)$:
    $$[\xi_1, \xi_2] = (\xi_1^\wedge \xi_2^\wedge - \xi_2^\wedge \xi_1^\wedge)^\vee$$

  • The mapping between $\mathfrak{se}(3)$ and $SE(3)$ is
    $$T = \exp(\xi^\wedge), \qquad \xi = \ln(T)^\vee$$

Conversion relationship between Lie groups and Lie algebras: exponential mapping and logarithmic mapping

Conversion between $SO(3)$ and $\mathfrak{so}(3)$

  • Decompose the three-dimensional vector $\phi$ into its magnitude $\theta$ and unit direction $\alpha$, i.e. $\phi = \theta\alpha$. Then the exponential map from $\mathfrak{so}(3)$ to $SO(3)$ can be expressed as:

    $$R = \exp(\phi^\wedge) = \exp(\theta \alpha^\wedge) = \cos\theta \, I + (1 - \cos\theta)\,\alpha\alpha^T + \sin\theta\,\alpha^\wedge$$

    This is exactly the Rodrigues formula from rotation vector to rotation matrix. It follows that **$\mathfrak{so}(3)$ is essentially the space of rotation vectors**.

  • The logarithmic map from $SO(3)$ to $\mathfrak{so}(3)$ can be expressed as:
    $$\phi = \ln(R)^\vee$$

    In actual computation, the rotation angle $\theta$ and the rotation axis $\alpha$ are obtained directly from
    $$\theta = \arccos\frac{\mathrm{tr}(R) - 1}{2}, \qquad R\alpha = \alpha$$

Conversion between $SE(3)$ and $\mathfrak{se}(3)$

  • The exponential map from $\mathfrak{se}(3)$ to $SE(3)$ can be expressed as:

    $$T = \exp(\xi^\wedge) = \left[\begin{array}{cc} R & J\rho \\ 0^T & 1 \end{array}\right]$$

    where
    $$J = \frac{\sin\theta}{\theta} I + \left(1 - \frac{\sin\theta}{\theta}\right)\alpha\alpha^T + \frac{1 - \cos\theta}{\theta}\alpha^\wedge$$

    That is, after the exponential map the translation part undergoes a linear transformation with $J$ as the coefficient matrix.

  • The logarithmic map from $SE(3)$ to $\mathfrak{se}(3)$ can be expressed as:
    $$\xi = \ln(T)^\vee$$

    In actual computation, $\phi$ can be obtained from the $SO(3) \to \mathfrak{so}(3)$ mapping, and $\rho$ can be solved from $t = J\rho$.

(Figure: summary of the exponential and logarithmic maps between $SO(3)$, $\mathfrak{so}(3)$, $SE(3)$ and $\mathfrak{se}(3)$.)
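The Sophus library used in the book's example code exposes these maps directly; a minimal sketch (assuming Sophus's `SO3d`/`SE3d`, whose $\mathfrak{se}(3)$ tangent vector follows the same $[\rho; \phi]$ ordering as the text):

```cpp
#include <sophus/se3.hpp>
#include <cmath>
#include <iostream>

int main() {
  // so(3) -> SO(3): exponential map (Rodrigues formula), and back via log.
  Eigen::Vector3d phi(0, 0, M_PI / 2);  // rotate 90 degrees about z
  Sophus::SO3d R = Sophus::SO3d::exp(phi);
  std::cout << "R =\n" << R.matrix() << std::endl;
  std::cout << "log(R) = " << R.log().transpose() << std::endl;  // recovers phi

  // se(3) -> SE(3): rho in the first three entries, phi in the last three.
  Eigen::Matrix<double, 6, 1> xi;
  xi << 1, 0, 0, 0, 0, M_PI / 2;
  Sophus::SE3d T = Sophus::SE3d::exp(xi);
  std::cout << "T =\n" << T.matrix() << std::endl;    // translation is J * rho
  std::cout << "log(T) = " << T.log().transpose() << std::endl;
}
```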

Differentiation with Lie algebras: a major motivation for introducing Lie algebras is to make derivative-based optimization convenient

The relationship between Lie group multiplication and Lie algebra addition

  1. BCH formula and its approximate form

    • Unfortunately, the Lie group product does not correspond to the Lie algebra sum, that is:
      $$R_1 R_2 = \exp(\phi_1^\wedge)\exp(\phi_2^\wedge) \ne \exp\left((\phi_1 + \phi_2)^\wedge\right)$$

      The correspondence between Lie group products and Lie algebra operations is given by the BCH (Baker-Campbell-Hausdorff) formula:

      $$\ln(\exp(A)\exp(B)) = A + B + \frac{1}{2}[A, B] + \frac{1}{12}[A, [A, B]] - \frac{1}{12}[B, [A, B]] + \cdots$$

      where $[\cdot, \cdot]$ denotes the Lie bracket.

    • When $\phi_1$ or $\phi_2$ is a small quantity, the BCH formula can be linearly approximated, which gives the Lie algebra corresponding to a Lie group product:
      $$\ln\left(\exp(\phi_1^\wedge)\exp(\phi_2^\wedge)\right)^\vee \approx \left\{ \begin{aligned} & J_l(\phi_2)^{-1}\phi_1 + \phi_2 \quad && \text{when } \phi_1 \text{ is small} \\ & J_r(\phi_1)^{-1}\phi_2 + \phi_1 \quad && \text{when } \phi_2 \text{ is small} \end{aligned} \right.$$

      Here the left Jacobian $J_l$ is exactly the matrix $J$ that appeared in the exponential map of $\mathfrak{se}(3)$:
      $$J_l = \frac{\sin\theta}{\theta} I + \left(1 - \frac{\sin\theta}{\theta}\right)\alpha\alpha^T + \frac{1 - \cos\theta}{\theta}\alpha^\wedge$$

      and its inverse is
      $$J_l^{-1} = \frac{\theta}{2}\cot\frac{\theta}{2} I + \left(1 - \frac{\theta}{2}\cot\frac{\theta}{2}\right)\alpha\alpha^T - \frac{\theta}{2}\alpha^\wedge$$

      The right Jacobian is obtained by simply negating the argument:
      $$J_r(\phi) = J_l(-\phi)$$

  2. The relationship between multiplication in the Lie group $SO(3)$ and addition in the Lie algebra $\mathfrak{so}(3)$:

    • Left-multiplying the rotation $R$ (with Lie algebra $\phi$) by a small rotation $\Delta R$ (with Lie algebra $\Delta\phi$) gives the Lie group element $\Delta R \cdot R$, whose corresponding Lie algebra is:
      $$\ln\left(\exp(\Delta\phi^\wedge)\exp(\phi^\wedge)\right)^\vee = \phi + J_l^{-1}(\phi)\,\Delta\phi$$
    • Conversely, the Lie group element corresponding to the Lie algebra sum $(\phi + \Delta\phi)$ can be expressed as:
      $$\exp\left((\phi + \Delta\phi)^\wedge\right) = \exp\left((J_l \Delta\phi)^\wedge\right)\exp(\phi^\wedge) = \exp(\phi^\wedge)\exp\left((J_r \Delta\phi)^\wedge\right)$$
  3. Similarly, for the relationship between multiplication in the Lie group $SE(3)$ and addition in the Lie algebra $\mathfrak{se}(3)$:
    $$\exp(\Delta\xi^\wedge)\exp(\xi^\wedge) \approx \exp\left((J_l^{-1}\Delta\xi + \xi)^\wedge\right), \qquad \exp(\xi^\wedge)\exp(\Delta\xi^\wedge) \approx \exp\left((J_r^{-1}\Delta\xi + \xi)^\wedge\right)$$

Lie algebra differentiation on $SO(3)$

Rotating a space point $p$ gives $Rp$. The derivative of the rotated point's coordinates with respect to the rotation is written:
$$\frac{\partial(Rp)}{\partial R}$$

There are two ways to derive the derivative of the above formula:

  1. Use the Lie algebra $\phi$ to represent the pose $R$, and differentiate with respect to $\phi$, following Lie algebra addition (the derivative model).
  2. Represent a small left-multiplied perturbation $\Delta R$ by its Lie algebra $\varphi$, and differentiate with respect to $\varphi$ (the perturbation model).

Of the two, the perturbation model has a simpler expression and is more practical.

Lie algebra derivation

Using the Lie algebra $\phi$ to represent the pose $R$ and differentiating with respect to $\phi$ gives
$$\frac{\partial(Rp)}{\partial \phi} = \frac{\partial\left(\exp(\phi^\wedge)p\right)}{\partial \phi} = -(Rp)^\wedge J_l$$

Perturbation model (left multiplication)

The other approach is to apply a left-multiplied perturbation $\Delta R$ to $R$. Let the Lie algebra of $\Delta R$ be $\varphi$; differentiating with respect to $\varphi$ gives
$$\frac{\partial(Rp)}{\partial \varphi} = \lim_{\varphi \to 0}\frac{\exp(\varphi^\wedge)\exp(\phi^\wedge)p - \exp(\phi^\wedge)p}{\varphi} = -(Rp)^\wedge$$
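The perturbation-model result can be checked numerically; a minimal Eigen sketch (the rotation and the point are arbitrary example values) comparing $-(Rp)^\wedge$ with finite differences of a left-multiplied perturbation:

```cpp
#include <Eigen/Dense>
#include <iostream>

// a^ : the skew-symmetric matrix of a 3-vector.
Eigen::Matrix3d hat(const Eigen::Vector3d& a) {
  Eigen::Matrix3d m;
  m <<      0, -a.z(),  a.y(),
        a.z(),      0, -a.x(),
       -a.y(),  a.x(),      0;
  return m;
}

int main() {
  Eigen::Matrix3d R =
      Eigen::AngleAxisd(0.7, Eigen::Vector3d(0, 0, 1)).toRotationMatrix();
  Eigen::Vector3d p(1.0, 2.0, 3.0);
  Eigen::Matrix3d D = -hat(R * p);  // analytic derivative: -(Rp)^

  const double eps = 1e-6;
  for (int i = 0; i < 3; ++i) {
    // Left-multiply R by exp((eps * e_i)^) and take a finite difference.
    Eigen::Matrix3d R_pert =
        Eigen::AngleAxisd(eps, Eigen::Vector3d::Unit(i)).toRotationMatrix() * R;
    Eigen::Vector3d col = (R_pert * p - R * p) / eps;  // i-th column, numeric
    std::cout << "column " << i << " error: " << (col - D.col(i)).norm() << std::endl;
  }
}
```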

Lie algebra differentiation on $SE(3)$

Similarly, transforming a space point $p$ by $T$ gives $Tp$. Left-multiplying $T$ by a perturbation $\Delta T = \exp(\delta\xi^\wedge)$ and differentiating with respect to $\delta\xi$, we have
$$\frac{\partial(Tp)}{\partial \delta\xi} = \left[\begin{array}{cc} I & -(Rp + t)^\wedge \\ 0^T & 0^T \end{array}\right] \triangleq (Tp)^\odot$$

ch05 Camera and Image

pinhole camera model

(Figure: the pinhole camera model. A space point $P$ is projected through the optical center $O$ onto the imaging plane as $P'$.)

$O\text{-}x\text{-}y\text{-}z$ is the camera coordinate system. A real space point $P$ has camera coordinates $[X, Y, Z]^T$ and is projected to the point $P'$ on the plane $O'\text{-}x'\text{-}y'$, with coordinates $[X', Y', Z']^T$.

  • Flipping the imaging plane symmetrically to the front of the camera, the similar-triangle relation $\frac{Z}{f} = \frac{X}{X'} = \frac{Y}{Y'}$ gives the coordinates of the projection point $P'$ on the imaging plane:

    $$\left\{ \begin{aligned} X' &= f\frac{X}{Z} \\ Y' &= f\frac{Y}{Z} \end{aligned} \right.$$

  • Converting to the pixel plane gives the pixel coordinates $P_{u,v} = [u, v]^T$ of the projection point $P'$:
    $$\left\{ \begin{aligned} u &= \alpha X' + c_x = f_x\frac{X}{Z} + c_x \\ v &= \beta Y' + c_y = f_y\frac{Y}{Z} + c_y \end{aligned} \right.$$

    In the formula above, $u$, $v$, $c_x$, $c_y$, $f_x$, $f_y$ are measured in pixels, and $\alpha$, $\beta$ in pixels/meter.

  • Writing the above in matrix form gives **the relationship between the camera coordinates $P$ of a real space point and the pixel coordinates $P_{uv}$ of its projection**:
    $$Z P_{uv} = Z\left[\begin{array}{c} u \\ v \\ 1 \end{array}\right] = \left[\begin{array}{ccc} f_x & 0 & c_x \\ 0 & f_y & c_y \\ 0 & 0 & 1 \end{array}\right]\left[\begin{array}{c} X \\ Y \\ Z \end{array}\right] \triangleq KP$$

    The matrix KKK is called the intrinsic parameter matrix of the camera.

  • $P$ above is the coordinate of the real space point in the camera coordinate system. Expressing it instead by the world coordinates $P_W$, we have

    $$Z P_{uv} = K(R P_W + t) = K T P_W$$

    Accordingly, $R$, $t$ (or $T$) are also called the camera extrinsics.

  • Normalizing the last dimension yields the normalized coordinates of the point $P$ on the normalized plane, $P_c = [X/Z, Y/Z, 1]^T$:

    $$P_c = \frac{P}{Z} = K^{-1} P_{uv}$$


The camera parameters consist of the intrinsics $K$ and the extrinsics $R$, $t$, where:

  1. The intrinsic matrix $K$ reflects the transformation from normalized camera coordinates to pixel coordinates.

    That the input is normalized coordinates reflects the nature of projection: all space points along the same ray are projected onto the same pixel.

  2. The extrinsics $R$, $t$ (or $T$) reflect the transformation from world coordinates to camera coordinates.

Distortion model

There are two types of distortion: radial distortion and tangential distortion.

  • Radial distortion: caused by the shape of the lens, mainly including barrel distortion and pincushion distortion.

    The coordinates change along the radial direction, i.e. their distance from the origin changes:
    $$x_{distorted} = x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6), \qquad y_{distorted} = y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6)$$

  • Tangential distortion: caused by the lens and imaging plane not being strictly parallel.

    The coordinates change along the tangential direction, i.e. the horizontal angle changes:
    $$x_{distorted} = x + 2p_1 xy + p_2(r^2 + 2x^2), \qquad y_{distorted} = y + p_1(r^2 + 2y^2) + 2p_2 xy$$

Imaging process of monocular camera

Imaging process of monocular camera:

  1. There is a fixed point $P$ in the world coordinate system, with world coordinates $P_W$.
  2. Since the camera is moving, its motion is described by $R$, $t$ or the transformation matrix $T \in SE(3)$. The camera coordinates of $P$ are $\tilde{P_c} = R P_W + t$.
  3. The components of $\tilde{P_c}$ are $X$, $Y$, $Z$. Projecting onto the normalized plane $Z = 1$ gives the normalized camera coordinates of $P$: $P_c = \tilde{P_c}/Z = [X/Z, Y/Z, 1]^T$.
  4. When there is distortion, the distorted normalized coordinates are computed from $P_c$ according to the distortion parameters.
  5. Applying the intrinsics $K$, the normalized camera coordinates $P_c$ correspond to the pixel coordinates $P_{uv} = K P_c$.

In discussing the camera imaging model we used four kinds of coordinates: world, camera, normalized camera, and pixel coordinates. The reader should clarify their relationships, which reflect the entire imaging process; a sketch of the whole pipeline follows.
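The following minimal C++ sketch puts steps 1-5 together (all intrinsic, distortion, and pose values are made-up placeholders for illustration):

```cpp
#include <Eigen/Dense>
#include <iostream>

// Project a world point to pixel coordinates, following steps 1-5 above.
Eigen::Vector2d project(const Eigen::Vector3d& P_w,
                        const Eigen::Matrix3d& R, const Eigen::Vector3d& t) {
  const double fx = 500.0, fy = 500.0, cx = 320.0, cy = 240.0;  // intrinsics K
  const double k1 = -0.1, k2 = 0.01, p1 = 1e-4, p2 = -1e-4;     // distortion
  Eigen::Vector3d P_c = R * P_w + t;       // step 2: world -> camera
  double x = P_c.x() / P_c.z();            // step 3: normalized plane Z = 1
  double y = P_c.y() / P_c.z();
  double r2 = x * x + y * y;               // step 4: apply the distortion model
  double radial = 1.0 + k1 * r2 + k2 * r2 * r2;
  double xd = x * radial + 2 * p1 * x * y + p2 * (r2 + 2 * x * x);
  double yd = y * radial + p1 * (r2 + 2 * y * y) + 2 * p2 * x * y;
  return Eigen::Vector2d(fx * xd + cx, fy * yd + cy);  // step 5: apply K
}

int main() {
  Eigen::Matrix3d R = Eigen::Matrix3d::Identity();  // extrinsics (no motion)
  Eigen::Vector3d t(0, 0, 0);
  Eigen::Vector3d P_w(0.5, -0.3, 2.0);              // step 1: a world point
  std::cout << project(P_w, R, t).transpose() << std::endl;
}
```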

ch06 nonlinear optimization

state estimation problem

Maximum a posteriori and maximum likelihood

The SLAM model consists of a motion equation and an observation equation:
$$\left\{ \begin{aligned} x_k &= f(x_{k-1}, u_k, w_k) \\ z_{k,j} &= h(y_j, x_k, v_{k,j}) \end{aligned} \right.$$

It is usually assumed that the two noise terms $w_k$, $v_{k,j}$ satisfy zero-mean Gaussian distributions:
$$w_k \sim \mathcal{N}(0, R_k), \qquad v_{k,j} \sim \mathcal{N}(0, Q_{k,j})$$

State estimation is essentially finding the conditional probability distribution of the robot's poses $x$ and landmarks $y$ given the input data $u$ and the observation data $z$:
$$P(x, y \mid z, u)$$

By Bayes' rule:
$$P(x, y \mid z, u) = \frac{P(z, u \mid x, y)\,P(x, y)}{P(z, u)} \propto P(z, u \mid x, y)\,P(x, y)$$

Here $P(x, y \mid z, u)$ is the posterior, $P(z, u \mid x, y)$ is the likelihood, and $P(x, y)$ is the prior; in short, posterior $\propto$ likelihood $\cdot$ prior. Directly computing the posterior distribution is difficult, but finding an optimal state estimate that maximizes the posterior is feasible:
$$(x, y)^*_{MAP} = \arg\max P(x, y \mid z, u) = \arg\max P(z, u \mid x, y)\,P(x, y)$$

Maximizing the posterior is therefore equivalent to maximizing the product of the likelihood and the prior. When $x$, $y$ are completely unknown, i.e. there is no prior, one can instead solve the maximum likelihood estimate:
$$(x, y)^*_{MLE} = \arg\max P(z, u \mid x, y)$$

The intuitive meaning of maximum likelihood estimation is: in what state is it most likely to produce the currently observed data?

least squares

Least squares based on observation data zzz

For a single observation
$$z_{k,j} = h(y_j, x_k) + v_{k,j}$$

since the noise is assumed to satisfy $v_{k,j} \sim \mathcal{N}(0, Q_{k,j})$, the likelihood of the observation data $z_{k,j}$ is
$$P(z_{k,j} \mid x_k, y_j) = \mathcal{N}\left(h(y_j, x_k), Q_{k,j}\right)$$

Substituting into the Gaussian density expression and taking the negative logarithm, we get
$$\begin{aligned} (x_k, y_j)^* &= \arg\max \mathcal{N}(h(y_j, x_k), Q_{k,j}) \\ &= \arg\min\left( (z_{k,j} - h(x_k, y_j))^T Q_{k,j}^{-1} (z_{k,j} - h(x_k, y_j)) \right) \end{aligned}$$

The above is equivalent to minimizing a quadratic form of the noise term (i.e. the error), where $Q_{k,j}^{-1}$ is called the information matrix, i.e. the inverse of the Gaussian covariance matrix.

Least squares based on observation data zzz and input data uuu

Because the observations $z$ and the inputs $u$ are independent, the joint likelihood of $z$ and $u$ can be factored:
$$P(z, u \mid x, y) = \prod_k P(u_k \mid x_{k-1}, x_k) \prod_{k,j} P(z_{k,j} \mid x_k, y_j)$$

Define the errors between the input/observation data and the model:
$$\begin{aligned} e_{u,k} &= x_k - f(x_{k-1}, u_k) \\ e_{z,j,k} &= z_{k,j} - h(x_k, y_j) \end{aligned}$$

Define
$$J(x, y) = \sum_k e_{u,k}^T R_k^{-1} e_{u,k} + \sum_k \sum_j e_{z,k,j}^T Q_{k,j}^{-1} e_{z,k,j}$$

Then
$$(x, y)^* = \arg\min J(x, y)$$

nonlinear least squares

For the nonlinear least squares problem:
$$\min_x F(x) = \frac{1}{2}\|f(x)\|_2^2$$

The specific steps to solve this problem are as follows:

  1. Start from some initial value $x_0$.
  2. For the $k$-th iteration, find an increment $\Delta x_k$ such that $\|f(x_k + \Delta x_k)\|_2^2$ reaches a minimum.
  3. If $\Delta x_k$ is small enough, stop.
  4. Otherwise, set $x_{k+1} = x_k + \Delta x_k$ and return to step 2.

In this way, the least squares problem is transformed into the problem of repeatedly finding a descent increment $\Delta x_k$. The concrete methods are as follows.

First-order and second-order gradient methods

Taylor-expand the objective function $F(x)$ around $x_k$:
$$F(x_k + \Delta x_k) \approx F(x_k) + J(x_k)^T \Delta x_k + \frac{1}{2}\Delta x_k^T H(x_k) \Delta x_k$$

where $J(x)$ is the first-derivative (gradient) matrix of $F(x)$ with respect to $x$, and $H(x)$ is the second-derivative (Hessian) matrix of $F(x)$ with respect to $x$.

  • If we keep only the first-order term, the increment is taken along the negative gradient:
    $$\Delta x_k^* = -J(x_k)$$

  • If we keep the second-order term as well, then
    $$\Delta x_k^* = \arg\min\left( F(x_k) + J(x_k)^T \Delta x_k + \frac{1}{2}\Delta x_k^T H(x_k) \Delta x_k \right)$$

    Setting the derivative of the expression above with respect to $\Delta x_k$ equal to 0, $\Delta x_k^*$ is obtained by solving $H \Delta x_k = -J$.

Gauss-Newton method

Taylor-expand $f(x_k)$, rather than $F(x_k)$, around $x_k$:
$$f(x_k + \Delta x_k) \approx f(x_k) + J(x_k)^T \Delta x_k$$

The increment is then found by solving the linearized least squares problem
$$\Delta x_k^* = \arg\min_{\Delta x_k} \frac{1}{2}\left\| f(x_k) + J(x_k)^T \Delta x_k \right\|^2$$

Expanding the squared norm and setting its derivative with respect to $\Delta x_k$ to zero yields the Gauss-Newton normal equations, a linear system in $\Delta x_k$:
$$\underbrace{J(x_k) J(x_k)^T}_{H} \Delta x_k = \underbrace{-J(x_k) f(x_k)}_{g}$$
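A minimal self-contained Gauss-Newton sketch in the spirit of the book's curve-fitting example (the model $y = \exp(ax^2 + bx + c)$ and all numeric values here are illustrative assumptions), solving $H\Delta x = g$ at each iteration:

```cpp
#include <Eigen/Dense>
#include <cmath>
#include <iostream>
#include <random>
#include <vector>

int main() {
  const double ar = 1.0, br = 2.0, cr = 1.0;  // ground-truth parameters
  double a = 2.0, b = -1.0, c = 5.0;          // initial estimate
  std::default_random_engine rng;
  std::normal_distribution<double> noise(0.0, 0.05);

  // Generate noisy samples of y = exp(ar x^2 + br x + cr).
  const int N = 100;
  std::vector<double> xs, ys;
  for (int i = 0; i < N; ++i) {
    double x = i / 100.0;
    xs.push_back(x);
    ys.push_back(std::exp(ar * x * x + br * x + cr) + noise(rng));
  }

  for (int iter = 0; iter < 20; ++iter) {
    Eigen::Matrix3d H = Eigen::Matrix3d::Zero();  // H = sum J J^T
    Eigen::Vector3d g = Eigen::Vector3d::Zero();  // g = -sum J e
    for (int i = 0; i < N; ++i) {
      double x = xs[i];
      double model = std::exp(a * x * x + b * x + c);
      double e = ys[i] - model;                   // residual
      Eigen::Vector3d J(-x * x * model, -x * model, -model);  // de/d(a,b,c)
      H += J * J.transpose();
      g += -J * e;
    }
    Eigen::Vector3d dx = H.ldlt().solve(g);       // solve H dx = g
    a += dx[0]; b += dx[1]; c += dx[2];
    if (dx.norm() < 1e-8) break;
  }
  std::cout << "estimated a, b, c = " << a << " " << b << " " << c << std::endl;
}
```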

Origin blog.csdn.net/2301_76379420/article/details/128988060