[Multi-target tracking study notes] Optical flow, motion reconstruction, and target tracking


Optical flow is a classical method for target tracking. These notes were taken while studying a video lecture on Bilibili.


1. Motion Field and Optical Flow

Optical flow, as the name implies, is the "flow" of light. What does that mean for images? Imagine the camera is moving (the same holds if the target moves): the motion of the target appears in the image as the "movement" of pixel brightness patterns, hence the name optical flow, as shown in the figure below:

[Figure: a camera moving relative to a tree; the brightness pattern shifts across the image]
As shown in the figure above, if the camera moves, the brightness pattern of the tree shifts, and this shift reflects the motion of the target.

Question 1: So, what is the relationship between the apparent change of the pixel brightness pattern and the real motion of the target?

We call the projection (Projection) of the target's real physical motion onto an observation (the image plane) the motion field (Motion Field). For example, in the figure below, the target moves to the upper left, and its projection on the sensor (that is, our observation) has a corresponding direction of movement (inverted through the lens, hence downward in the figure).

[Figure: a point moving up-left in the world and its projected motion on the sensor]
Question 2: Are motion fields and optical flow equivalent?

The answer is no. Please look at the following situations:

  1. Motion field without optical flow, or optical flow without motion field

On the left side of the figure below, the sphere rotates but the light source is fixed: there is a motion field but no optical flow, since the brightness pattern does not change. On the right side, the sphere is static but the light source moves: there is optical flow but no motion field.

[Figure: left, a rotating sphere under a fixed light; right, a static sphere under a moving light]


  2. A more direct example of optical flow differing from the motion field is the rotating striped pole at the entrance of a barber shop. We know the real physical motion of the stripes is horizontal, to the right; however, if we follow the change of pixel brightness, the optical flow points downward. The two are exactly perpendicular.

[Figure: the barber-pole illusion]

2. Optical flow constraint equation

How can we better estimate motion through optical flow?

First of all, we have to make some assumptions so that the problem becomes tractable. Consider the following two consecutive frames:

[Figure: two consecutive frames of a flying eagle]
In this picture, the eagle is the moving target. If we focus on one pixel $(x, y)$ and assume its position in the next frame is $(x+\delta x,\ y+\delta y)$, we can define the velocity of the pixel (the motion of the brightness pattern, i.e., the theoretical optical flow):

$$(u, v) = (\delta x/\delta t,\ \delta y/\delta t)$$

Assumption 1: The brightness of corresponding pixels remains unchanged between the two frames.

We assume $\delta t$ is small enough that the pixel we care about does not change in brightness between the two frames. Denoting the brightness by $I(\cdot)$, we have:
$$I(x, y, t) = I(x+\delta x,\ y+\delta y,\ t+\delta t)$$

We carry out the Taylor expansion of the multivariate function on the right side of the equal sign, ignoring the quadratic and higher-order terms:
$$I(x+\delta x,\ y+\delta y,\ t+\delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t$$

Combining the two equations above gives:
$$\frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t = 0$$

Let $\delta t \rightarrow 0$ and divide both sides by $\delta t$:

$$\frac{\partial I}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial I}{\partial y}\frac{\delta y}{\delta t} + \frac{\partial I}{\partial t} = 0$$

The terms $\frac{\delta x}{\delta t}, \frac{\delta y}{\delta t}$ are exactly the optical flow (velocity) components $u, v$, so the formula can be written as:

$$I_x u + I_y v + I_t = 0$$

The above is the optical flow constraint equation.

$I_x, I_y, I_t$ are computed much like gradients in image processing: fix the other variables and take finite differences along one of them. For example, $I_x$ is computed as follows:

[Figure: finite-difference stencil for computing $I_x$]
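To make this concrete, here is a minimal numpy sketch (assuming `frame1` and `frame2` are two consecutive grayscale frames; the exact stencil varies between implementations, this is one common choice):

```python
import numpy as np

def image_derivatives(frame1, frame2):
    """Estimate I_x, I_y, I_t by simple forward differences,
    averaging the spatial differences over both frames."""
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # spatial derivatives: finite difference along x (columns) and y (rows);
    # np.roll wraps at the border, which a real implementation would mask out
    Ix = 0.5 * ((np.roll(f1, -1, axis=1) - f1) + (np.roll(f2, -1, axis=1) - f2))
    Iy = 0.5 * ((np.roll(f1, -1, axis=0) - f1) + (np.roll(f2, -1, axis=0) - f2))
    # temporal derivative: difference between the two frames
    It = f2 - f1
    return Ix, Iy, It
```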
Ok, now we know $I_x, I_y, I_t$. Consider the $(u, v)$ plane: the formula $I_x u + I_y v + I_t = 0$ specifies a straight line in that plane. But $(u, v)$ can be any point on this line, so a single constraint cannot determine the exact flow.

[Figure: the constraint line in the $(u,v)$ plane, with the flow decomposed into normal and parallel components]

Specifically, as shown in the figure above, suppose the true optical flow is $\boldsymbol{u} = (u, v)$. We decompose it into a component perpendicular to the line and a component parallel to it:
$$\boldsymbol{u} = \boldsymbol{u_n} + \boldsymbol{u_p}$$

From basic geometry, the perpendicular (normal) component follows immediately:

$$\boldsymbol{u_n} = \frac{-I_t}{I_x^2+I_y^2}(I_x, I_y)$$

Its magnitude is $\frac{|I_t|}{\sqrt{I_x^2+I_y^2}}$, the distance from the origin to the constraint line.

However, we cannot determine the parallel component $\boldsymbol{u_p}$ (this is the aperture problem), so more constraints are needed.
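As a quick illustration, the normal component can be computed at every pixel from the derivatives above (a sketch reusing `image_derivatives`; `eps` is a hypothetical guard against division by zero in flat regions):

```python
import numpy as np

def normal_flow(Ix, Iy, It, eps=1e-8):
    """Normal flow: the component of (u, v) along the image gradient,
    u_n = -I_t * (I_x, I_y) / (I_x^2 + I_y^2)."""
    mag2 = Ix**2 + Iy**2 + eps
    return -It * Ix / mag2, -It * Iy / mag2
```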

3. Lucas-Kanade Algorithm

Continuing from the above, we need more constraints. The Lucas-Kanade (LK) algorithm makes the following assumption:

Assumption 2: For each pixel, the optical flow is the same for all pixels in its neighborhood (the flow is locally constant).

Suppose the neighborhood of a pixel is $W$, an $n \times n$ window. Then for every pixel $(k, l) \in W$ the flow is the same, so:
$$I_x(k,l)\,u + I_y(k,l)\,v + I_t(k,l) = 0, \quad (k,l) \in W$$
The above gives $n^2$ equations, which we write in matrix form:
$$\begin{bmatrix} I_x(1,1) & I_y(1,1) \\ \vdots & \vdots \\ I_x(n,n) & I_y(n,n) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} -I_t(1,1) \\ \vdots \\ -I_t(n,n) \end{bmatrix}$$

Written compactly:
$$\boldsymbol{Au} = \boldsymbol{B}, \quad A \in \mathbb{R}^{n^2\times 2}, \quad \boldsymbol{u} = [u, v]^T, \quad B \in \mathbb{R}^{n^2\times 1}$$

With $n^2$ equations and only two unknowns, the system is overdetermined; for convenience of calculation we form the normal equations:

$$(A^TA)\,\boldsymbol{u} = A^TB$$

where $A^TA \in \mathbb{R}^{2\times 2}$. If $A^TA$ is invertible, then:
$$\boldsymbol{u} = (A^TA)^{-1}A^TB$$

It can be seen that this optical flow estimate is essentially the least-squares solution of the $n^2$ constraint equations $A\boldsymbol{u} = B$, and the flow is thereby solved. However, the LK solution above also has its limitations; it requires:

  1. $A^TA$ is invertible.
  2. $A^TA$ is well-conditioned. "Well-conditioned" means its two eigenvalues $\lambda_1 \ge \lambda_2$ satisfy: neither is too close to 0, and $\lambda_1$ is not much larger than $\lambda_2$ (not $\lambda_1 \gg \lambda_2$).

The meaning of well-conditioned: if the eigenvalues are too small, the image gradients in the window are weak (a flat, textureless region), so the solved $\boldsymbol{u}$ is unreliable; if the two eigenvalues differ greatly, the window essentially contains a single edge direction (the aperture problem again), and the motion along the edge cannot be determined. A sketch of the windowed solve with this conditioning check follows.
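A minimal sketch, assuming `Ix`, `Iy`, `It` come from the earlier step; the window size `n` and the thresholds `min_eig` and `max_ratio` are illustrative choices, not values fixed by the algorithm:

```python
import numpy as np

def lk_flow_at(Ix, Iy, It, x, y, n=7, min_eig=1e-3, max_ratio=1e2):
    """Solve (A^T A) u = A^T B over an n x n window centered at (x, y)."""
    h = n // 2
    wx = Ix[y-h:y+h+1, x-h:x+h+1].ravel()
    wy = Iy[y-h:y+h+1, x-h:x+h+1].ravel()
    wt = It[y-h:y+h+1, x-h:x+h+1].ravel()
    A = np.stack([wx, wy], axis=1)            # n^2 x 2
    B = -wt                                   # n^2 x 1
    ATA = A.T @ A                             # 2 x 2
    # conditioning check: both eigenvalues away from zero,
    # and lambda_1 not much larger than lambda_2
    lam_min, lam_max = np.linalg.eigvalsh(ATA)   # ascending order
    if lam_min < min_eig or lam_max > max_ratio * max(lam_min, 1e-12):
        return None   # flat region or aperture problem: unreliable
    return np.linalg.solve(ATA, A.T @ B)      # least-squares (u, v)
```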

4. Coarse-to-Fine Optical Flow Estimation (Optical Flow Pyramid)

The LK algorithm has a prior assumption, namely that $\delta t$ is small enough, i.e., the small-motion assumption. But what if the motion between frames is large?

We can reduce the resolution of the image, for example by pooling (downsampling), so that a large motion in the original becomes a small motion, as shown in the figure below.

[Figure: downsampling a frame pair turns large motion into small motion]
After converting into small motion, we can use the previous algorithm to solve the optical flow.

Assume that we have four resolutions, numbered 1 to 4 from coarsest to finest. At resolution 1 we apply the small-motion estimate to compute the optical flow; call the result $(u, v)^0$.

Then, at resolution 2, we use the optical flow computed at resolution 1 (upsampled to the new resolution) to generate a new image (similar to frame interpolation). This image is closer to the true motion at that resolution, but not yet accurate. This step is called "Warp".

Similarly, we then recompute the optical flow at resolution 2 between the warped image and the target frame, obtaining the correction $\Delta(u, v)^1$. At this point we get a more accurate estimate $(u, v)^1 = (u, v)^0 + \Delta(u, v)^1$. This step is called "Plus".

Then we warp resolution 3 with this flow, and continue in the same way, as shown in the figure:

[Figure: coarse-to-fine Warp+Plus iteration across the pyramid levels]
Thus we estimate the large motion at fine resolution step by step, starting from the small motion at coarse resolution; in short, it is an iterative Warp+Plus process.
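In practice, this coarse-to-fine scheme is what OpenCV's pyramidal LK tracker implements. A short usage sketch, assuming `prev_gray` and `next_gray` are consecutive grayscale frames; the parameter values are illustrative:

```python
import cv2

# detect corners to track in the first frame
pts0 = cv2.goodFeaturesToTrack(prev_gray, maxCorners=200,
                               qualityLevel=0.01, minDistance=7)
# pyramidal Lucas-Kanade: maxLevel=3 means 4 pyramid levels,
# i.e. the Warp+Plus iteration runs from coarse to fine
pts1, status, err = cv2.calcOpticalFlowPyrLK(
    prev_gray, next_gray, pts0, None,
    winSize=(21, 21), maxLevel=3)
# keep only the successfully tracked points; rows are (u, v) displacements
flow = (pts1 - pts0)[status.ravel() == 1]
```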

5. Scene reconstruction under camera motion

The premise of discussing this issue is that the camera is assumed to be an orthographic camera (Orthographic Camera): the size of an object in the rendered image stays the same regardless of how far it is from the camera. In other words, the camera-to-scene distance is much greater than the depth variation within the scene, so object scale no longer matters. Under this assumption, we hope to recover the 3D information of the keypoints in the 2D image while the camera is moving, as shown in the figure below.

[Figure: recovering 3D keypoints from 2D tracks under a moving orthographic camera]

5.0 Problem Modeling

In world coordinates (real three-dimensional space), suppose at frame $f$ the camera position is $C_f \in \mathbb{R}^3$ and a keypoint in the scene is at $P \in \mathbb{R}^3$. The camera produces a 2D image; let the keypoint's coordinates on the imaging plane be $[u, v]^T \in \mathbb{R}^2$, as shown in the figure below (where $O$ is the world origin, and $i, j \in \mathbb{R}^3$ are the unit axes of the 2D imaging coordinate system expressed in world coordinates, representing the camera's orientation):

[Figure: world origin $O$, camera axes $i$, $j$, and the imaging plane]
According to the geometric relationship:
$$u = i^T x_c, \qquad v = j^T x_c, \qquad x_c = P - C$$
so:
$$u = i^T(P - C), \qquad v = j^T(P - C)$$
We know $u, v$, but we know neither the camera orientation ($i, j$) nor the real coordinates $P, C$. So how can we solve for $P$?

Make an assumption here: the world origin is at the centroid of all the keypoints in the scene:
$$\sum_p P_p = 0$$

where $p$ indexes the points.

Under this definition, consider the centroid of the coordinates in the 2D image:
$$\overline{u}_f = \frac{1}{N}\sum_{p} u_{f,p} = \frac{1}{N}\sum_{p} i_f^T(P_p - C_f) = -i_f^T C_f$$
(using $\sum_p P_p = 0$, with $N$ the number of keypoints).

We move the origin of the 2D coordinate system to this centroid; the new coordinates are:
$$\hat{u}_{f,p} = u_{f,p} - \overline{u}_f = i_f^T(P_p - C_f) - (-i_f^T C_f) = i_f^T P_p$$

Therefore, under the above assumptions, the new coordinates have nothing to do with the real position of the camera.

For the $v$ coordinate, similarly $\hat{v}_{f,p} = j_f^T P_p$.

Now suppose we have $F$ frames, each tracking the same $N$ keypoints. For each point and frame:
$$\begin{bmatrix} \hat{u}_{f,p}\\ \hat{v}_{f,p} \end{bmatrix} = \begin{bmatrix} i_f^T\\ j_f^T \end{bmatrix} P_p$$
So we can list the following equations:
$$\begin{bmatrix} \hat{u}_{1,1} & \cdots & \hat{u}_{1,N} \\ \vdots & & \vdots \\ \hat{u}_{F,1} & \cdots & \hat{u}_{F,N} \\ \hat{v}_{1,1} & \cdots & \hat{v}_{1,N} \\ \vdots & & \vdots \\ \hat{v}_{F,1} & \cdots & \hat{v}_{F,N} \end{bmatrix} = \begin{bmatrix} i_1^T \\ \vdots \\ i_F^T \\ j_1^T \\ \vdots \\ j_F^T \end{bmatrix} \begin{bmatrix} P_1 & \cdots & P_N \end{bmatrix}$$
concisely written as:
$$W_{2F\times N} = M_{2F\times 3}\, S_{3\times N}$$
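As a small sketch of how $W$ would be assembled in code (assuming `u` and `v` are hypothetical $F \times N$ arrays of tracked point coordinates, with the $\hat{u}$ rows stacked above the $\hat{v}$ rows as in the equation above):

```python
import numpy as np

def build_measurement_matrix(u, v):
    """Center the tracks per frame and stack them into W (2F x N)."""
    u_hat = u - u.mean(axis=1, keepdims=True)   # subtract per-frame centroid
    v_hat = v - v.mean(axis=1, keepdims=True)
    return np.vstack([u_hat, v_hat])
```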

The two matrices on the right-hand side of the equation are both unknown. The Tomasi-Kanade method for solving them is introduced below.

5.1 Problem Solving

We consider performing singular value decomposition (SVD) on $W$ to see whether it can be split into the product of two matrices, thereby determining $M$ and $S$. Let the SVD of $W$ be:
$$W = U\Sigma V^T$$
Clearly $\text{rank}(W) \le 3$, so $W$ has at most three non-zero singular values; take its truncated (rank-3) SVD:

$$W = U_1\Sigma' V_1^T$$
where $U_1 \in \mathbb{R}^{2F\times 3}$, $\Sigma' \in \mathbb{R}^{3\times 3}$, $V_1^T \in \mathbb{R}^{3\times N}$.

We rewrite $W$ as:

$$W = U_1\Sigma'^{1/2}\, Q\, Q^{-1}\, \Sigma'^{1/2} V_1^T$$

where $Q \in \mathbb{R}^{3\times 3}$ is any non-singular matrix, and let:
$$M = U_1\Sigma'^{1/2} Q, \qquad S = Q^{-1}\Sigma'^{1/2} V_1^T$$
Now the problem reduces to solving for $Q$. Note that we have not yet used the orthonormality of the camera axes $i_f, j_f$.

Write the rows of $U_1\Sigma'^{1/2}$ as $\hat{i}_1^T, \dots, \hat{i}_F^T, \hat{j}_1^T, \dots, \hat{j}_F^T$, so that:
$$M = U_1\Sigma'^{1/2} Q = \begin{bmatrix} \hat{i}_1^T Q \\ \vdots \\ \hat{i}_F^T Q \\ \hat{j}_1^T Q \\ \vdots \\ \hat{j}_F^T Q \end{bmatrix}$$
Since each $i_f$ and $j_f$ is a unit vector and $i_f \perp j_f$, for every frame $f$:

$$\hat{i}_f^T QQ^T \hat{i}_f = 1, \qquad \hat{j}_f^T QQ^T \hat{j}_f = 1, \qquad \hat{i}_f^T QQ^T \hat{j}_f = 0$$

The above is $3F$ equations, while the symmetric matrix $QQ^T$ has only six unknowns, so this is an overdetermined system: solve it (e.g., by least squares) for $QQ^T$ and recover $Q$. Substituting $Q$ back into the SVD factorization yields $M$ and $S$.
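A compact numpy sketch of the whole factorization, under the row ordering above ($\hat{i}$ rows then $\hat{j}$ rows); the Cholesky step assumes $QQ^T$ comes out positive definite, which holds for clean data:

```python
import numpy as np

def tomasi_kanade(W):
    """Factor W (2F x N) into camera axes M (2F x 3) and structure S (3 x N)."""
    F = W.shape[0] // 2
    # rank-3 truncated SVD: W ~ U1 * Sigma' * V1^T
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    sqrtS = np.diag(np.sqrt(s[:3]))
    Mhat = U[:, :3] @ sqrtS          # rows: i-hat_1..i-hat_F, j-hat_1..j-hat_F
    Shat = sqrtS @ Vt[:3, :]
    # metric constraints are linear in L = Q Q^T (symmetric, 6 unknowns)
    def quad(a, b):                  # coefficients of a^T L b in those unknowns
        return [a[0]*b[0], a[1]*b[1], a[2]*b[2],
                a[0]*b[1] + a[1]*b[0],
                a[0]*b[2] + a[2]*b[0],
                a[1]*b[2] + a[2]*b[1]]
    rows, rhs = [], []
    for f in range(F):
        i_f, j_f = Mhat[f], Mhat[F + f]
        rows += [quad(i_f, i_f), quad(j_f, j_f), quad(i_f, j_f)]
        rhs += [1.0, 1.0, 0.0]       # |i_f| = |j_f| = 1, i_f . j_f = 0
    l = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)[0]
    L = np.array([[l[0], l[3], l[4]],
                  [l[3], l[1], l[5]],
                  [l[4], l[5], l[2]]])
    Q = np.linalg.cholesky(L)        # L = Q Q^T
    return Mhat @ Q, np.linalg.inv(Q) @ Shat
```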

Source: blog.csdn.net/wjpwjpwjp0831/article/details/125023214