Optical flow is a classical method for target tracking. These notes were taken while following a video course on Bilibili.
1. Motion Field and Optical Flow
Optical flow, as the name implies, is the "flow of light." What does that mean for images? Imagine the camera is moving (the same reasoning applies if the target moves): the target's motion appears as the "movement" of "pixels," hence the name optical flow, as shown in the following figure:
As shown in the figure above, if the camera is moving, the brightness pattern of the tree will change, and this trend of change actually represents the movement of the target.
Question 1: What is the relationship between this change in the pixel brightness pattern and the real movement of the target?
We call the projection of the target's real physical motion onto some observation (e.g., an image sensor) the motion field (Motion Field). For example, in the figure below, the target moves to the upper left, and its projection on the sensor (that is, our observation) moves in a corresponding direction, namely downward, since projection through the lens inverts the motion.
Question 2: Are motion fields and optical flow equivalent?
The answer is no. Please look at the following situations:
- A motion field without optical flow, or optical flow without a motion field
On the left of the figure below, the sphere is rotating but the light source is fixed: the brightness pattern does not change, so there is a motion field but no optical flow. On the right, the sphere is stationary but the light source moves: the brightness pattern changes, so there is optical flow but no motion field.
An even more direct example of optical flow differing from the motion field is the striped pole at the entrance of a barber shop: we know the real physical motion is horizontal (to the right), yet if we focus on the change of pixel brightness, the optical flow points downward; the two are exactly perpendicular.
2. Optical flow constraint equation
How can we better estimate motion through optical flow?
First, we must make some assumptions so that the problem becomes tractable. Consider the following two consecutive frames:
In this picture, the eagle is the moving target. If we focus on a pixel at $(x, y)$ and assume its position in the next frame is $(x+\delta x, y+\delta y)$, we can get the velocity of the pixel (that is, the brightness motion pattern, the theoretical optical flow):
$$(u, v) = (\delta x/\delta t,\ \delta y/\delta t)$$
Assumption 1: The brightness of corresponding pixels between the two frames remains unchanged.
We assume $\delta t$ is small enough that the pixel we care about does not change brightness between the two frames. Denoting the brightness by $I(\cdot)$, we have:
$$I(x, y, t) = I(x+\delta x, y+\delta y, t+\delta t)$$
We Taylor-expand the right-hand side as a multivariate function, ignoring quadratic and higher-order terms:
$$I(x+\delta x, y+\delta y, t+\delta t) = I(x, y, t) + \frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t$$
Combining the two equations above:
$$\frac{\partial I}{\partial x}\delta x + \frac{\partial I}{\partial y}\delta y + \frac{\partial I}{\partial t}\delta t = 0$$
Let $\delta t \rightarrow 0$ and divide both sides by $\delta t$:
$$\frac{\partial I}{\partial x}\frac{\delta x}{\delta t} + \frac{\partial I}{\partial y}\frac{\delta y}{\delta t} + \frac{\partial I}{\partial t} = 0$$
The terms $\frac{\delta x}{\delta t}, \frac{\delta y}{\delta t}$ are precisely the optical flow components (velocities) $u, v$, so the formula can be written as:
$$I_x u + I_y v + I_t = 0$$
The above is the optical flow constraint equation.
$I_x, I_y, I_t$ are computed much like gradients in image processing: fix the other variables and take finite differences over the remaining one. For example, $I_x$ is computed as follows:
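As a minimal sketch of how these derivatives might be computed (the function name and the choice of `np.gradient` are illustrative assumptions, not from the original notes):

```python
import numpy as np

def image_derivatives(frame1, frame2):
    """Estimate I_x, I_y, I_t by finite differences between two grayscale frames.

    A minimal sketch: real implementations usually average differences over a
    small neighborhood and/or pre-smooth the frames first.
    """
    f1 = frame1.astype(np.float64)
    f2 = frame2.astype(np.float64)
    # Spatial derivatives: differences along columns (x) and rows (y).
    Ix = np.gradient(f1, axis=1)
    Iy = np.gradient(f1, axis=0)
    # Temporal derivative: difference between the two frames.
    It = f2 - f1
    return Ix, Iy, It
```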
Now we know $I_x, I_y, I_t$. In the $(u, v)$ plane, the formula $I_x u + I_y v + I_t = 0$ specifies a straight line. But $(u, v)$ is still free to lie anywhere along that line: one equation cannot determine two unknowns, so we cannot get the exact value.
Specifically, as shown in the figure above, let the true optical flow be $\boldsymbol{u} = (u, v)$. We decompose it into a component normal to the line and a component parallel to it: $\boldsymbol{u} = \boldsymbol{u_n} + \boldsymbol{u_p}$.
From simple geometry, the normal component is quickly obtained: its magnitude is the distance from the origin to the line, along the unit normal $(I_x, I_y)/\sqrt{I_x^2+I_y^2}$:
$$\boldsymbol{u_n} = \frac{|I_t|}{\sqrt{I_x^2+I_y^2}} \cdot \frac{(I_x, I_y)}{\sqrt{I_x^2+I_y^2}}$$
However, $\boldsymbol{u_p}$ remains unknown, so more constraints are needed.
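The normal-flow component can be sketched numerically. This hypothetical helper computes it from the three derivatives at a pixel; the `eps` guard against a zero gradient is an added assumption:

```python
def normal_flow(Ix, Iy, It, eps=1e-8):
    """Component of optical flow along the image gradient (normal flow).

    The constraint line I_x u + I_y v + I_t = 0 only determines the flow
    component normal to the line. This sketch returns that component, whose
    magnitude is |I_t| / sqrt(I_x^2 + I_y^2) along the gradient direction.
    """
    g2 = Ix**2 + Iy**2 + eps  # squared gradient magnitude, guarded
    # u_n = -I_t * (I_x, I_y) / (I_x^2 + I_y^2): lies on the constraint line.
    un_u = -It * Ix / g2
    un_v = -It * Iy / g2
    return un_u, un_v
```

The parallel component stays undetermined, which is exactly why LK adds the neighborhood constraint below.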
3. Lucas-Kanade Algorithm
Continuing from the above, we need more constraints. The Lucas-Kanade (LK) algorithm assumes:
Assumption 2: For each pixel, the optical flow is the same for all pixels in its neighborhood (the motion is locally constant).
Suppose the neighborhood of a pixel is $W$, an $n \times n$ rectangle. Then for each pixel $(k, l) \in W$, the optical flow is the same, so:
$$I_x(k,l)\,u + I_y(k,l)\,v + I_t(k,l) = 0, \quad (k,l) \in W$$
The above gives $n^2$ equations, which we write in matrix form:
$$\begin{bmatrix} I_x(1,1) & I_y(1,1) \\ \vdots & \vdots \\ I_x(n,n) & I_y(n,n) \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} -I_t(1,1) \\ \vdots \\ -I_t(n,n) \end{bmatrix}$$
Written compactly as:
$$\boldsymbol{Au} = \boldsymbol{B}, \quad A \in \mathbb{R}^{n^2 \times 2}, \quad \boldsymbol{u} = [u, v]^T, \quad B \in \mathbb{R}^{n^2 \times 1}$$
$n^2$ is somewhat large; for convenience of calculation, we multiply both sides by $A^T$:
$$(A^T A)\boldsymbol{u} = A^T B$$
where $A^T A \in \mathbb{R}^{2 \times 2}$. If $A^T A$ is invertible, then:
$$\boldsymbol{u} = (A^T A)^{-1} A^T B$$
It can be seen that the optical flow estimate is essentially the least-squares solution of the $n^2$ constraint equations $Au = B$, so the optical flow is solved. However, the LK solution above also has its limitations; it requires:
- $A^T A$ is invertible.
- $A^T A$ must be well-conditioned. Well-conditioned means its two eigenvalues $\lambda_1 \ge \lambda_2$ satisfy: neither is too close to 0, and $\lambda_1$ is not much larger than $\lambda_2$ (i.e., not $\lambda_1 \gg \lambda_2$).
The meaning of well-conditioned: if the eigenvalues are too small, the texture in the window is too weak and the solved $\boldsymbol{u}$ is unreliable; if the two eigenvalues differ greatly (e.g., the window contains only a straight edge), the motion direction cannot be determined, which is the aperture problem.
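The windowed least-squares solve and the conditioning check can be sketched as follows (a minimal illustration; the function name and the `min_eig` threshold are assumptions):

```python
import numpy as np

def lk_flow_window(Ix, Iy, It, min_eig=1e-3):
    """Solve (A^T A) u = A^T B for one n x n window (a minimal LK sketch).

    Ix, Iy, It hold the derivative values inside the window. Returns (u, v),
    or None when A^T A is badly conditioned (an eigenvalue too small), in
    which case the flow in this window is unreliable.
    """
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # n^2 x 2
    B = -It.ravel()                                 # n^2
    ATA = A.T @ A                                   # 2 x 2 structure tensor
    # Well-conditioned check: both eigenvalues of A^T A must be large enough.
    if np.linalg.eigvalsh(ATA).min() < min_eig:
        return None
    u, v = np.linalg.solve(ATA, A.T @ B)
    return u, v
```

In practice this is applied at every pixel, sliding the window across the image.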
4. Coarse-to-Fine Optical Flow Estimation (Optical Flow Pyramid)
The LK algorithm has a prior assumption: $\delta t$ is small enough, i.e., the small-motion assumption. But what if the range of motion is large?
We can reduce the resolution of the picture, for example, by pooling, so that the original large motion will become small motion. As shown in the figure below.
After converting into small motion, we can use the previous algorithm to solve the optical flow.
Assume we have four resolutions, numbered 1 (coarsest) to 4 (finest). At resolution 1, we use the small-motion estimate to compute the optical flow; call the result $(u, v)^0$.
Then, at resolution 2, we use the optical flow calculated at resolution 1 (scaled up to the new resolution) to generate a new image from the first frame, similar to frame interpolation. This image is closer to the motion at the finer resolution, but not exact. This step is called "Warp".
Similarly, we recalculate the optical flow at resolution 2 between the warped image and the target frame, obtaining the residual flow $\Delta(u,v)^1$. At this point we get a more accurate estimate $(u,v)^1 = (u,v)^0 + \Delta(u,v)^1$. This step is called "Plus".
Then apply this optical flow to warp at resolution 3, and continue in the same way, as shown in the figure:
Thus we estimate the large motion at fine resolution step by step from the small motion at coarse resolution; in short, it is an iterative Warp+Plus process.
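The Warp+Plus iteration can be sketched end to end. This is a deliberately crude illustration, not a production implementation: `global_lk` estimates a single flow vector for the whole frame, and `warp` rounds the flow to an integer `np.roll` shift instead of bilinear interpolation:

```python
import numpy as np

def downsample(img):
    """Halve the resolution by averaging 2x2 blocks."""
    h, w = img.shape[0] // 2 * 2, img.shape[1] // 2 * 2
    img = img[:h, :w]
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def global_lk(f1, f2):
    """One-shot LK over the whole image: a single (u, v) for the frame pair."""
    Ix = np.gradient(f1, axis=1)
    Iy = np.gradient(f1, axis=0)
    It = f2 - f1
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)
    return np.linalg.lstsq(A, -It.ravel(), rcond=None)[0]

def warp(img, u, v):
    """Shift img by the rounded flow -- a crude stand-in for bilinear warping."""
    return np.roll(img, (int(round(v)), int(round(u))), axis=(0, 1))

def coarse_to_fine(f1, f2, levels=3):
    """Warp+Plus iteration: estimate at the coarsest level, upscale, refine."""
    pyr1, pyr2 = [f1], [f2]
    for _ in range(levels - 1):
        pyr1.append(downsample(pyr1[-1]))
        pyr2.append(downsample(pyr2[-1]))
    u = v = 0.0
    for l1, l2 in zip(reversed(pyr1), reversed(pyr2)):  # coarse -> fine
        u, v = int(round(2 * u)), int(round(2 * v))  # flow doubles per level
        warped = warp(l1, u, v)                      # "Warp" step
        du, dv = global_lk(warped, l2)               # residual flow here
        u, v = u + du, v + dv                        # "Plus" step
    return u, v
```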
5. Scene reconstruction under camera motion
The premise of this discussion is that the camera is assumed to be an Orthographic Camera: the size of an object in the rendered image stays the same whether the object is near or far from the camera. In other words, the distance from the camera to the scene is much greater than the depth variation within the scene, so perspective scaling of object size no longer matters. Under this assumption, we hope to recover the 3D positions of key points in the 2D images while the camera is moving, as shown in the figure below.
5.0 Problem Modeling
In world coordinates (real three-dimensional space), suppose the camera position at frame $f$ is $C_f \in \mathbb{R}^3$ and a key point in the scene is at $P \in \mathbb{R}^3$. The camera captures a 2D image; let the point's coordinates on the imaging plane be $[u, v]^T \in \mathbb{R}^2$, as shown in the figure below (where $O$ is the origin of the world coordinates, and $i, j \in \mathbb{R}^3$ are the unit axes of the 2D imaging plane expressed in world coordinates, representing the camera's orientation):
From the geometric relationship:
$$u = i^T x_c, \qquad v = j^T x_c, \qquad x_c = P - C$$
so:
$$u = i^T(P - C), \qquad v = j^T(P - C)$$
We know $u, v$, but the camera orientation ($i, j$) and the real coordinates $P, C$ are unknown. So how do we solve for $P$?
We make an assumption here: the origin of the world coordinates lies at the centroid of all the key points:
$$\sum_p P_p = 0$$
where $p$ indexes the key points.
We consider the centroid of the coordinates in the 2D image under this definition:
$$\overline{u}_f = \frac{1}{|P|}\sum_{p\in P} u_{f,p} = \frac{1}{|P|}\sum_{p\in P} i_f^T(P_p - C_f) = -\frac{1}{|P|}\sum_{p\in P} i_f^T C_f = -i_f^T C_f$$
We move the origin of the 2D coordinate system to the above center coordinates, then the new coordinates are
$$\hat{u}_{f,p} = u_{f,p} - \overline{u}_f = i_f^T(P_p - C_f) - (-i_f^T C_f) = i_f^T P_p$$
Therefore, under the above assumption, the new coordinates no longer depend on the camera's position.
For the $v$ coordinate, similarly $\hat{v}_{f,p} = j_f^T P_p$.
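A tiny numeric check of this cancellation (all data made up): with the world origin at the point centroid, subtracting the per-frame mean of the image coordinates removes the camera position $C$:

```python
import numpy as np

# Made-up data: 6 points in 3D, centered so the world origin is their centroid.
rng = np.random.default_rng(1)
P = rng.normal(size=(3, 6))
P -= P.mean(axis=1, keepdims=True)   # enforce sum_p P_p = 0

# A random camera axis i and camera position C for one frame.
i, C = rng.normal(size=3), rng.normal(size=3)

u = i @ (P - C[:, None])             # u_{f,p} = i^T (P_p - C_f)
u_hat = u - u.mean()                 # move the 2D origin to the image centroid
# u_hat now equals i^T P_p: the camera position has dropped out.
```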
Now suppose we have $F$ frames, each tracking the same $N$ key points. For each point we have:
$$\begin{bmatrix} \hat{u}_{f,p} \\ \hat{v}_{f,p} \end{bmatrix} = \begin{bmatrix} i_f^T \\ j_f^T \end{bmatrix} P_p$$
So we can stack these relations over all frames and points:
$$\underbrace{\begin{bmatrix} \hat{u}_{1,1} & \cdots & \hat{u}_{1,N} \\ \hat{v}_{1,1} & \cdots & \hat{v}_{1,N} \\ \vdots & & \vdots \\ \hat{u}_{F,1} & \cdots & \hat{u}_{F,N} \\ \hat{v}_{F,1} & \cdots & \hat{v}_{F,N} \end{bmatrix}}_{W} = \underbrace{\begin{bmatrix} i_1^T \\ j_1^T \\ \vdots \\ i_F^T \\ j_F^T \end{bmatrix}}_{M} \underbrace{\begin{bmatrix} P_1 & \cdots & P_N \end{bmatrix}}_{S}$$
written concisely as:
$$W_{2F \times N} = M_{2F \times 3}\, S_{3 \times N}$$
Both matrices on the right-hand side are unknown. The Tomasi-Kanade method, introduced below, solves for them.
5.1 Problem Solving
We consider performing a singular value decomposition of $W$ to see whether it can be decomposed into a product of two matrices, from which $M$ and $S$ are determined. Let the SVD of $W$ be:
$$W = U \Sigma V^T$$
Obviously $\text{rank}(W) \le 3$, so $W$ has at most three non-zero singular values. Taking its truncated SVD:
$$W = U_1 \Sigma' V_1^T$$
where $U_1 \in \mathbb{R}^{2F \times 3}$, $\Sigma' \in \mathbb{R}^{3 \times 3}$, $V_1^T \in \mathbb{R}^{3 \times N}$.
We rewrite $W$ as:
$$W = U_1 \Sigma'^{1/2} Q\, Q^{-1} \Sigma'^{1/2} V_1^T$$
where $Q \in \mathbb{R}^{3 \times 3}$ is any non-singular matrix. Let:
$$M = U_1 \Sigma'^{1/2} Q, \qquad S = Q^{-1} \Sigma'^{1/2} V_1^T$$
Now the problem reduces to solving for $Q$. We note that we have not yet used the orthonormality of the image axes.
Denote the rows of $U_1 \Sigma'^{1/2}$ by $\hat{i}_f^T, \hat{j}_f^T$; then:
$$M = U_1 \Sigma'^{1/2} Q = \begin{bmatrix} \hat{i}_1^T \\ \hat{j}_1^T \\ \vdots \\ \hat{i}_F^T \\ \hat{j}_F^T \end{bmatrix} Q = \begin{bmatrix} \hat{i}_1^T Q \\ \hat{j}_1^T Q \\ \vdots \\ \hat{i}_F^T Q \\ \hat{j}_F^T Q \end{bmatrix}$$
Since these rows must equal the orthonormal axes $i_f^T = \hat{i}_f^T Q$ and $j_f^T = \hat{j}_f^T Q$, we have:
$$\forall f: \quad \hat{i}_f^T Q Q^T \hat{i}_f = 1, \quad \hat{j}_f^T Q Q^T \hat{j}_f = 1, \quad \hat{i}_f^T Q Q^T \hat{j}_f = 0$$
The above gives $3F$ equations, while $Q$ has only nine elements, so this is an overdetermined system from which $Q$ can be solved (typically by solving for the symmetric matrix $QQ^T$ by least squares and then factoring it). After solving for $Q$, substitute it back into the SVD factorization to compute $M$ and $S$.
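The factorization step (before the metric upgrade) can be sketched with `numpy`'s SVD. Here $Q = I$ is taken for simplicity, so the returned $M, S$ are determined only up to the affine ambiguity $Q$; solving the orthonormality constraints for $Q$ is omitted in this sketch:

```python
import numpy as np

def factorize(W):
    """Rank-3 factorization W = M S via truncated SVD (Tomasi-Kanade, Q = I).

    W is the 2F x N matrix of centered image coordinates. Returns M (2F x 3)
    and S (3 x N); the metric upgrade (solving the orthonormality constraints
    for Q) is left out of this sketch.
    """
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    # Keep only the three largest singular values (rank(W) <= 3 in theory).
    U1, Sq, V1t = U[:, :3], np.diag(np.sqrt(s[:3])), Vt[:3, :]
    M = U1 @ Sq   # M = U_1 Sigma'^{1/2}  (taking Q = I)
    S = Sq @ V1t  # S = Sigma'^{1/2} V_1^T
    return M, S
```

With noisy measurements, W has small extra singular values and the truncation acts as a least-squares rank-3 fit.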