Basic concepts and typical algorithms of optical flow
What is optical flow?
- Optical flow is the instantaneous velocity of the pixel motion of a moving object in space, projected onto the observer's imaging plane. It uses the temporal change of pixels across an image sequence and the correlation between adjacent frames to establish the correspondence between the previous frame and the current frame, and thereby computes the motion of objects between adjacent frames. Generally speaking, optical flow arises from the motion of the foreground object itself, the motion of the camera, or both. (reference blog)
- Optical flow expresses the change of an image; because it carries information about the target's motion, an observer can use it to determine how the target is moving. From the definition of optical flow one can derive the optical flow field: the two-dimensional (2D) instantaneous velocity field formed by all pixels in the image, where each 2D velocity vector is the projection onto the imaging plane of the 3D velocity vector of the corresponding visible scene point. Optical flow therefore carries not only the motion information of the observed object but also rich information about the three-dimensional structure of the scene. (Baidu Encyclopedia)
- Optical flow is currently used in many fields, including motion-based 3D reconstruction, video compression, object tracking, and action recognition.
1. Traditional classic optical flow algorithm: Lucas-Kanade
Basic assumptions:
(1) Brightness constancy: the same object point has the same brightness in the two frames between which the optical flow is estimated.
(2) Neighborhood consistency: centered on a pixel (x, y), all pixels in an n×n neighborhood are assumed to share the same optical flow value (n is kept small).
- From assumption (1) we get:

$$I(x, y, t) = I(x+u,\; y+v,\; t+\Delta t)$$

Taking a first-order Taylor expansion of the right-hand side:

$$I(x+u, y+v, t+\Delta t) = I(x, y, t) + I'_x u + I'_y v + I'_t \Delta t$$

That is, $I(x,y,t) + I'_x u + I'_y v + I'_t \Delta t = I(x,y,t)$, so:

$$I'_x u + I'_y v + I'_t \Delta t = 0, \quad \text{i.e.} \quad [I'_x,\, I'_y]\,[u, v]^T = -\Delta I_t$$

where $I'_x, I'_y$ are the partial derivatives of the image brightness at pixel $(x, y)$ in the $x$ and $y$ directions, i.e. the brightness gradient; $I'_t$ is the partial derivative of brightness with respect to time, so $I'_t \Delta t$ is the change in brightness at position $(x, y)$ between the two (usually adjacent) frames, written $\Delta I_t = I'_t \Delta t$. Here $u, v$ are the optical flow values to be estimated.

$I'_x$, $I'_y$, and $\Delta I_t$ can all be computed directly from the images, but this gives only one equation for the two unknowns $u, v$, so they cannot be solved from this constraint alone.
- From assumption (2), every pixel in the given neighborhood satisfies $I'_x u + I'_y v + I'_t \Delta t = 0$, so we can stack the equations into a matrix:

$$\begin{bmatrix} I^{'(1)}_x & I^{'(1)}_y \\ I^{'(2)}_x & I^{'(2)}_y \\ \vdots & \vdots \\ I^{'(n)}_x & I^{'(n)}_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = \begin{bmatrix} -\Delta I^{(1)}_t \\ -\Delta I^{(2)}_t \\ \vdots \\ -\Delta I^{(n)}_t \end{bmatrix}$$

i.e. $Ax = b$. This overdetermined system can be solved approximately by least squares, $x = (A^T A)^{-1} A^T b$, to estimate $[u, v]$. Note, however, that this requires $A^T A$ to be invertible.
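The least-squares step above can be sketched directly in NumPy. This is a minimal illustration, not an optimized implementation: the central-difference gradients and the square window are simplifying assumptions, and no check is made that the window contains enough texture for $A^TA$ to be invertible.

```python
import numpy as np

def lucas_kanade_point(I1, I2, x, y, n=5):
    """Estimate the flow (u, v) at pixel (x, y) from frames I1, I2
    by least squares over an n x n window (n odd)."""
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    # Spatial gradients I'_x, I'_y (central differences) and temporal change dI_t.
    Ix = (np.roll(I1, -1, axis=1) - np.roll(I1, 1, axis=1)) / 2.0
    Iy = (np.roll(I1, -1, axis=0) - np.roll(I1, 1, axis=0)) / 2.0
    It = I2 - I1
    r = n // 2
    win = np.s_[y - r:y + r + 1, x - r:x + r + 1]
    # Stack one brightness-constancy equation per window pixel: A [u, v]^T = b.
    A = np.stack([Ix[win].ravel(), Iy[win].ravel()], axis=1)  # shape (n*n, 2)
    b = -It[win].ravel()
    # Least-squares solve of A^T A [u, v]^T = A^T b; requires the window to
    # contain gradient in two directions (e.g. a corner) for A^T A to be invertible.
    uv, *_ = np.linalg.lstsq(A, b, rcond=None)
    return uv  # [u, v]
```

For a bilinear test image shifted by one pixel, the window equations are satisfied exactly, so the estimate recovers the true flow.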
- The Lucas-Kanade algorithm is also commonly combined with an image pyramid: on the low-resolution images at the higher pyramid levels, large displacements become small ones, so the small-motion assumption holds. In the end, the Lucas-Kanade method gives a way to compute sparse optical flow (at distinctive feature points such as corners). The image pyramid is an important tool for multi-scale image representation: an image is decomposed into a multi-scale sequence, with low-resolution images in the upper levels and high-resolution images in the lower levels; each level has 1/4 the pixels of the level below it, and the levels are numbered 0, 1, 2, …, N. (In image fusion, the pyramids of the images to be fused can be combined level by level according to certain rules into a composite pyramid, which is then reconstructed by inverting the pyramid-generation process to obtain the fused result.)
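A pyramid can be sketched as follows. This is a simplified version: it uses 2×2 average pooling in place of proper Gaussian smoothing before downsampling, so each level has half the width and height (1/4 the pixels) of the level below.

```python
import numpy as np

def build_pyramid(img, levels=4):
    """Image pyramid sketch: level 0 is the full-resolution image; each
    higher level halves both dimensions via 2x2 average pooling."""
    pyr = [img.astype(np.float64)]
    for _ in range(1, levels):
        a = pyr[-1]
        h, w = a.shape[0] // 2 * 2, a.shape[1] // 2 * 2  # crop to even size
        a = a[:h, :w]
        pooled = (a[0::2, 0::2] + a[1::2, 0::2] +
                  a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        pyr.append(pooled)
    return pyr
```

In coarse-to-fine flow estimation, the flow found at the coarsest level is upsampled, doubled, and used to initialize the search at the next finer level.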
- The Horn-Schunck optical flow method is also a good algorithm. It is based mainly on a global smoothness assumption (i.e. the computed optical flow field is assumed to vary continuously) and solves for the flow by minimizing an energy functional. It yields dense optical flow for the whole image, but it is computationally expensive and not very robust to noise, so it suits scenarios with little interference and low real-time requirements.
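The energy minimization can be sketched with the classic Horn-Schunck iterative update, where each step mixes the local flow average with the brightness-constancy residual. This is a minimal sketch: the 4-neighbour average and wrap-around boundary handling via `np.roll` are simplifications, and a fixed iteration count stands in for a convergence test.

```python
import numpy as np

def horn_schunck(I1, I2, alpha=1.0, iters=100):
    """Horn-Schunck dense-flow sketch: minimize the brightness-constancy
    error plus alpha^2 times a smoothness term, via Jacobi-style iterations."""
    I1 = I1.astype(np.float64)
    I2 = I2.astype(np.float64)
    Ix = (np.roll(I1, -1, axis=1) - np.roll(I1, 1, axis=1)) / 2.0
    Iy = (np.roll(I1, -1, axis=0) - np.roll(I1, 1, axis=0)) / 2.0
    It = I2 - I1
    u = np.zeros_like(I1)
    v = np.zeros_like(I1)
    for _ in range(iters):
        # Local average of the flow field (4-neighbour mean).
        ub = (np.roll(u, 1, 0) + np.roll(u, -1, 0) +
              np.roll(u, 1, 1) + np.roll(u, -1, 1)) / 4.0
        vb = (np.roll(v, 1, 0) + np.roll(v, -1, 0) +
              np.roll(v, 1, 1) + np.roll(v, -1, 1)) / 4.0
        # Classic update: pull the averaged flow toward the constancy constraint.
        num = Ix * ub + Iy * vb + It
        den = alpha ** 2 + Ix ** 2 + Iy ** 2
        u = ub - Ix * num / den
        v = vb - Iy * num / den
    return u, v
```

A larger `alpha` weights smoothness more heavily, producing a smoother (but less detailed) flow field.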
2. Optical flow algorithm based on neural network: FlowNet&FlowNet2.0
1.FlowNet
Paper address: https://arxiv.org/abs/1504.06852v2
- Basic idea: exploit the end-to-end advantage of neural networks. Two adjacent frames are fed in; a contracting part made of convolutional layers (the encoder) reduces the spatial resolution and extracts features, and an expanding part made of deconvolutional layers (the decoder) restores the resolution.
- Contracting part of the network:
- (1) Option 1, FlowNetSimple: stack the two input images into a single $h \times w \times 6$ tensor and feed it to the convolutional layers.
- (2) Option 2, FlowNetCorr: convolve the two input images separately to obtain feature maps, then combine their information with a correlation computation (a simple version: multiply the feature values at corresponding positions and sum; the larger the result, the more correlated, i.e. the more similar, the two patches are).
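The correlation computation can be sketched as below. This is a naive loop over displacements on NumPy arrays, purely to show the operation; the real network implements it as an optimized layer, and the displacement range `max_disp` and zero padding are assumptions for the sketch.

```python
import numpy as np

def correlation(f1, f2, max_disp=3):
    """Correlation-layer sketch: for each pixel of f1, take the channel-wise
    dot product with f2 at every displacement within +/- max_disp; a larger
    value means the two feature vectors are more similar.
    f1, f2: arrays of shape (H, W, C). Output: (H, W, (2*max_disp+1)**2)."""
    H, W, C = f1.shape
    d = max_disp
    out = np.zeros((H, W, (2 * d + 1) ** 2))
    f2p = np.pad(f2, ((d, d), (d, d), (0, 0)))  # zero padding outside the image
    k = 0
    for dy in range(-d, d + 1):
        for dx in range(-d, d + 1):
            # shifted[y, x] == f2[y + dy, x + dx] (zero outside the image)
            shifted = f2p[d + dy:d + dy + H, d + dx:d + dx + W, :]
            out[:, :, k] = (f1 * shifted).sum(axis=2)  # per-pixel dot product
            k += 1
    return out
```

When `f1` and `f2` are the same feature map, the channel at zero displacement is simply the squared norm of each feature vector, the maximum over all displacements.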
- Expanding part: the resolution is increased by deconvolution. The input to each deconvolution layer has three parts: high-level semantic information, low-level local information, and the optical flow predicted by the previous layer, upsampled.
- In the paper, the authors define the loss as the Euclidean distance between the predicted and ground-truth flow, and call this error the EPE (End-Point Error).
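Concretely, the EPE is the per-pixel Euclidean distance between the predicted and ground-truth flow vectors, averaged over the image; a minimal sketch:

```python
import numpy as np

def epe(flow_pred, flow_gt):
    """End-point error: mean Euclidean distance between predicted and
    ground-truth flow vectors. flow_*: arrays of shape (H, W, 2)."""
    return np.sqrt(((flow_pred - flow_gt) ** 2).sum(axis=2)).mean()
```

For example, a zero prediction against a constant ground-truth flow of (3, 4) gives an EPE of 5 pixels.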
2.FlowNet2.0
Paper address: https://arxiv.org/abs/1612.01925
- Main optimization strategy: network stacking.
- The input of each subsequent FlowNet in the stack is not just the two images (Image1 and Image2); it also includes the optical flow estimated by the previous network, a warped image, and a brightness error. The warped image is obtained by applying the estimated flow to Image2: using the estimated per-pixel offsets, every pixel of Image2 is shifted to align it with Image1. Because the flow estimate is not exact, the warped image still deviates somewhat from Image1; subtracting the brightness of the warped image from the brightness of Image1 gives the brightness-error map.
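The warping and brightness-error steps can be sketched as follows. This is a simplified illustration: it uses nearest-neighbour sampling (FlowNet2 uses bilinear interpolation) and clamps out-of-bounds coordinates to the image border.

```python
import numpy as np

def warp_and_error(img1, img2, flow):
    """FlowNet2-style warping sketch: sample img2 at positions shifted by the
    estimated flow so it aligns with img1, then compute the brightness error.
    flow[y, x] = (u, v): the pixel at (x, y) in img1 maps to (x+u, y+v) in img2."""
    H, W = img1.shape
    ys, xs = np.mgrid[0:H, 0:W]
    # Nearest-neighbour sample coordinates, clamped to the image bounds.
    xq = np.clip(np.round(xs + flow[:, :, 0]).astype(int), 0, W - 1)
    yq = np.clip(np.round(ys + flow[:, :, 1]).astype(int), 0, H - 1)
    warped = img2[yq, xq]             # img2 pulled back toward img1
    brightness_error = img1 - warped  # residual misalignment
    return warped, brightness_error
```

With a perfect flow estimate the brightness error is zero (away from the image border); any remaining structure in the error map tells the next network in the stack where the flow still needs refinement.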
Summary
- Lucas-Kanade is a classic and effective algorithm for sparse optical flow.
- For dense optical flow, traditional methods must trade accuracy against speed, while FlowNet2 achieves better accuracy and estimation quality. (At the time, FlowNet2 was the SOTA for optical flow estimation.)
- At present, optical flow estimation also rides on the Transformer, the dominant architecture in computer vision: FlowFormer (A Transformer Architecture for Optical Flow) has become the current SOTA.
Rant: the CV field is just too competitive ("involuted").