Basic concepts and typical algorithms of optical flow




What is optical flow?

  • Optical flow is the instantaneous velocity of the pixel motion of a moving object in space, projected onto the observer's imaging plane. It uses the temporal change of pixels in an image sequence, together with the correlation between adjacent frames, to establish correspondences between the previous frame and the current frame, and thereby computes the motion of objects between adjacent frames. Generally speaking, optical flow arises from the motion of foreground objects in the scene, the motion of the camera, or the joint motion of both.
  • Optical flow expresses the change of the image; because it encodes information about the target's motion, it can be used by the observer to determine how the target is moving. From the definition of optical flow one can derive the optical flow field: a two-dimensional (2D) instantaneous velocity field over all pixels in the image, in which each 2D velocity vector is the projection onto the imaging plane of the 3D velocity vector of the corresponding visible point in the scene. Optical flow therefore contains not only the motion of the observed object but also rich information about the three-dimensional structure of the scene. (Source: Baidu Encyclopedia)
  • Optical flow is currently used in many fields, including motion-based 3D reconstruction, video compression, object tracking, and action recognition.

1. A classic traditional optical flow algorithm: Lucas-Kanade

Basic assumptions:
(1) Brightness constancy: the same object point has the same brightness in the two frames between which the optical flow is estimated.
(2) Spatial coherence of the flow in a neighborhood: centered on a pixel $(x, y)$, all pixels in an $n \times n$ neighborhood are assumed to share the same optical flow value (n is kept small).

  • For assumption (1), we have:
    $I(x, y, t) = I(x+u, y+v, t+\Delta t)$
    Applying a first-order Taylor expansion to the right-hand side:
    $I(x+u, y+v, t+\Delta t) = I(x, y, t) + I'_x u + I'_y v + I'_t \Delta t$
    That is, $I(x, y, t) + I'_x u + I'_y v + I'_t \Delta t = I(x, y, t)$,
    so $I'_x u + I'_y v + I'_t \Delta t = 0$, i.e. $[I'_x, I'_y][u, v]^T = -\Delta I_t$.

where $I'_x, I'_y$ denote the partial derivatives of the image brightness at pixel $(x, y)$ in the $x$ and $y$ directions, i.e. the image brightness gradient. $I'_t$ is the partial derivative of the image brightness with respect to time, and $I'_t \Delta t$ is the change in brightness of the pixel at coordinate $(x, y)$ between the two images (usually adjacent frames), which we write as $\Delta I_t = I'_t \Delta t$. $u, v$ are the optical flow values to be estimated.
$I'_x, I'_y, \Delta I_t$ can all be computed directly from the images, but for the two unknowns $u, v$ there is only one equation, so they cannot be solved at a single pixel.

  • For assumption (2), every pixel in the given neighborhood satisfies the equation $I'_x u + I'_y v + I'_t \Delta t = 0$, so we can stack the equations into a matrix:
    $$\begin{bmatrix} I'^{(1)}_x & I'^{(1)}_y \\ I'^{(2)}_x & I'^{(2)}_y \\ \vdots & \vdots \\ I'^{(n)}_x & I'^{(n)}_y \end{bmatrix} [u, v]^T = \begin{bmatrix} -\Delta I^{(1)}_t \\ -\Delta I^{(2)}_t \\ \vdots \\ -\Delta I^{(n)}_t \end{bmatrix}$$

This is of the form $Ax = b$. For this overdetermined system, the least-squares solution $x = (A^T A)^{-1} A^T b$ gives an approximate estimate of $[u, v]$. Note, however, that this requires $(A^T A)$ to be invertible.
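
To make the least-squares step concrete, here is a minimal NumPy sketch (the function name and interface are illustrative assumptions, not from the original post): `Ix`, `Iy` are the spatial gradients over an $n \times n$ patch and `It` is the brightness difference between the two frames.

```python
import numpy as np

def lucas_kanade_patch(Ix, Iy, It):
    """Estimate the optical flow (u, v) of a single n*n patch by least squares."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)  # n^2 x 2 matrix of spatial gradients
    b = -It.ravel()                                 # negated temporal differences
    # Least-squares solution of A [u, v]^T = b; lstsq avoids forming (A^T A)^{-1}
    # explicitly, but the patch still needs gradient information in both directions.
    flow, *_ = np.linalg.lstsq(A, b, rcond=None)
    return flow  # array([u, v])
```

In practice one would rather call a pyramidal implementation such as OpenCV's `cv2.calcOpticalFlowPyrLK`, which handles larger motions coarse-to-fine.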

2. Optical flow algorithms based on neural networks: FlowNet & FlowNet2.0

1. FlowNet

Paper address: https://arxiv.org/abs/1504.06852v2

  • Basic idea: exploit the end-to-end nature of neural networks. The two adjacent frames are fed in, their features are extracted by a contracting part composed of convolutional layers that reduces dimensionality (the encoder), and a deconvolutional expanding part then increases dimensionality again (the decoder) to produce the flow.

  • Contracting part of the network (two variants):

  • (1) Variant 1, FlowNetSimple: directly stack the two input images into a single $h \times w \times 6$ volume as the input to the convolutional layers; see the sketch below.
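    A minimal PyTorch sketch of this input stacking (the image sizes are placeholders of my choosing):

```python
import torch

# two hypothetical RGB frames of shape (batch, channels, height, width)
img1 = torch.randn(1, 3, 384, 512)
img2 = torch.randn(1, 3, 384, 512)

# FlowNetSimple stacks the frames along the channel axis,
# so the encoder sees one 6-channel input image.
x = torch.cat([img1, img2], dim=1)
print(x.shape)  # torch.Size([1, 6, 384, 512])
```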

  • (2) Variant 2, FlowNetCorr: convolve the two input images separately to obtain feature maps, then combine the information with a correlation computation (a simple version: multiply corresponding feature values and sum them; the larger the result, the more correlated, i.e. the more similar, the two locations are). A sketch of such a correlation layer follows.
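    Below is a naive, readable sketch of such a correlation layer (the paper uses an optimized implementation; the displacement range and the 1/C normalization here are my own illustrative choices):

```python
import torch
import torch.nn.functional as F

def correlation(f1, f2, max_disp=4):
    """Channel-wise dot product of f1 with shifted copies of f2.

    f1, f2: feature maps of shape (B, C, H, W) from the two encoder branches.
    Returns a cost volume of shape (B, (2*max_disp+1)**2, H, W).
    """
    B, C, H, W = f1.shape
    f2_pad = F.pad(f2, [max_disp] * 4)  # pad W and H by max_disp on each side
    out = []
    for dy in range(2 * max_disp + 1):
        for dx in range(2 * max_disp + 1):
            f2_shift = f2_pad[:, :, dy:dy + H, dx:dx + W]
            # multiply and sum over channels: the larger the value, the more similar
            out.append((f1 * f2_shift).sum(dim=1, keepdim=True) / C)
    return torch.cat(out, dim=1)
```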

  • Expanding part: dimensionality is increased by deconvolution. The input of each deconvolution layer consists of three parts: the first carries high-level semantic information, the second carries low-level local information from the corresponding encoder layer, and the third is the optical flow predicted by the previous layer, upsampled to the current resolution. A rough sketch of one such step follows.
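    To make the three-part input concrete, here is a rough PyTorch sketch of one decoder step (the layer names and channel sizes are my own assumptions, not taken from the paper):

```python
import torch
import torch.nn as nn

class RefineStep(nn.Module):
    """One expanding step: deconvolved features + encoder skip + upsampled flow."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.deconv = nn.ConvTranspose2d(in_ch, out_ch, 4, stride=2, padding=1)
        self.upflow = nn.ConvTranspose2d(2, 2, 4, stride=2, padding=1)
        self.predict = nn.Conv2d(out_ch + skip_ch + 2, 2, 3, padding=1)

    def forward(self, feat, skip, flow):
        # concatenate the three parts described above, then predict a finer flow
        x = torch.cat([self.deconv(feat), skip, self.upflow(flow)], dim=1)
        return x, self.predict(x)
```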

  • In the paper, the authors define the loss as the Euclidean distance between the predicted and ground-truth flow, and call this error the EPE (End-Point Error).
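    A one-line sketch of this loss (the (B, 2, H, W) tensor layout is an assumption):

```python
import torch

def epe(flow_pred, flow_gt):
    """Mean end-point error: Euclidean distance between predicted and
    ground-truth flow vectors, averaged over all pixels."""
    return torch.norm(flow_pred - flow_gt, p=2, dim=1).mean()
```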

2. FlowNet2.0

Paper address: https://arxiv.org/abs/1612.01925

  • Main optimization strategy: network stacking

  • The input of each subsequent FlowNet is not just the two images (Image1 and Image2); it also includes the optical flow estimated by the previous network, a warped image, and a brightness error (Brightness Error). The warped image is obtained by applying the estimated optical flow to Image2: using the estimated per-pixel offsets, every pixel of Image2 is displaced so as to align it with Image1. Because the estimated flow is not exact, the warped image still deviates from Image1; subtracting the brightness of the warped image from the brightness of Image1 yields the brightness error map. A sketch of the warping and brightness-error computation follows.
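    A minimal sketch built on `torch.nn.functional.grid_sample` (FlowNet2's actual warping layer differs in implementation details, and the tensor layout here is an assumption):

```python
import torch
import torch.nn.functional as F

def warp(img2, flow):
    """Backward-warp Image2 toward Image1 using the estimated flow.

    img2: (B, C, H, W); flow: (B, 2, H, W) in pixels, channel 0 = x, 1 = y.
    """
    B, _, H, W = img2.shape
    ys, xs = torch.meshgrid(torch.arange(H), torch.arange(W), indexing="ij")
    base = torch.stack([xs, ys]).float().unsqueeze(0)  # (1, 2, H, W) pixel grid
    coords = base + flow                               # where each pixel samples from
    # normalize sampling coordinates to [-1, 1], as grid_sample expects
    gx = 2.0 * coords[:, 0] / (W - 1) - 1.0
    gy = 2.0 * coords[:, 1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy], dim=-1)               # (B, H, W, 2)
    return F.grid_sample(img2, grid, align_corners=True)

# the brightness error is then just the per-pixel difference:
# brightness_error = (img1 - warp(img2, flow)).abs()
```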


Summary

  • Lucas-Kanade is a classic and effective algorithm for sparse optical flow.
  • For dense optical flow estimation, traditional methods have to trade accuracy against speed, whereas FlowNet2 achieves strong accuracy while remaining fast. (At the time of its publication, FlowNet2 was the SOTA in optical flow estimation.)
  • At present, optical flow estimation of course also builds on the Transformer, the hottest architecture in the CV world: FlowFormer (A Transformer Architecture for Optical Flow) has become the current SOTA.
  • Aside: the CV field is really too competitive.
