Paper Interpretation 2: Displacement-Invariant Matching Cost Learning for Accurate Optical Flow Estimation

Paper address: https://arxiv.org/pdf/2010.14851.pdf

github address: https://github.com/jytime/DICL-Flow    (at the time of writing, neither the trained model nor the training code has been released; only the model-definition code is available)

Motivation:

  1. Learning matching costs has achieved great success in stereo (binocular) matching: a 3D cost volume (F×D×H×W) is built and processed with 3D convolutions. Applying the same recipe directly to optical flow would require a 4D cost volume (F×U×V×H×W) and 4D convolutions, which is essentially infeasible on current hardware (very high compute and memory requirements). DICL (Displacement-Invariant Cost Learning) instead applies a shared 2D-convolution-based matching network to each 2D displacement u separately, learning the cost of every candidate displacement without ever convolving a 4D volume.
  2. The learned matching cost yields a probability for each candidate displacement u, and a soft-argmax over these probabilities produces a sub-pixel flow estimate. This works very well when the probability distribution is unimodal, but occlusion, low texture, repeated texture, and image blur can make the distribution multi-modal; the averaged estimate then becomes worse than a winner-take-all strategy (simply picking the candidate with the highest probability; see the sketch below). The DAP (Displacement-Aware Projection) layer proposed in this paper is designed to handle exactly such cases.
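To make the contrast concrete, here is a minimal sketch of soft-argmax versus winner-take-all over a 1-D set of displacement candidates with a bimodal score distribution. This is my own illustration, not the paper's code; all names, shapes, and numbers are assumptions made up for the example.

```python
import torch
import torch.nn.functional as F

def soft_argmax_flow(score, candidates):
    """score: (U,) matching score per candidate (higher = better match); candidates: (U,) displacements."""
    prob = F.softmax(score, dim=0)           # turn scores into a probability distribution
    return (prob * candidates).sum()         # expectation over candidates -> sub-pixel estimate

def winner_take_all_flow(score, candidates):
    return candidates[score.argmax()]        # simply pick the single best candidate

# Bimodal scores (e.g. caused by repeated texture): two strong candidates at -3 and +3.
candidates = torch.arange(-3.0, 4.0)                       # displacements -3 .. 3
score = torch.tensor([5.0, 0., 0., 0., 0., 0., 4.5])
print(soft_argmax_flow(score, candidates))                 # ~ -0.7, pulled in between the two peaks
print(winner_take_all_flow(score, candidates))             # -3.0, the strongest peak
```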

DICL:

 Let's first compare DICL with other ways of computing the matching cost. Many prior works use a plain dot product or cosine distance instead of a learned matching cost; as the paper's figure shows, such costs are relatively flat and the true peak is hard to pick out. "MLP" refers to learning the matching cost with a simple 3-layer convolutional network, which does improve on the traditional measures but still does not make the peak prominent enough. DICL produces a much sharper peak, which in turn allows higher-precision optical flow. The specific DICL operation is as follows:

For each candidate displacement u, concatenate the source feature map (F×H×W) with the reference feature map shifted by u (F×H×W) to form F_u (2F×H×W):

Then a 2D-convolution-based network G(·) (with weights shared across all displacements u) processes F_u and outputs a 1×H×W cost map for that u, i.e. C(u) = G(F_u) = G([F_src, F_ref(·+u)]). Stacking the cost maps of all candidates yields the (U×V×H×W) cost volume. A sketch of this per-displacement computation is given below.
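The following is a minimal sketch of displacement-invariant cost learning, not the released DICL-Flow code: the matching network is a small stack of 2D convolutions, the shift uses a cyclic roll for brevity, and the names (MatchingNet, build_cost_volume, max_disp) are invented for the example.

```python
import torch
import torch.nn as nn

class MatchingNet(nn.Module):
    """G(.): maps concatenated features (2F x H x W) to a 1 x H x W cost map."""
    def __init__(self, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(2 * feat_dim, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(64, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, 3, padding=1),
        )

    def forward(self, x):
        return self.net(x)

def build_cost_volume(feat_src, feat_ref, matching_net, max_disp=3):
    """feat_src, feat_ref: (B, F, H, W) -> cost volume (B, V, U, H, W)."""
    b, _, h, w = feat_src.shape
    costs = []
    for dv in range(-max_disp, max_disp + 1):        # vertical candidate
        for du in range(-max_disp, max_disp + 1):    # horizontal candidate
            # cyclic shift for brevity; a real implementation would zero-pad
            # and mask out-of-view pixels instead of wrapping around
            ref_shift = torch.roll(feat_ref, shifts=(dv, du), dims=(2, 3))
            f_u = torch.cat([feat_src, ref_shift], dim=1)   # (B, 2F, H, W)
            costs.append(matching_net(f_u).squeeze(1))      # same weights for every u
    d = 2 * max_disp + 1
    return torch.stack(costs, dim=1).view(b, d, d, h, w)
```

The key point is that one 2D network with shared weights is applied to every displacement, so no 4D convolution is ever needed and memory grows only with the number of candidates U×V.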

DAP:

 As introduced in the Motivation, a key function of DAP is to turn a multi-modal probability distribution into a unimodal one; the paper's figure shows that its effect is quite pronounced. The layer itself is simple: treating the U×V displacement candidates as channels, a 1×1 convolution computes a learned weighted combination over the displacement dimension of the cost volume, i.e. each output candidate is a per-pixel weighted sum of all input candidates. A sketch is given below.
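Below is a minimal sketch of such a projection layer, with assumed shapes and names rather than the authors' exact implementation: the U×V candidates are flattened into channels and re-mixed by a learned 1×1 convolution.

```python
import torch
import torch.nn as nn

class DAP(nn.Module):
    """Re-project the displacement dimension of a cost volume with a 1x1 conv."""
    def __init__(self, max_disp=3):
        super().__init__()
        num_candidates = (2 * max_disp + 1) ** 2             # U * V candidates
        self.proj = nn.Conv2d(num_candidates, num_candidates, kernel_size=1)

    def forward(self, cost_volume):
        """cost_volume: (B, U, V, H, W) -> same shape; each output candidate is a
        learned, per-pixel weighted combination of all input candidates."""
        b, u, v, h, w = cost_volume.shape
        x = cost_volume.view(b, u * v, h, w)   # displacement candidates become channels
        x = self.proj(x)                       # 1x1 conv = weighted sum over candidates
        return x.view(b, u, v, h, w)

# Usage: refined = DAP(max_disp=3)(cost_volume), applied before the soft-argmax.
```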

Personally, I doubt that a single learned weighted combination can resolve every multi-modal case; it probably only handles a subset of them.

Experiments:

This part mainly discusses the roles of the DICL and DAP layers:

DICL: "Reduced DICL" means the DICL matching network is replaced with a single 1×1 convolution; lower error values are better.

       The effect of adding the DICL layer to PWC-Net and to DICL itself is as follows:

DAP: "Ours w/o DAP" denotes the result with the DAP layer removed.

Comparison of overall qualitative results:

 

Origin blog.csdn.net/XBB102910/article/details/109412240