[Paper notes] On Building an Accurate Stereo Matching System on Graphics Hardware

- A colleague recommended this paper. I wanted to write it up together with my study notes on the OpenCV SGBM source, because the two pipelines feel very similar, probably because both rely on scanline optimization :)

Foreword

  1. The paper points out that the top 10 entries on Middlebury are mostly heavyweight algorithms that chain several stages (CC, CA, optimization, refinement). Their accuracy is very good, but accelerating them on the GPU requires tricks or complex data structures, so performance on mobile devices is a concern;

  2. The authors combine several lightweight modules that can all be accelerated on the GPU. The accuracy is quite good, which makes use on mobile devices feasible. The pipeline itself is fairly conventional: AD Census + CBCA + Scanline Optimization + conventional refinement, drawn from the following references respectively:

  • AD Census:
    X. Sun, X. Mei, S. Jiao, M. Zhou, and H. Wang. Stereo matching with reliable disparity propagation. In Proc. 3DIMPVT, pages 132–139, 2011.
  • CBCA:
    K. Zhang, J. Lu, and G. Lafruit. Cross-based local stereo matching using orthogonal integral images. IEEE TCSVT, 19(7):1073–1079, 2009.
  • Scanline Optimization:
    H. Hirschmüller. Stereo processing by semiglobal matching and mutual information. IEEE TPAMI, 30(2):328–341, 2008.

Summary

paper links

paper

Source link

code

main idea

Cost computation (CC): AD + Census, i.e. the absolute differences (AD) measure combined with the census transform

Cost aggregation (CA): CBCA, i.e. cost aggregation over cross-shaped windows

Disparity computation: scanline optimization + WTA; this is conventional, using the scanline optimization from SGM followed by winner-takes-all

Post-processing (refinement): region voting + interpolation + depth discontinuity (edge) adjustment + sub-pixel enhancement

Contribution

The pipeline above is built and implemented on the GPU with CUDA

Related research

Algorithm

AD Census

Digression:

R. Zabih and J. Woodfill. Non-parametric local transforms for computing visual correspondence. In Proc. ECCV, pages 151–158, 1994.

This paper covers several cost computation methods, including AD, BT (the one used inside SGBM), Census and so on; to be read when I have time;

H. Hirschmüller and D. Scharstein. Evaluation of stereo matching costs on images with radiometric differences.
IEEE TPAMI, 31(9):1582–1599, 2009.

Hirschmüller and Scharstein evaluate matching costs and reach the following conclusion:

census shows the best overall results in local and global stereo matching methods

The idea of census:

For a point \(\vec{p} = (x, y)\) in the left image \(I_{left}\), at disparity level \(d\), \(C_{census}(\vec{p}, d)\) is defined as the Hamming distance between two census codes: the code of the neighborhood window centered on \(\vec{p}\) in the left image, and the code of the neighborhood window centered on \(\vec{pd} = (x - d, y)\) in the right image \(I_{right}\). https://blog.csdn.net/qq_40313712/article/details/86349363 gives a fairly straightforward explanation.
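The census cost described above can be sketched roughly as follows. This is a minimal illustration, not the paper's GPU code: `GrayImage`, `censusCode` and `censusCost` are hypothetical names, the image is assumed to be 8-bit grayscale, and the window is assumed to have at most 64 neighbors so the code fits in one `uint64_t`.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical minimal grayscale image: row-major, 8-bit.
struct GrayImage {
    int width;
    int height;
    std::vector<std::uint8_t> data;
    std::uint8_t at(int x, int y) const { return data[y * width + x]; }
};

// Census code of the (2*rx+1) x (2*ry+1) window centered on (x, y):
// one bit per neighbor, set when the neighbor is darker than the center.
// The caller must keep the whole window inside the image.
std::uint64_t censusCode(const GrayImage& img, int x, int y, int rx, int ry) {
    assert((2 * rx + 1) * (2 * ry + 1) - 1 <= 64);  // code must fit in 64 bits
    assert(x - rx >= 0 && x + rx < img.width);
    assert(y - ry >= 0 && y + ry < img.height);
    std::uint64_t code = 0;
    const std::uint8_t center = img.at(x, y);
    for (int dy = -ry; dy <= ry; ++dy) {
        for (int dx = -rx; dx <= rx; ++dx) {
            if (dx == 0 && dy == 0) continue;  // skip the center itself
            code = (code << 1) | (img.at(x + dx, y + dy) < center ? 1u : 0u);
        }
    }
    return code;
}

// C_census(p, d): Hamming distance between the left code at (x, y)
// and the right code at (x - d, y).
int censusCost(const GrayImage& left, const GrayImage& right,
               int x, int y, int d, int rx, int ry) {
    const std::uint64_t a = censusCode(left, x, y, rx, ry);
    const std::uint64_t b = censusCode(right, x - d, y, rx, ry);
    return static_cast<int>(std::bitset<64>(a ^ b).count());
}
```

Only the ordering between each neighbor and the center enters the code, which is why a global brightness offset between the two images leaves the cost unchanged.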

Advantage of census: it encodes the relative ordering between the pixels in the neighborhood and the current pixel, so it is robust to noise and to outliers introduced by radiometric distortion

Census encodes local image structures with relative orderings of the pixel intensities other than the intensity values themselves, and therefore tolerates outliers due to radiometric changes and image noise

Weakness of census: it cannot tell apart regions whose neighborhoods look similar to the current point's, e.g. scenes with repetitive texture;

However, this asset could also introduce matching ambiguities in image regions with repetitive or similar local structures.

To address this problem of census, the authors propose using it together with AD:

AD is defined as follows:

\[C_{AD}(\vec{p}, d) = \frac{1}{3}\sum_{i=R, G, B}|I_i^{left}(\vec{p}) - I_i^{right}(\vec{pd})| \tag{1}\]

AD Census is defined as follows:

\[C(\vec{p}, d) = \rho(C_{census}(\vec{p}, d), \lambda_{census}) + \rho(C_{AD}(\vec{p}, d), \lambda_{AD}) \tag{2}\]

where \(\rho(c, \lambda)\) is defined as follows:

\[\rho(c, \lambda) = 1 - \exp(-\frac{c}{\lambda}) \tag{3}\]
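Equations (1) to (3) are small enough to write out directly. The sketch below is only an illustration: the function names `adCost`, `rho` and `adCensusCost` are mine, and the \(\lambda\) values used in the usage note are placeholders rather than tuned settings.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstdlib>

// Eq. (1): mean absolute difference over the R, G, B channels.
double adCost(const std::array<int, 3>& left, const std::array<int, 3>& right) {
    int sum = 0;
    for (int i = 0; i < 3; ++i) sum += std::abs(left[i] - right[i]);
    return sum / 3.0;
}

// Eq. (3): rho(c, lambda) = 1 - exp(-c / lambda), maps a cost c >= 0 into [0, 1].
double rho(double c, double lambda) {
    assert(lambda > 0.0);
    return 1.0 - std::exp(-c / lambda);
}

// Eq. (2): both costs are squashed by rho before being added, so neither
// the census term nor the AD term can dominate the sum.
double adCensusCost(double costCensus, double costAD,
                    double lambdaCensus, double lambdaAD) {
    return rho(costCensus, lambdaCensus) + rho(costAD, lambdaAD);
}
```

For example, `adCensusCost(8.0, adCost({10, 20, 30}, {13, 20, 24}), 30.0, 10.0)` combines a Hamming distance of 8 with an AD of 3. Because each term saturates at 1, the combined cost never exceeds 2, and \(\lambda\) controls how quickly each term saturates.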

A simple comparison below:

CBCA

Cost aggregation can be seen as a refinement of the cost volume; its main purpose is to reduce the influence of noise on the later disparity computation. CBCA stands for Cross-based Cost Aggregation; in essence it is a mean filter over cross-shaped windows. The idea is as follows:

From the figure, CBCA can be summarized in two steps:

  1. cross window construction
  2. Mean filter within the window

cross window construction

A cross window is a cross-shaped window. Each point has a distance from itself to the left, right, upper and lower boundaries of its window; the distance to the left boundary is what the paper calls the "left arm", and the same goes for up, down and right;

Every point has its own cross window, which is used for the cost aggregation below;

The question now is how to determine the vertical and horizontal boundaries, which is the core problem that cross window construction has to solve.

Among the resources available, the following information can be used:

  • The color distance between the current point and a point in its neighborhood: \(D_c(\vec{p_1}, \vec{p}) = \max_{i=R,G,B}|I_i(\vec{p_1}) - I_i(\vec{p})|\)
  • The spatial distance between the current point and a point in its neighborhood: \(D_s(\vec{p_1}, \vec{p}) = |\vec{p_1} - \vec{p}|\)

Whether a neighboring point \(\vec{p_1}\) belongs to the cross window of the current point \(\vec{p}\) is decided by the following rules:

  1. \(D_c(\vec{p_1}, \vec{p}) < \tau_1\) and \(D_c(\vec{p_1}, \vec{p_1} + (1, 0)) < \tau_1\)
  2. \(D_s(\vec{p_1}, \vec{p}) < L_1\)
  3. If \(L_2 < D_s(\vec{p_1}, \vec{p}) < L_1\), then \(D_c(\vec{p_1}, \vec{p}) < \tau_2\) must also hold

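Rule 1 keeps the aggregation from crossing edges: the color distance between the current point and the neighboring point must be below \(\tau_1\), and the color distance between that neighboring point and the adjacent point on the same arm must also be below \(\tau_1\);

Rules 2 and 3 provide enough smoothing in textureless scenes: a fairly large \(L_1\) guarantees a large enough window in textureless regions, but once the arm length exceeds the preset \(L_2\), the stricter condition \(D_c(\vec{p_1}, \vec{p}) < \tau_2\) must hold.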
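The three rules above can be sketched as a loop that grows one arm pixel by pixel until a rule fails. This is a simplified single-row illustration under my own assumptions: `Rgb`, `colorDistance` and `armLength` are hypothetical names, and the arm is grown horizontally only (the vertical arms follow the same logic along a column).

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>
#include <vector>

struct Rgb { int r, g, b; };

// D_c: maximum absolute difference over the three channels.
int colorDistance(const Rgb& a, const Rgb& b) {
    return std::max({std::abs(a.r - b.r), std::abs(a.g - b.g), std::abs(a.b - b.b)});
}

// Length of the arm starting at x and growing in direction `step`
// (-1 = left arm, +1 = right arm) along one image row.
int armLength(const std::vector<Rgb>& row, int x, int step,
              int L1, int L2, int tau1, int tau2) {
    assert(step == 1 || step == -1);
    const int n = static_cast<int>(row.size());
    assert(x >= 0 && x < n);
    int len = 0;
    for (int k = 1; k < L1; ++k) {                   // rule 2: D_s < L1
        const int x1 = x + k * step;
        if (x1 < 0 || x1 >= n) break;                // stop at the image border
        const int dc = colorDistance(row[x1], row[x]);
        if (dc >= tau1) break;                       // rule 1: similar to the anchor
        if (colorDistance(row[x1], row[x1 - step]) >= tau1) break;  // rule 1: similar to its predecessor on the arm
        if (k > L2 && dc >= tau2) break;             // rule 3: stricter threshold beyond L2
        len = k;
    }
    return len;
}
```

With a uniform row the arm grows to `L1 - 1`; a strong edge stops it immediately, and a mild color drift stops it only once the arm is longer than `L2`. That is the behaviour the two-threshold design is after.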

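Mean filter within the window

Some background:

  • A mean filter implemented with an integral image reaches O(n) complexity and is radius-free;

  • Splitting it into 1D mean filters applied alternately along rows and columns is faster than computing the 2D mean filter directly;

With the cross windows constructed as above, build the integral image of the cost volume; the arm lengths computed in the previous step then give the cross-window cost aggregation;

Because the cross-window mean filter is not a regular mean filter, stripe artifacts usually remain after the alternating 1D row/column passes, hence the authors' iteration scheme below:

  • Run 4 iterations, each consisting of a row pass and a column pass;
  • Iterations 1 and 3: rows first, then columns
  • Iterations 2 and 4: columns first, then rows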
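One row pass of this aggregation can be sketched with a 1D prefix sum (the 1D counterpart of an integral image). This is an illustration under my own assumptions: `aggregateRow` is a hypothetical name, it handles a single row of a single disparity slice, and it normalizes by the support size to match the mean-filter description above.

```cpp
#include <cassert>
#include <vector>

// One horizontal pass over one row of one disparity slice.
// cost[x] is the matching cost, leftArm[x] / rightArm[x] are the arm
// lengths of pixel x. The result is the mean cost over
// [x - leftArm[x], x + rightArm[x]].
std::vector<double> aggregateRow(const std::vector<double>& cost,
                                 const std::vector<int>& leftArm,
                                 const std::vector<int>& rightArm) {
    assert(leftArm.size() == cost.size() && rightArm.size() == cost.size());
    const int n = static_cast<int>(cost.size());
    // prefix[i] = cost[0] + ... + cost[i - 1]
    std::vector<double> prefix(n + 1, 0.0);
    for (int x = 0; x < n; ++x) prefix[x + 1] = prefix[x] + cost[x];

    std::vector<double> out(n);
    for (int x = 0; x < n; ++x) {
        const int lo = x - leftArm[x];
        const int hi = x + rightArm[x];
        assert(lo >= 0 && hi < n);  // arms must stay inside the row
        // Any window sum costs two lookups, whatever the arm length.
        out[x] = (prefix[hi + 1] - prefix[lo]) / (hi - lo + 1);
    }
    return out;
}
```

The column pass is the same routine applied along columns with the up/down arms. Running the two passes in alternating order over the four iterations gives the scheme listed above.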

To quote the authors' own assessment:

our improved method are presented in Figure 3, which shows that the enhanced cross construction rules and aggregation strategy can produce more accurate results in large textureless regions and near depth discontinuities.

For someone who started out in filtering research, this is simply the legendary edge-preserving smoothing, haha

Scanline Optimization

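Scanline optimization is taken from the original SGM paper; the core idea can be expressed with the formula from that paper:

\[E(D)=\sum_{\vec{p}}\Big(C(\vec{p}, D_p) + \sum_{\vec{q}\in N_p}P_1 T[|D_p-D_q|=1]+\sum_{\vec{q}\in N_p}P_2 T[|D_p-D_q|>1]\Big)\tag{4}\]

To obtain a globally optimal result, one usually builds an energy function with a data term and a smoothness term, and finds the optimal disparity by minimizing it. The cost volume supplies the cost itself; a neighboring pixel whose disparity differs from the current point's by 1 is penalized with \(P_1\), and one that differs by more than 1 is penalized with \(P_2\). The first term is therefore the data term, and the last two are the smoothness terms.

In stereo matching, however, the cost volume is large, so solving this optimization directly with traditional methods costs a lot of memory and time. The SGM paper therefore proposes an approximate solution by dynamic programming, which simplifies the cost function along a path direction \(\vec{r}\) to:

\[C_{\vec{r}}(\vec{p}, d)=C_1(\vec{p}, d) + \min\Big(C_{\vec{r}}(\vec{p}-\vec{r}, d),\ C_{\vec{r}}(\vec{p}-\vec{r}, d \pm 1) + P_1,\ \min_k C_{\vec{r}}(\vec{p} - \vec{r}, k) + P_2\Big) - \min_k C_{\vec{r}}(\vec{p}-\vec{r}, k)\tag{5}\]

Define \(D_1 = D_c(\vec{p}, \vec{p}-\vec{r})\) and \(D_2 = D_c(\vec{pd}, \vec{pd}-\vec{r})\); the values of \(D_1\) and \(D_2\) control the values of \(P_1, P_2\):

  1. \(P_1 = \Pi_1, P_2 = \Pi_2\), if \(D_1 < \tau_{SO}, D_2 < \tau_{SO}\)
  2. \(P_1 = \Pi_1 / 4, P_2 = \Pi_2 / 4\), if \(D_1 < \tau_{SO}, D_2 > \tau_{SO}\)
  3. \(P_1 = \Pi_1 / 4, P_2 = \Pi_2 / 4\), if \(D_1 > \tau_{SO}, D_2 < \tau_{SO}\)
  4. \(P_1 = \Pi_1 / 10, P_2 = \Pi_2 / 10\), if \(D_1 > \tau_{SO}, D_2 > \tau_{SO}\)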
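The recursion in Eq. (5) and the penalty rules can be sketched for a single pixel on a single path as below. The names `Penalties`, `adaptPenalties` and `pathCost` are mine. The rules above leave the case \(D = \tau_{SO}\) open, so the sketch treats it as "above threshold", which is my own assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

struct Penalties { double p1, p2; };

// Rules 1-4: relax the penalties where the left image (d1) and/or the
// right image (d2) shows a color edge along the path direction.
// Assumption: a distance equal to tauSO counts as an edge.
Penalties adaptPenalties(double pi1, double pi2, int d1, int d2, int tauSO) {
    const bool leftEdge = d1 >= tauSO;
    const bool rightEdge = d2 >= tauSO;
    if (!leftEdge && !rightEdge) return {pi1, pi2};
    if (leftEdge && rightEdge) return {pi1 / 10.0, pi2 / 10.0};
    return {pi1 / 4.0, pi2 / 4.0};
}

// Eq. (5) for one pixel p on one path r.
// c1   : C_1(p, d) for every disparity d
// prev : C_r(p - r, d) for every disparity d (previous pixel on the path)
std::vector<double> pathCost(const std::vector<double>& c1,
                             const std::vector<double>& prev,
                             const Penalties& pen) {
    assert(!c1.empty() && c1.size() == prev.size());
    const int n = static_cast<int>(c1.size());
    const double prevMin = *std::min_element(prev.begin(), prev.end());
    std::vector<double> out(n);
    for (int d = 0; d < n; ++d) {
        double best = std::min(prev[d], prevMin + pen.p2);              // same disparity, or any jump
        if (d > 0) best = std::min(best, prev[d - 1] + pen.p1);         // change by -1
        if (d + 1 < n) best = std::min(best, prev[d + 1] + pen.p1);     // change by +1
        out[d] = c1[d] + best - prevMin;  // subtracting prevMin keeps values bounded
    }
    return out;
}
```

A full pass would call `pathCost` for every pixel in scan order, feeding each result in as `prev` for the next pixel on the path.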

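The final cost volume is: \(C_2(\vec{p}, d) = \frac{1}{4}\sum_{\vec{r}}C_{\vec{r}}(\vec{p}, d)\)

The final disparity map is then obtained with WTA: for each pixel, take the disparity with the minimum cost as the disparity of that point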
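The averaging and WTA step for one pixel can be sketched as follows. `wtaDisparity` is a hypothetical helper; it averages over however many path cost vectors it is given (four in the setting above) and returns the index of the smallest averaged cost.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// pathCosts[r][d] = C_r(p, d) for one pixel p.
// Returns argmin_d of C_2(p, d) = mean over r of C_r(p, d).
int wtaDisparity(const std::vector<std::vector<double>>& pathCosts) {
    assert(!pathCosts.empty() && !pathCosts[0].empty());
    const std::size_t n = pathCosts[0].size();
    std::vector<double> c2(n, 0.0);
    for (const std::vector<double>& path : pathCosts) {
        assert(path.size() == n);
        for (std::size_t d = 0; d < n; ++d) c2[d] += path[d] / pathCosts.size();
    }
    // Winner takes all: index of the smallest averaged cost.
    return static_cast<int>(std::min_element(c2.begin(), c2.end()) - c2.begin());
}
```

On ties `std::min_element` returns the first minimum, so the smaller disparity wins; that tie-breaking is a property of this sketch, not something the paper specifies.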

Post-process & Refine

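The disparity map computed by WTA usually contains invalid disparities in occluded regions, along with noisy points and even large erroneous areas. The authors use the following modules for refinement:

  1. Outlier Detection

    • This step relies on a consistency check between the left and right disparity maps to extract regions with inaccurate disparity. Simply put, a threshold is set: if the left and right disparities of a matched point pair differ by more than the threshold, the point is marked as inaccurate;

    • Inaccurate points fall into two classes: 1) occluded points; 2) mismatched points

    • To tell the two apart, the authors use the method from the original SGM paper:
      1) Search along the epipolar line; if an intersecting point is found in the right disparity map, the point is a mismatch

      2) If no intersecting point is found in the right disparity map along the epipolar line, the point is occluded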

  2. Iterative Region Voting

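    For each outlier detected in the previous step, build the disparity histogram \(H_{\vec{p}}\) of the reliable points in its cross window, with \(d_{max}+1\) bins. The bin holding the most reliable points gives \(d_{\vec{p}}^*\), and the total number of reliable pixels is \(S_{\vec{p}} = \sum_{d=0}^{d_{max}}H_{\vec{p}}(d)\). When \(H_{\vec{p}}\) satisfies:

    \(S_{\vec{p}} > \tau_S, \frac{H_{\vec{p}}(d_{\vec{p}}^*)}{S_{\vec{p}}} > \tau_H\)

    the disparity at the outlier is replaced by \(d_{\vec{p}}^*\). After five iterations the erroneous regions shrink considerably

  3. Proper Interpolation

    After the previous step far fewer erroneous points remain. This step treats mismatched points and occluded points differently;

    Search for the nearest reliable points around the outlier in 16 directions;

    For an occluded point, take the smallest disparity among them as its disparity;

    For a mismatched point, take the disparity of the point most similar to it (in color) as its disparity;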

  4. Depth Discontinuity Adjustment

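    Step 1: detect the edges of the disparity map

    Step 2: compare the cost of the points on either side of an edge point; if one of them is lower than the cost of the current point, replace the disparity of the edge point with the disparity of that point

  5. Sub-pixel Enhancement

    A sub-pixel enhancement method based on quadratic polynomial interpolation

This is followed by a 3x3 median filter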
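Three of the refinement steps above (left-right check, region voting, sub-pixel enhancement) can be sketched compactly. Everything here is a simplified illustration under my own assumptions: `lrCheck` works on a single scanline, `regionVote` receives the reliable disparities of the cross window as a plain list, and `subpixel` fits a parabola through the costs at \(d-1\), \(d\), \(d+1\). `kInvalid` is a hypothetical marker meaning "no vote accepted".

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

const int kInvalid = -1;  // hypothetical marker: the vote was not accepted

// Left-right check on one scanline: pixel x in the left map matches
// pixel x - dispL[x] in the right map; flag it when the two disagree.
std::vector<bool> lrCheck(const std::vector<int>& dispL,
                          const std::vector<int>& dispR, int maxDiff) {
    assert(dispL.size() == dispR.size());
    const int n = static_cast<int>(dispL.size());
    std::vector<bool> outlier(n, false);
    for (int x = 0; x < n; ++x) {
        const int xr = x - dispL[x];
        outlier[x] = xr < 0 || xr >= n || std::abs(dispL[x] - dispR[xr]) > maxDiff;
    }
    return outlier;
}

// Region voting for one outlier: histogram the reliable disparities of its
// cross window and accept the top bin only if both thresholds are passed.
int regionVote(const std::vector<int>& reliableDisps, int dMax, int tauS, double tauH) {
    assert(dMax >= 0);
    std::vector<int> hist(dMax + 1, 0);
    int total = 0;  // S_p
    for (int d : reliableDisps) {
        assert(d >= 0 && d <= dMax);
        ++hist[d];
        ++total;
    }
    int best = 0;   // d_p^*
    for (int d = 1; d <= dMax; ++d)
        if (hist[d] > hist[best]) best = d;
    if (total > tauS && static_cast<double>(hist[best]) / total > tauH) return best;
    return kInvalid;
}

// Sub-pixel enhancement: vertex of the parabola through the costs at
// d - 1 (cPrev), d (cCur) and d + 1 (cNext).
double subpixel(int d, double cPrev, double cCur, double cNext) {
    const double denom = 2.0 * (cNext + cPrev - 2.0 * cCur);
    if (denom <= 0.0) return d;  // flat or non-convex fit: keep the integer value
    return d - (cNext - cPrev) / denom;
}
```

For instance, `subpixel(5, 3.0, 1.0, 2.0)` lands slightly above 5, because the cost on the `d + 1` side is lower than on the `d - 1` side.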

Gains contributed by each refinement stage:

Experiments

Test platform

PC with Core2Duo 2.20GHz CPU and NVIDIA GeForce GTX 480 graphics card

Dataset

Middlebury

Parameter settings

Performance

       Tsukuba   Venus     Teddy     Cones
CPU    2.5s      4.5s      15s       15s
GPU    0.016s    0.032s    0.095s    0.094s

tips

Parameter tuning matters a lot

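Postscript: SGBM OpenCV source study notes

Location in the OpenCV source:

opencv-4.0.0_src\opencv-4.0.0\modules\calib3d\src\stereosgbm.cpp

Recommended blogs:

https://blog.csdn.net/wwp2016/article/details/86080722

https://zhuanlan.zhihu.com/p/53060518

pipeline

The pipeline follows the conventional stereo matching pipeline: cost computation -> cost aggregation -> disparity computation -> disparity post-processing. It is very close to the logic of the paper above, and the two can be strung together for optimization and experiments, which is why they are written up in one document.

Function calls and parameter walkthrough

The SGBM OpenCV source wraps the pipeline above into two interfaces for outside callers: 1) a constructor that takes the parameter settings; 2) the algorithm core;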

    StereoSGBMImpl()
    {
        params = StereoSGBMParams();
    }

    StereoSGBMImpl( int _minDisparity, int _numDisparities, int _SADWindowSize,
                    int _P1, int _P2, int _disp12MaxDiff, int _preFilterCap,
                    int _uniquenessRatio, int _speckleWindowSize, int _speckleRange,
                    int _mode )
    {
        params = StereoSGBMParams( _minDisparity, _numDisparities, _SADWindowSize,
                                   _P1, _P2, _disp12MaxDiff, _preFilterCap,
                                   _uniquenessRatio, _speckleWindowSize, _speckleRange,
                                   _mode );
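        // _minDisparity: minimum disparity, default 0; for real applications where negative disparities may occur, it can be set to a negative value, e.g. -3;
        
        // _numDisparities: number of disparity levels, must be a multiple of 16, default 16;
        //                note: the output disparity map is fixed-point, scaled by 16 (DISP_SCALE), so its values lie in [16 * _minDisparity, 16 * (_minDisparity + _numDisparities));
        
        // _SADWindowSize: SAD window diameter, used in the cost aggregation part, similar to the window diameter of a box filter, default 5;
        
        // _P1: penalty when a neighbor's disparity differs from the current point's by 1, i.e. P1 in Eq. (4) above, default 2;
        
        // _P2: penalty when a neighbor's disparity differs from the current point's by more than 1, i.e. P2 in Eq. (4) above, default: max(params.P2 > 0 ? params.P2 : 5, P1+1);
        
        // _disp12MaxDiff: threshold for detecting inconsistent points in the LR check, default: 1;
        
        // _preFilterCap: the SGBM cost computation is based on BT; this value clamps the prefiltered pixel values (x-derivatives) fed to the cost into the range [0, 2 * _preFilterCap];
        //                 note: this value affects the magnitude of the matching cost, so P1 and P2 need to be tuned together with it;
        //                 rough idea of the clamping (k is the lookup index, offset by 256*4):
        //                 if k < 256*4 - _preFilterCap, then CC = 0;
        //                 if 256*4 - _preFilterCap < k < 256*4 + _preFilterCap, then CC = k - 256*4 + _preFilterCap;
        //                 if k > 256*4 + _preFilterCap, then CC = 2 * _preFilterCap;
        
        // _uniquenessRatio: uniqueness check threshold. Apart from the disparities immediately before and after bestDisp, the S of every other disparity must be greater than minS * 1.x; default 10, i.e. greater than minS * 1.1;
        
        // _speckleWindowSize & _speckleRange: parameters of filterSpeckles, the function that removes speckles from the disparity map;
        
        // _mode: 
        //       - MODE_SGBM_3WAY:
        //       - MODE_HH4:
        //       - MODE_HH: (note: the default passed by StereoSGBM::create is MODE_SGBM, which takes the same branch in compute())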
    }

    void compute( InputArray leftarr, InputArray rightarr, OutputArray disparr ) CV_OVERRIDE
    {
        CV_INSTRUMENT_REGION();

        Mat left = leftarr.getMat(), right = rightarr.getMat();
        CV_Assert( left.size() == right.size() && left.type() == right.type() &&
                   left.depth() == CV_8U );

        disparr.create( left.size(), CV_16S );
        Mat disp = disparr.getMat();

        if(params.mode==MODE_SGBM_3WAY)
            computeDisparity3WaySGBM( left, right, disp, params, buffers, num_stripes );
        else if(params.mode==MODE_HH4)
            computeDisparitySGBM_HH4( left, right, disp, params, buffer );
        else
            computeDisparitySGBM( left, right, disp, params, buffer );

        medianBlur(disp, disp, 3);

        if( params.speckleWindowSize > 0 )
            filterSpeckles(disp, (params.minDisparity - 1)*StereoMatcher::DISP_SCALE, params.speckleWindowSize,
                           StereoMatcher::DISP_SCALE*params.speckleRange, buffer);
    }

Some code details

The three SGBM modes

  1. MODE_SGBM_3WAY
  2. MODE_HH
  3. MODE_HH4

SGBM memory management

Core algorithm ideas

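  1. BT-based cost computation

    Birchfield-Tomasi metric

    from the original BT paper

    BT was first proposed by Stan Birchfield and Carlo Tomasi in A Pixel Dissimilarity Measure That Is Insensitive to Image Sampling, with the aim of making the dissimilarity measure robust to image sampling. Section 2 of that paper presents the basic idea, and Section 3 demonstrates its effectiveness theoretically and experimentally. I have not fully understood the details, but the main idea of BT can be outlined as follows:

    For a continuous stereo pair \(I_L(x_L)\), \(I_R(x_R)\), after hardware processing and image sampling the gray values of matching points in the left and right images may deviate slightly. BT uses simple linear interpolation toward the neighbors \(x+1, x-1\) and takes \(I_{min}, I_{max}\) to jointly characterize the similarity of the left and right views at the current position:

The mathematical expression is as follows:

\(I_R^- \equiv \hat{I_R}(x_R - \frac{1}{2}) = \frac{1}{2}(I_R(x_R) + I_R(x_R - 1))\), \(I_R^+ \equiv \hat{I_R}(x_R + \frac{1}{2}) = \frac{1}{2}(I_R(x_R) + I_R(x_R + 1))\)

\(I_{min} = \min\{I_R^-, I_R^+, I_R(x_R)\}\), \(I_{max} = \max\{I_R^-, I_R^+, I_R(x_R)\}\)

The cost is then \(\bar{d}(x_L, x_R, I_L, I_R) = \max\{0, I_L(x_L) - I_{max}, I_{min} - I_L(x_L)\}\)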
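The one-sided BT cost written above translates almost line by line into code. The sketch below works on one row of each image; `btCost` is a hypothetical name, and it requires both horizontal neighbors of `xR` to exist.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One-sided Birchfield-Tomasi dissimilarity d-bar(x_L, x_R, I_L, I_R):
// distance from I_L(x_L) to the intensity interval spanned by I_R(x_R)
// and its two half-pixel interpolated neighbors.
double btCost(const std::vector<double>& rowL, int xL,
              const std::vector<double>& rowR, int xR) {
    assert(xL >= 0 && xL < static_cast<int>(rowL.size()));
    assert(xR >= 1 && xR + 1 < static_cast<int>(rowR.size()));  // needs both neighbors
    const double iR = rowR[xR];
    const double iMinus = 0.5 * (iR + rowR[xR - 1]);  // I_R^-
    const double iPlus = 0.5 * (iR + rowR[xR + 1]);   // I_R^+
    const double iMin = std::min({iMinus, iPlus, iR});
    const double iMax = std::max({iMinus, iPlus, iR});
    return std::max({0.0, rowL[xL] - iMax, iMin - rowL[xL]});
}
```

With a right row `{10, 20, 30}` and `xR = 1`, the interval is [15, 25]. A left value of 22 gives cost 0 where plain AD would give 2, while a left value of 30 gives cost 5. This is the sampling insensitivity the measure is designed for.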

A concrete example of BT vs AD is shown below:

BT in SGBM

  2. Scanline Optimization based on the cost

    - cost aggregation over the neighboring points along each path direction

Call demo

#include <opencv2/opencv.hpp>
using namespace std;
using namespace cv;
int main()
{
    Mat left = imread("imgL.jpg", IMREAD_GRAYSCALE);
    Mat right = imread("imgR.jpg", IMREAD_GRAYSCALE);
    if (left.empty() || right.empty())
        return 1;                                          // input images could not be read
    Mat disp;
    int mindisparity = 0;
    int ndisparities = 64;   // must be a multiple of 16
    int SADWindowSize = 11;  // matching block size
    //SGBM
    cv::Ptr<cv::StereoSGBM> sgbm = cv::StereoSGBM::create(mindisparity, ndisparities, SADWindowSize);
    int P1 = 8 * left.channels() * SADWindowSize* SADWindowSize;
    int P2 = 32 * left.channels() * SADWindowSize* SADWindowSize;
    sgbm->setP1(P1);
    sgbm->setP2(P2);
    sgbm->setPreFilterCap(15);
    sgbm->setUniquenessRatio(10);
    sgbm->setSpeckleRange(2);
    sgbm->setSpeckleWindowSize(100);
    sgbm->setDisp12MaxDiff(1);
    //sgbm->setMode(cv::StereoSGBM::MODE_HH);
    sgbm->compute(left, right, disp);
    disp.convertTo(disp, CV_32F, 1.0 / 16);                // divide by 16 to get the real disparity values
    Mat disp8U = Mat(disp.rows, disp.cols, CV_8UC1);       // 8-bit image for display
    normalize(disp, disp8U, 0, 255, NORM_MINMAX, CV_8UC1);
    imwrite("results/SGBM.jpg", disp8U);
    return 0;
}

Origin www.cnblogs.com/nico-1729/p/12233586.html