Reading notes on OmniMVS: End-to-End Learning for Omnidirectional Stereo Matching

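1. Extract unary feature maps from the input fisheye images
- implemented with a 2D CNN
  - the paper uses SegNet + dilated convolution
2. Build a 4D feature volume from the feature maps and the intrinsic/extrinsic parameters
- implemented with calibration + spherical sweeping
  - the paper uses a multi-fisheye camera rig model and the spherical sweeping method
3. Compute the matching cost volume
- regularized by a 3D CNN
4. Depth estimation
- performed with softargmin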

At the center of the multi-camera rig, a unit vector $\vec{p}$ represents a direction with respect to the whole rig.

The set of points that the unit vectors $\vec{p}$ point to forms a sphere.

By setting the length $\rho$ of $\vec{p}$ (the sphere radius) in the camera rig model, spheres of different sizes are obtained.
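As a minimal sketch of this idea (not the paper's code), the snippet below builds unit direction vectors $\vec{p}$ on an equirectangular grid and scales them by a list of radii, giving one grid of 3D points per sweep sphere. The angle convention `p = (cos φ cos θ, sin φ, cos φ sin θ)` and the helper names `ray_directions` and `sweep_spheres` are assumptions for illustration; the radii play the role of $\rho$ as described above.

```python
import numpy as np


def ray_directions(height, width):
    """Unit direction vectors p(theta, phi) at equirectangular pixel centres.

    Assumed convention (hypothetical): theta (longitude) spans (-pi, pi) along
    the width, phi (latitude) spans (-pi/2, pi/2) along the height, and
    p = (cos(phi) cos(theta), sin(phi), cos(phi) sin(theta)).
    """
    theta = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    phi = (np.arange(height) + 0.5) / height * np.pi - np.pi / 2.0
    phi_grid, theta_grid = np.meshgrid(phi, theta, indexing="ij")  # (height, width)
    return np.stack(
        [
            np.cos(phi_grid) * np.cos(theta_grid),
            np.sin(phi_grid),
            np.cos(phi_grid) * np.sin(theta_grid),
        ],
        axis=-1,
    )  # (height, width, 3); every vector has unit length


def sweep_spheres(p, radii):
    """Scale the unit directions by each radius: one 3D point grid per sphere."""
    radii = np.asarray(radii, dtype=float)
    return radii[:, None, None, None] * p[None]  # (N, height, width, 3)


p = ray_directions(4, 8)
spheres = sweep_spheres(p, [1.0, 2.0, 5.0])
print(spheres.shape)  # (3, 4, 8, 3)
```

Every point of `spheres[k]` lies at distance `radii[k]` from the rig center, which is exactly the "same directions, different lengths" construction.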

N such spheres are set up, and each of them has a mapping to the fisheye images (in practice, to their corresponding feature maps).
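The sphere-to-fisheye mapping can be illustrated with a simple projection function. This sketch assumes a rigid transform `X_cam = R @ X + t` from the rig frame to one camera frame and an ideal equidistant fisheye lens (`r = f * alpha`); the lens model, the 220° default field of view, and the name `project_to_fisheye` are assumptions here, not necessarily the calibration model used in the paper.

```python
import numpy as np


def project_to_fisheye(points, R, t, f, cx, cy, fov_deg=220.0):
    """Project rig-frame 3D points into one fisheye image (hypothetical model).

    points:    (..., 3) rig-frame points, e.g. the points of one sweep sphere
    R, t:      rotation (3, 3) and translation (3,) mapping rig -> camera frame
    f, cx, cy: equidistant focal length and principal point, in pixels
    Returns pixel coordinates u, v and a mask of points inside the field of view.
    """
    X = points @ R.T + t                    # camera-frame coordinates
    x, y, z = X[..., 0], X[..., 1], X[..., 2]
    alpha = np.arctan2(np.hypot(x, y), z)   # angle from the optical (z) axis
    azimuth = np.arctan2(y, x)              # direction around the optical axis
    r = f * alpha                           # equidistant: radius grows linearly with angle
    u = cx + r * np.cos(azimuth)
    v = cy + r * np.sin(azimuth)
    valid = alpha <= np.deg2rad(fov_deg) / 2.0
    return u, v, valid


pts = np.array([[0.0, 0.0, 2.0], [1.0, 0.0, 0.0], [0.0, 0.0, -1.0]])
u, v, valid = project_to_fisheye(pts, np.eye(3), np.zeros(3), f=100.0, cx=320.0, cy=240.0)
print(valid)  # [ True  True False]
```

A point on the optical axis lands on the principal point, a point 90° off-axis is still visible because the assumed field of view exceeds 180°, and a point directly behind the camera is masked out.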

The unary feature map extracted by the 2D CNN is denoted $U = F_{CNN}(I)$.
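A single stride-2 $3 \times 3$ convolution followed by ReLU is used below as a hypothetical one-layer stand-in for $F_{CNN}$. It only shows how a stride-2 layer halves the spatial resolution of $I$; the real network is much deeper, and `conv2d_stride2` is an illustrative name.

```python
import numpy as np


def conv2d_stride2(image, weight, bias):
    """3x3 convolution (cross-correlation), zero padding 1, stride 2, then ReLU.

    image: (H, W, C_in); weight: (3, 3, C_in, C_out); bias: (C_out,)
    Output shape: (ceil(H / 2), ceil(W / 2), C_out)
    """
    H, W, _ = image.shape
    padded = np.pad(image, ((1, 1), (1, 1), (0, 0)))
    Ho, Wo = (H + 1) // 2, (W + 1) // 2
    out = np.zeros((Ho, Wo, weight.shape[-1]))
    for dy in range(3):
        for dx in range(3):
            # Every second pixel, shifted by the kernel offset (dy, dx).
            window = padded[dy:dy + 2 * Ho:2, dx:dx + 2 * Wo:2]
            out += window @ weight[dy, dx]
    return np.maximum(out + bias, 0.0)


I = np.random.default_rng(0).random((480, 640, 1))  # grayscale input image
U = conv2d_stride2(I, np.random.default_rng(1).normal(size=(3, 3, 1, 8)), np.zeros(8))
print(U.shape)  # (240, 320, 8)
```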

The feature maps are warped onto the spheres with the spherical sweeping method described above.
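Warping amounts to resampling each feature map at the (generally non-integer) pixel coordinates that every sphere cell projects to. The sketch below does this with bilinear interpolation in NumPy. Setting samples outside the image to zero is an assumption, `bilinear_sample` is a hypothetical helper, and `u`, `v` are taken as given (for example, obtained by projecting the sphere points into the camera).

```python
import numpy as np


def bilinear_sample(feature, u, v):
    """Bilinearly sample a feature map at float pixel coordinates.

    feature: (H, W, C); u (x, column) and v (y, row) may have any shape,
    e.g. (N, H_s, W_s) for N sweep spheres, giving an output of shape
    (N, H_s, W_s, C). Samples whose 2x2 neighbourhood leaves the map are 0.
    """
    H, W, _ = feature.shape
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    u0 = np.floor(u).astype(int)
    v0 = np.floor(v).astype(int)
    du = (u - u0)[..., None]  # fractional offsets, broadcast over channels
    dv = (v - v0)[..., None]
    valid = (u0 >= 0) & (v0 >= 0) & (u0 + 1 < W) & (v0 + 1 < H)
    u0c = np.clip(u0, 0, W - 2)  # clamp so indexing is safe; masked below
    v0c = np.clip(v0, 0, H - 2)
    f00 = feature[v0c, u0c]
    f01 = feature[v0c, u0c + 1]
    f10 = feature[v0c + 1, u0c]
    f11 = feature[v0c + 1, u0c + 1]
    out = (f00 * (1 - du) * (1 - dv) + f01 * du * (1 - dv)
           + f10 * (1 - du) * dv + f11 * du * dv)
    return np.where(valid[..., None], out, 0.0)


feature = np.arange(12, dtype=float).reshape(3, 4, 1)  # value = 4 * row + col
warped = bilinear_sample(feature, np.array([1.5, -1.0]), np.array([0.5, 0.5]))
print(warped[:, 0])  # 3.5 inside the map, 0.0 for the out-of-bounds sample
```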

The choice of the N spheres aims to ensure sufficient disparity between adjacent warped feature maps while reducing computational cost.

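First, the grayscale fisheye images are fed into the 2D CNN, which produces feature maps at half the input resolution.
Next, the features are aligned by spherical sweeping and transferred to spherical features by a $3 \times 3$ conv; the spherical feature maps are then concatenated and fused into a cost volume by a $3 \times 3 \times 3$ conv.
The cost volume is then refined and regularized by a 3D encoder-decoder.
Finally, softargmin is applied to obtain the inverse depth.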
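The fusion and softargmin stages can be sketched as follows, under stated assumptions: per-camera spherical feature volumes of shape `(D, H, W, C)` (D = number of spheres) are concatenated along channels and fused by a same-size $3 \times 3 \times 3$ convolution, and softargmin turns a single-channel cost volume into a continuous sphere index $\hat{n} = \sum_n n \cdot \mathrm{softmax}_n(-C_n)$. The 3D encoder-decoder is omitted, mapping the index to inverse depth assumes spheres spaced uniformly in inverse depth, and all function names are hypothetical.

```python
import numpy as np


def conv3d_3x3x3(volume, weight, bias):
    """Same-size 3x3x3 convolution (cross-correlation, zero padding 1).

    volume: (D, H, W, C_in); weight: (3, 3, 3, C_in, C_out); bias: (C_out,)
    """
    D, H, W, _ = volume.shape
    padded = np.pad(volume, ((1, 1), (1, 1), (1, 1), (0, 0)))
    out = np.zeros((D, H, W, weight.shape[-1]))
    for dz in range(3):
        for dy in range(3):
            for dx in range(3):
                window = padded[dz:dz + D, dy:dy + H, dx:dx + W]
                out += window @ weight[dz, dy, dx]
    return out + bias


def fuse_cost_volume(spherical_features, weight, bias):
    """Concatenate per-camera spherical feature volumes on channels, then fuse."""
    stacked = np.concatenate(spherical_features, axis=-1)  # (D, H, W, n_cams * C)
    return conv3d_3x3x3(stacked, weight, bias)


def softargmin(cost, axis=0):
    """Expected sphere index under softmax(-cost) along the sweep axis."""
    neg = -cost
    neg = neg - neg.max(axis=axis, keepdims=True)  # numerical stability
    prob = np.exp(neg)
    prob /= prob.sum(axis=axis, keepdims=True)
    shape = [1] * cost.ndim
    shape[axis] = cost.shape[axis]
    indices = np.arange(cost.shape[axis]).reshape(shape)
    return (prob * indices).sum(axis=axis)


def index_to_inverse_depth(index, inv_depth_min, inv_depth_max, n_spheres):
    """Hypothetical mapping: spheres spaced uniformly in inverse depth."""
    step = (inv_depth_max - inv_depth_min) / (n_spheres - 1)
    return inv_depth_min + index * step


rng = np.random.default_rng(0)
cams = [rng.random((6, 4, 8, 2)) for _ in range(4)]  # 4 cameras, 6 spheres, 2 channels
fused = fuse_cost_volume(cams, rng.normal(size=(3, 3, 3, 8, 1)), np.zeros(1))
cost = fused[..., 0]  # (6, 4, 8); encoder-decoder omitted
inv_depth = index_to_inverse_depth(softargmin(cost), 0.0, 2.0, n_spheres=6)  # made-up range
print(inv_depth.shape)  # (4, 8)
```

Because softargmin is an expectation rather than a hard argmin, it is differentiable and can produce sub-sphere precision, which is what allows end-to-end training.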

To train the network end to end, the input images and the ground-truth inverse depth are used as inputs.

The loss is the absolute error between the predicted inverse depth and its ground truth.
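A minimal NumPy version of this absolute-error (L1) loss is shown below; in practice it would be written in the training framework so that gradients flow back through softargmin and the CNNs. The optional `valid_mask` for pixels without ground truth and the name `inverse_depth_l1_loss` are assumptions, not something the text specifies.

```python
import numpy as np


def inverse_depth_l1_loss(pred_inv_depth, gt_inv_depth, valid_mask=None):
    """Mean absolute error between predicted and ground-truth inverse depth.

    valid_mask (optional, hypothetical): boolean array marking pixels with
    usable ground truth; all other pixels are ignored.
    """
    err = np.abs(np.asarray(pred_inv_depth, float) - np.asarray(gt_inv_depth, float))
    if valid_mask is not None:
        err = err[np.asarray(valid_mask, bool)]
    return float(err.mean())


pred = np.array([1.0, 2.0, 3.0])
gt = np.array([1.0, 1.0, 1.0])
print(inverse_depth_l1_loss(pred, gt))                        # 1.0
print(inverse_depth_l1_loss(pred, gt, [False, True, True]))   # 1.5
```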