Hierarchical Features Driven Residual Learning for Depth Map Super-Resolution (TIP 2019): Paper Reading Notes

Abstract

Rapid development of affordable and portable consumer depth cameras facilitates the use of depth information in many computer vision tasks such as intelligent vehicles and 3D reconstruction. However, depth maps captured by low-cost depth sensors (e.g., Kinect) usually suffer from low spatial resolution, which limits their potential applications. In this paper, we propose a novel deep network for depth map super-resolution (SR), called DepthSR-Net. The proposed DepthSR-Net automatically infers a high-resolution (HR) depth map from its low-resolution (LR) version by hierarchical features driven residual learning. Specifically, DepthSR-Net is built on a residual U-Net deep network architecture. Given an LR depth map, we first upsample it to the desired HR size by bicubic interpolation and then construct an input pyramid to achieve multiple level receptive fields. Next, we extract hierarchical features from the input pyramid, the intensity image, and the encoder-decoder structure of U-Net. Finally, we learn the residual between the interpolated depth map and the corresponding HR one using the rich hierarchical features. The final HR depth map is obtained by adding the learned residual to the interpolated depth map. We conduct an ablation study to demonstrate the effectiveness of each component in the proposed network. Extensive experiments demonstrate that the proposed method outperforms the state-of-the-art methods. In addition, the potential usage of the proposed network in other low-level vision problems is discussed.

I. INTRODUCTION

II. RELATED WORK

A. Non Color-Guided Depth Map SR Method

B. Color-Guided Depth Map SR Method

C. Deep Learning-Based Color Image SR Method

Among the previous works, MSG-Net [4] is the most related one to the proposed DepthSR-Net.
However, DepthSR-Net differs from MSG-Net in the following aspects: 1) Instead of performing an early spectral decomposition, we learn the residual map, which avoids the spectral decomposition pre-processing and is more flexible and better suited to practical applications; 2) Rather than directly taking the LR depth map as input, we first upscale it to the desired resolution by bicubic interpolation, which relaxes the constraint on the output size. In other words, the proposed DepthSR-Net can process any scaling factor, while MSG-Net only generalizes to 2^N scaling factors because of the automatic upsampling operation it uses; 3) Compared with MSG-Net, we make full use of the multi-level features extracted from the input pyramid to recover the HR depth map; 4) Although both MSG-Net and DepthSR-Net employ the intensity image as guidance, they extract intensity features with different network architectures. We acknowledge that the features extracted from the intensity image can boost the performance of depth map SR and further demonstrate this conclusion in our ablation study.
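Point 2) above can be illustrated with a small sketch. It only does size arithmetic and is not code from either paper: the function names are hypothetical, and integer scale factors are assumed for simplicity. A network that upsamples by 2 in each of N internal stages can only reach scales of the form 2^N, whereas interpolating to the target size before the network leaves the network with a same-size mapping for any scale.

```python
from typing import Tuple


def in_network_scale(num_stages: int) -> int:
    """Total scale of a network that upsamples by 2 in each of num_stages stages."""
    return 2 ** num_stages


def reachable_by_x2_stages(scale: int) -> bool:
    """True if an integer scale equals 2**N for some N >= 1."""
    return scale >= 2 and scale & (scale - 1) == 0


def pre_upsampled_size(lr_height: int, lr_width: int, scale: int) -> Tuple[int, int]:
    """Size after interpolating the LR map to the target resolution.

    With pre-upsampling, the network input and output share this size,
    so the scale factor never constrains the network itself.
    """
    return lr_height * scale, lr_width * scale


if __name__ == "__main__":
    for scale in (2, 3, 4, 6, 8):
        print(scale, reachable_by_x2_stages(scale), pre_upsampled_size(60, 80, scale))
```

A scale of 3 or 6, for example, is not reachable by stacking x2 stages, but it poses no problem once the LR map has been interpolated to the target size.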

III. PROPOSED METHOD

In this section, we first briefly formulate the problem this paper addresses and then describe the proposed DepthSR-Net architecture in detail. Finally, we present the loss function and the training and implementation details.

A. Problem Formulation

According to [44], when the desired mapping is close to an identity mapping, the residual mapping is much easier to optimize.
Accordingly, we learn the residual between the interpolated depth map and the corresponding HR depth map, that is, the high-frequency component lost during bicubic interpolation upsampling.
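In symbols, the residual formulation can be written roughly as follows. The notation is chosen here for illustration and is not necessarily the paper's own: D^L is the LR depth map, s the scaling factor, Y the intensity image, and F_θ the network with parameters θ.

```latex
\tilde{D} = \mathrm{Bicubic}_{s}\left(D^{L}\right), \qquad
R = F_{\theta}\left(\tilde{D},\, Y\right), \qquad
\hat{D}^{H} = \tilde{D} + R
```

The network output R is trained to approximate D^H - \tilde{D}, where D^H is the ground-truth HR depth map. Since \tilde{D} already carries the low-frequency content, R mainly has to account for edges and fine structure.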

B. Proposed DepthSR-Net Architecture

An overview of the proposed network architecture and its parameter settings is shown in Figure 2. The network consists of five components:
• input pyramid branch that achieves multiple level receptive fields and produces hierarchical representation;
• encoder branch that concatenates the hierarchical features from input pyramid and produces a set of hierarchical encoder features;
• hierarchical Y guidance branch that extracts hierarchical intensity features to transfer useful structure to the final HR depth map;
• skip connections that transmit the encoder features to the decoder path;
• decoder branch that produces the residual map by fusing rich hierarchical concatenated features.
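The way the five components fit together can be sketched as plain bookkeeping. This is a reading of the list above, not the authors' implementation: the function name, the number of levels, the feature-group labels, and the assumption that the Y guidance features are fused on the decoder side are all illustrative.

```python
from typing import Dict, List


def plan_feature_fusion(height: int, width: int, num_levels: int = 4) -> List[Dict[str, object]]:
    """List, per resolution level, which feature groups are concatenated.

    Level 0 is full resolution; each deeper level halves height and width.
    num_levels is an illustrative choice, not a value taken from the paper.
    """
    divisor = 2 ** (num_levels - 1)
    if height % divisor or width % divisor:
        raise ValueError("height and width must be divisible by 2**(num_levels - 1)")

    plan = []
    for level in range(num_levels):
        # Encoder: features from the input pyramid at this level, joined with
        # the encoder features coming down from the previous level.
        encoder_inputs = ["input_pyramid"]
        if level > 0:
            encoder_inputs.append("encoder_from_previous_level")

        # Decoder: the deepest level has no decoder stage of its own here;
        # every other level fuses three feature groups (assumed layout).
        if level == num_levels - 1:
            decoder_inputs = []
        else:
            decoder_inputs = ["decoder_from_deeper_level", "encoder_skip", "y_guidance"]

        plan.append({
            "level": level,
            "size": (height >> level, width >> level),
            "encoder_inputs": encoder_inputs,
            "decoder_inputs": decoder_inputs,
        })
    return plan


if __name__ == "__main__":
    for entry in plan_feature_fusion(128, 128):
        print(entry)
```

The decoder output at level 0 would be the residual map, which is then added to the interpolated depth map.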

The input pyramid branch has the following advantages: (1) it provides a hierarchical feature representation extracted from the input depth map; (2) it achieves multiple level receptive fields; (3) it reduces the risk of over-fitting by providing an abstract form of the representation.
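A minimal sketch of an input pyramid follows, assuming each level is produced by 2x2 average pooling of the previous one. The pooling operator, the number of levels, and the function names are assumptions made for illustration; the notes above do not specify how the paper downsamples.

```python
from typing import List

Image = List[List[float]]


def downsample_2x(image: Image) -> Image:
    """Halve height and width by averaging each non-overlapping 2x2 block."""
    height, width = len(image), len(image[0])
    if height % 2 or width % 2:
        raise ValueError("height and width must be even")
    return [
        [
            (image[r][c] + image[r][c + 1] + image[r + 1][c] + image[r + 1][c + 1]) / 4.0
            for c in range(0, width, 2)
        ]
        for r in range(0, height, 2)
    ]


def build_input_pyramid(image: Image, num_levels: int = 3) -> List[Image]:
    """Return [full resolution, 1/2 resolution, 1/4 resolution, ...]."""
    pyramid = [image]
    for _ in range(num_levels - 1):
        pyramid.append(downsample_2x(pyramid[-1]))
    return pyramid


if __name__ == "__main__":
    # Stand-in for the bicubic-interpolated depth map.
    depth = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
    for level, img in enumerate(build_input_pyramid(depth)):
        print(level, len(img), "x", len(img[0]))
```

A filter of fixed size applied at level k spans 2^k times as many pixels of the full-resolution map along each axis, which is one way to read "multiple level receptive fields".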

C. Loss Function

D. Network Training and Implementation

IV. EXPERIMENTS

A. Experiment on Middlebury Dataset

B. Experiment on Test-ToF Dataset

C. Experiment on Test-Ynoise Dataset

D. Experiment on Real Data

E. Running Time

F. Ablation Study

V. APPLICATION

VI. DISCUSSION AND CONCLUSION

h_l_dou
