[Paper] Deep3D: Fully Automatic 2D-to-3D Video Conversion with Deep Convolutional Neural Networks

Paper link
Paper code (MXNet)

The authors sidestep the scarcity of depth-estimation training data by using frames from 3D movies as training data. The left view serves as input, and an end-to-end supervised neural network (with a VGG16-based backbone) is trained to predict the right view.
An intermediate part of the network outputs a disparity map between the left and right views, but this map is supervised only indirectly, by minimizing the MAE between the predicted right view and the ground truth. The "disparity map" is therefore not necessarily "real" or "accurate".
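To make the mechanism concrete, below is a minimal PyTorch sketch of the paper's differentiable selection layer (the official code is in MXNet; the function name, disparity range, and zero-padding choice here are my own illustrative assumptions). The network's disparity logits are softmaxed into a per-pixel distribution over candidate disparities, and the right view is rendered as a probability-weighted sum of horizontally shifted copies of the left view, so the only training signal is the reconstruction loss:

```python
import torch
import torch.nn.functional as F

def render_right_view(left, disparity_logits):
    """Differentiable selection layer (sketch).

    left:             (B, 3, H, W) input left view
    disparity_logits: (B, D, H, W) raw network output, one channel per
                      candidate disparity level d = 0 .. D-1
    """
    num_disp = disparity_logits.shape[1]
    probs = F.softmax(disparity_logits, dim=1)  # per-pixel disparity distribution
    right = torch.zeros_like(left)
    for d in range(num_disp):
        # Shift the left view d pixels to the left, zero-padding the right
        # border (the paper's actual disparity range and padding may differ).
        shifted = F.pad(left, (0, d))[..., d:] if d > 0 else left
        right = right + probs[:, d:d + 1] * shifted
    return right

# Training signal: pixel-wise MAE (L1) between rendered and true right view.
# loss = F.l1_loss(render_right_view(left, logits), right_gt)
```

Because the disparity distribution is only ever pushed toward whatever makes this reconstruction loss small, nothing forces it to match the true scene geometry, which is why the intermediate "disparity map" should not be read as an accurate depth estimate.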

For evaluation, human scoring is used in addition to MAE.

Notably, one would intuitively expect that exploiting the temporal dependency between consecutive video frames would improve the model's predictions, yet the MAE results show that adding 5 frames of optical flow at prediction time actually increases the MAE. The authors' explanation is the added model complexity of embedding temporal dependency. I suspect it may also stem from camera-angle changes between shots and the reduced proportion of useful information in the input (interference from the extra input data).

Reposted from blog.csdn.net/yaoyao_chen/article/details/130471834