ICCV 2023 | TUM & Google propose md4all: monocular depth estimation under challenging conditions


Robust Monocular Depth Estimation under Challenging Conditions

Authors: Stefano Gasperini, Nils Morbitzer, HyunJun Jung, Nassir Navab, Federico Tombari

Translated by: Zhai Guangyao

Affiliations: Technical University of Munich, Google

Foreword

Hello everyone, my name is Stefano Gasperini, and I am here to present our ICCV 2023 work. More details can be found in our paper: https://arxiv.org/abs/2308.09711, and on our project website: https://md4all.github.io.

Code: https://github.com/md4all/md4all


First, take a look at the following example:

Can you see the tree in the color image?

[Figure: a dark nighttime color image in which a tree is barely visible, alongside our model's depth prediction]

Our monocular depth estimation network outputs reliable depth estimates under all conditions, even in darkness!   

Background

While state-of-the-art monocular depth estimation methods achieve impressive results in ideal environments, they are highly unreliable under challenging lighting and weather conditions, such as nighttime or rainy days.

[Figure: challenging nuScenes samples with flawed LiDAR ground truth]

In these cases, unfavorable factors such as inherent sensor noise, textureless dark regions, and reflections violate the assumptions underlying both supervised and self-supervised training. Self-supervised methods cannot establish the pixel correspondences needed to learn depth, while supervised methods may learn flaws in the sensor ground truth (such as those in the LiDAR samples from nuScenes in the figure above).

Method

In this paper, we propose md4all to address these safety-critical issues. md4all is a simple and effective solution that works reliably under both adverse and ideal conditions, and it is applicable to different types of supervision.

[Figure: overview of the md4all training scheme]

We achieve this by exploiting the ability of existing methods to work well in ideal settings, providing an effective training signal regardless of the input condition. First, through image translation, we generate a set of adverse samples corresponding to the normal training samples. We then train the network, either self-supervised or fully supervised, by feeding it the generated samples while computing the standard losses on the corresponding original images.

As shown in the figure above, we can further distill knowledge from a pretrained baseline model that runs inference only on ideal-condition inputs, while the depth model being trained is fed a mixture of ideal and adverse inputs.
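To make this concrete, here is a minimal sketch of such a distillation step, assuming a PyTorch setup; the names `student`, `teacher`, `batch`, and the mixing probability `p_adverse` are illustrative and not taken from the official repository:

```python
import random

import torch
import torch.nn.functional as F


def distillation_step(student, teacher, batch, optimizer, p_adverse=0.5):
    """One illustrative md4all-style distillation step (a sketch, not the
    official implementation). `batch["image"]` is the original ideal-condition
    image; `batch["adverse"]` is its translated counterpart (night, rain, ...)."""
    image, adverse = batch["image"], batch["adverse"]

    # The frozen teacher only ever sees ideal-condition inputs.
    with torch.no_grad():
        target_depth = teacher(image)

    # The student sees a random mixture of ideal and adverse inputs,
    # but is always supervised by the teacher's ideal-condition output.
    student_input = adverse if random.random() < p_adverse else image
    pred_depth = student(student_input)

    # Simple L1 distillation loss; the loss used in the paper may differ.
    loss = F.l1_loss(pred_depth, target_depth)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the supervision always comes from the teacher's prediction on the original image, the student learns to output ideal-condition-like depth even when its input is dark or rainy.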

The implementation of the proposed method is available in our GitHub repository; you are welcome to visit:

https://github.com/md4all/md4all

Results

[Figure: qualitative predictions on a challenging nighttime nuScenes scene]

With md4all, we substantially outperform previous solutions, providing robust estimates across a wide range of conditions. Notably, md4all achieves this with a single monocular model and no dedicated branches.

The figure above shows predictions in a challenging nighttime scene from the nuScenes dataset. The self-supervised Monodepth2 cannot extract valuable features because of the darkness and the noise (first row). The supervised AdaBins learns the imperfections of the sensor data, causing holes in its predictions on the road (second row). Applied to these same architectures, our md4all improves robustness under both standard and adverse conditions.

[Tables: quantitative results on nuScenes and Oxford RobotCar]

In this paper, we demonstrate the effectiveness of md4all with both types of supervision, under standard and adverse conditions. In extensive experiments on the nuScenes and Oxford RobotCar datasets, md4all performs significantly better than previous works (as shown in the tables above).

Image Translation

[Figure: examples of translated adverse-condition training images]

Above, we show examples of the image translations generated for training md4all. We perform data augmentation by feeding the model a mixture of original and translated samples. This way, the model can recover information under diverse conditions without any modification at inference time.
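As a sketch of how such mixed batches might be assembled, the loader below returns either the original image or one of its pre-generated translations with some probability; the dataset layout, dictionary keys, and `p_adverse` parameter are assumptions for illustration, not the official data pipeline:

```python
import random

from PIL import Image
from torch.utils.data import Dataset
from torchvision.transforms.functional import to_tensor


class MixedConditionDataset(Dataset):
    """Illustrative loader mixing original and translated samples.
    Each entry in `samples` is a dict with hypothetical keys:
    {"original": path, "night": path, "rain": path}."""

    def __init__(self, samples, p_adverse=0.5, conditions=("night", "rain")):
        self.samples = samples
        self.p_adverse = p_adverse
        self.conditions = conditions

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        sample = self.samples[idx]
        if random.random() < self.p_adverse:
            # Swap in a pre-generated adverse translation of the same scene.
            path = sample[random.choice(self.conditions)]
        else:
            path = sample["original"]
        return to_tensor(Image.open(path).convert("RGB"))
```

Since the translated images are aligned pixel-to-pixel with their originals, any loss computed against the original image or its ground truth remains valid for the swapped-in sample.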

Here, we also open-source all the generated adverse-condition images corresponding to the sunny and cloudy samples of the nuScenes and Oxford RobotCar training sets. You are welcome to access them here:

https://forms.gle/31w2TvtTiVNyPb916

These images can be used in future robust methods for depth estimation or other tasks.
