ADDS-DepthNet: Domain Separation-based Self-Supervised Monocular Depth Estimation from All-Day Images

The complete project has been open-sourced on AI Studio; click the link to run it:
aistudio.baidu.com/aistudio/pr…

01 Introduction

When an autonomous vehicle drives on the road, it usually needs a lidar to obtain high-precision point cloud data, from which the distance between the ego vehicle and surrounding obstacles is derived. However, lidar is expensive, so many researchers try to use cameras instead to estimate the distance between the ego vehicle and surrounding obstacles, reducing cost as much as possible.

In technical terms, depth estimation means capturing an image of an object with an image acquisition device and using that image to estimate the perpendicular distance from each point on the object to the device's imaging plane; this perpendicular distance is the depth of the corresponding point on the object.

There are currently many depth estimation methods. Grouped by the sensor used, they can be divided into depth estimation based on ToF (time-of-flight) cameras, binocular (stereo) cameras, and monocular cameras. This article discusses depth estimation based on a monocular camera.

Figure 1 The development history of monocular depth estimation algorithms [1]

02 Algorithm Background

In recent years, self-supervised depth estimation has been studied extensively. Monodepth and SfMLearner were the first self-supervised monocular depth estimation methods, built on a trained depth network and an independent pose network. Subsequent methods improved performance in outdoor scenes and were thoroughly evaluated on the KITTI and Cityscapes datasets, but they do not perform particularly well at night because of the low visibility and uneven illumination. Some researchers have therefore developed depth estimation methods tailored to night scenes, although estimating depth at night remains harder than during the day. The multi-spectral transfer network (MTN) brings in additional sensors for nighttime depth estimation: it uses a thermal imaging camera to reduce the impact of low visibility at night and, in some configurations, adds laser sensors to provide extra information. Meanwhile, other methods employ generative adversarial networks for nighttime depth estimation.

Although remarkable progress has been made in nighttime monocular depth estimation, the performance of these methods is limited by the large discrepancy between daytime and nighttime images. To alleviate the performance degradation caused by illumination changes, the ADDS-DepthNet algorithm employs a domain-separated network that partitions the information of a day-night image pair into two complementary subspaces: a private domain and an invariant domain. The private domain contains the information unique to each image (illumination, etc.), while the invariant domain contains the essential shared information (textures, etc.). To ensure that the daytime and nighttime images carry the same scene content, the domain separation network takes a daytime image and its corresponding nighttime image (generated with a GAN) as input, and learns the private-domain and invariant-domain feature extractors through orthogonality and similarity losses, thereby reducing the domain gap and obtaining better depth maps. Finally, the complementary information and the depth maps are used for effective depth estimation via reconstruction and photometric losses.

To put it simply, the private domain can be understood as the "personality" of the features, while the invariant domain is their "commonality".

03 ADDS Algorithm Architecture

The ADDS algorithm proposes a domain separation framework to eliminate the influence of interfering factors. The framework takes a daytime image and the corresponding GAN-generated nighttime image as the input to the network.

Figure 2 ADDS-Net algorithm architecture [2]

The ADDS architecture consists of three parts: a weight-shared depth network (the pink structure in the orange area in the middle), a daytime private network (the yellow structure in the upper blue area), and a nighttime private network (the green structure in the lower blue area).

The weight-shared depth network takes the daytime and nighttime images as input; it first extracts invariant features and then estimates the corresponding depth maps. Meanwhile, the daytime and nighttime private feature extractors (blue areas) extract daytime-specific and nighttime-specific features, respectively. These private features are constrained by an orthogonality loss so that they are complementary to the invariant features, and the private and invariant features are added together to reconstruct the original input images under a reconstruction loss.

Part-1 Model Input

For day and night images of the same scene, the depth information should be consistent even though the illumination of the image pair differs greatly. This means the essential information of a daytime image and of the nighttime image of the same scene should be similar. The ADDS algorithm therefore divides the information of daytime and nighttime images into two parts: invariant information shared by both (such as the structure and layout of the street), and private information specific to each (such as illumination).

The illumination of a scene varies over time, while the depth of the scene stays constant, so the illumination component plays only a minor role in self-supervised depth estimation.

Furthermore, it is hard to guarantee that real-world daytime and nighttime images of a scene contain the same information apart from the private part (illumination, etc.), because outdoor scenes always contain moving objects, which would mislead the network when it tries to separate the private and invariant information. The ADDS algorithm therefore uses CycleGAN to translate daytime images into nighttime images, and treats each daytime image and its generated nighttime counterpart as the input image pair. This guarantees that the invariant information is consistent and that all objects sit in the same positions, reducing the loss of important information while the private information is being separated out. Note that other GAN algorithms could be used here as well.
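
As a rough illustration, a training pair could be assembled as follows. Note that `cyclegan_day2night` is a hypothetical handle to a pretrained, frozen day-to-night generator; it is not an API shipped with the paper's code or PaddleVideo.

# Conceptual sketch: build a (day, generated-night) training pair.
# `cyclegan_day2night` is a placeholder for a pretrained, frozen
# day-to-night CycleGAN generator.
import paddle

def make_training_pair(day_img, cyclegan_day2night):
    """Return a day image and its GAN-generated night counterpart,
    which share identical scene content by construction."""
    with paddle.no_grad():          # the generator is not trained here
        fake_night = cyclegan_day2night(day_img)
    return day_img, fake_night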

Part-2 Feature Extractor

The domain separation framework separates each image into two complementary subspaces at the feature level, and the invariant component is used for depth estimation.

The ADDS algorithm uses two network branches to extract the private and invariant information of the images at the feature level. Given an input daytime image sequence and the correspondingly generated nighttime sequence, a daytime-specific feature extractor extracts the private and invariant information of the daytime images; likewise, a dedicated nighttime extractor does the same for the nighttime images. Since the input daytime and nighttime images contain the same essential information, the part of the two extractors that captures the invariant information shares its weights.
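
The weight sharing can be made concrete with a minimal PaddlePaddle sketch. The module names, channel sizes, and input resolution below are illustrative assumptions, not the paper's actual implementation:

# Minimal sketch (not the official implementation) of the two-branch
# feature extractors with a weight-shared invariant encoder.
import paddle
import paddle.nn as nn

def conv_block(in_ch, out_ch):
    # Conv -> BN -> ReLU, downsampling the feature map by 2
    return nn.Sequential(
        nn.Conv2D(in_ch, out_ch, kernel_size=3, stride=2, padding=1),
        nn.BatchNorm2D(out_ch),
        nn.ReLU())

class PrivateEncoder(nn.Layer):
    """Extracts domain-specific (private) features for one domain."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(conv_block(3, 32), conv_block(32, 64))
    def forward(self, x):
        return self.net(x)

# One private encoder per domain, plus a single invariant encoder whose
# weights are shared because the day/night pair carries the same content.
day_private_enc = PrivateEncoder()
night_private_enc = PrivateEncoder()
shared_invariant_enc = nn.Sequential(conv_block(3, 32), conv_block(32, 64))

day_img = paddle.rand([1, 3, 192, 640])    # daytime input (dummy data)
night_img = paddle.rand([1, 3, 192, 640])  # CycleGAN-generated counterpart

day_private = day_private_enc(day_img)
night_private = night_private_enc(night_img)
# The SAME module processes both inputs -> shared weights
day_invariant = shared_invariant_enc(day_img)
night_invariant = shared_invariant_enc(night_img)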

Part-3 Depth Map Generation and Image Reconstruction

The corresponding depth maps of the daytime and nighttime images are reconstructed from the features obtained in the previous step. The red decoder is the depth recovery module of the weight-shared depth network, while the yellow and green decoders are the reconstruction branches for the daytime and nighttime images, respectively.
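
Continuing the sketch above, the decoding stage might look as follows; again, the layer names and shapes are assumptions for illustration, with the depth decoder shared across domains and one reconstruction decoder per domain:

# Sketch of the decoders, continuing from the encoder sketch above.
import paddle
import paddle.nn as nn

def up_block(in_ch, out_ch):
    # Nearest-neighbour upsampling followed by a 3x3 conv
    return nn.Sequential(
        nn.Upsample(scale_factor=2, mode='nearest'),
        nn.Conv2D(in_ch, out_ch, 3, padding=1),
        nn.ReLU())

# Shared depth decoder (the red decoder): invariant features -> depth map
depth_decoder = nn.Sequential(
    up_block(64, 32), up_block(32, 16),
    nn.Conv2D(16, 1, 3, padding=1), nn.Sigmoid())

# Per-domain reconstruction decoders (yellow/green): private + invariant
# features are ADDED, then decoded back into the input image.
day_recon_decoder = nn.Sequential(
    up_block(64, 32), up_block(32, 16), nn.Conv2D(16, 3, 3, padding=1))
night_recon_decoder = nn.Sequential(
    up_block(64, 32), up_block(32, 16), nn.Conv2D(16, 3, 3, padding=1))

day_depth = depth_decoder(day_invariant)        # depth from invariant features
night_depth = depth_decoder(night_invariant)
day_recon = day_recon_decoder(day_private + day_invariant)      # image reconstruction
night_recon = night_recon_decoder(night_private + night_invariant)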

Part-4 Self-Supervised Signals

To learn the private and invariant features of all-day images in a self-supervised manner and estimate depth well, the ADDS algorithm uses several losses: a reconstruction loss, a similarity loss, an orthogonality loss, and a photometric loss.

The total training loss of the network is:

L_total = λ1·L_rec + λ2·L_sim + λ3·L_orth + λ4·L_photo

where λ1, λ2, λ3, λ4 are weighting parameters. In the ADDS algorithm, the authors empirically set λ1 = 0.1 and λ2 = λ3 = λ4 = 1.
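
Below is a hedged sketch of how these losses could be combined, reusing the feature and decoder variables from the architecture sketch above. The concrete formulations (L1 reconstruction and similarity, a Frobenius-style orthogonality penalty) follow common practice for domain separation networks and are assumptions, not necessarily the paper's exact definitions; the photometric term is left as a placeholder for the standard self-supervised warping loss (SSIM + L1).

# Sketch of the training losses; formulations are illustrative.
import paddle
import paddle.nn.functional as F

def orthogonality_loss(private_feat, invariant_feat):
    # Push private and invariant features into complementary subspaces:
    # penalize correlation between the normalized, flattened features.
    p = F.normalize(private_feat.flatten(start_axis=1), axis=1)
    s = F.normalize(invariant_feat.flatten(start_axis=1), axis=1)
    corr = paddle.matmul(p, s, transpose_y=True)
    return (corr ** 2).mean()

rec_loss = (F.l1_loss(day_recon, day_img)
            + F.l1_loss(night_recon, night_img))       # reconstruction
sim_loss = F.l1_loss(day_invariant, night_invariant)   # similarity
orth_loss = (orthogonality_loss(day_private, day_invariant)
             + orthogonality_loss(night_private, night_invariant))
photo_loss = paddle.zeros([1])  # placeholder: standard SSIM + L1 warping loss

lambda1, lambda2, lambda3, lambda4 = 0.1, 1.0, 1.0, 1.0
total_loss = (lambda1 * rec_loss + lambda2 * sim_loss
              + lambda3 * orth_loss + lambda4 * photo_loss)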

04 Quick Experience of ADDS-DepthNet with PaddleVideo

PaddleVideo is PaddlePaddle's official video model development kit, designed to help developers carry out academic research and industrial practice in the video domain. The quick-experience steps are briefly described below.

Install PaddleVideo

# Download the PaddleVideo source code
%cd /home/aistudio/
!git clone https://gitee.com/PaddlePaddle/PaddleVideo.git
# Install the dependencies (requirements.txt lives inside the repo)
%cd PaddleVideo
!python -m pip install --upgrade pip
!pip install --upgrade -r requirements.txt

Download the trained ADDS model

PaddleVideo provides an ADDS model trained on the Oxford RobotCar dataset, which developers can download directly to try out the model quickly.

# Download the model parameters trained on the Oxford RobotCar dataset
!wget https://videotag.bj.bcebos.com/PaddleVideo-release2.2/ADDS_car.pdparams
# Export the inference model
%cd /home/aistudio/PaddleVideo
!python tools/export_model.py -c configs/estimation/adds/adds.yaml \
                              -p ADDS_car.pdparams \
                              -o inference/ADDS

The exported inference model is saved as:

/PaddleVideo/inference
└── ADDS
    ├── ADDS.pdiparams
    ├── ADDS.pdiparams.info
    └── ADDS.pdmodel

Model Inference

PaddleVideo/tools/predict.py loads the model parameters, takes an image as input, and by default saves the estimated depth map rendered as a pseudo-color image. Two test photos are provided here, one taken during the day and one at night, shot with a DJI Osmo Action camera.
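
An inference command might look like the following; the flags follow PaddleVideo's tools/predict.py convention, and the input file name is a placeholder:

# Run inference on a single image; the estimated depth map is saved
# as a pseudo-color image by default. `data/example_day.png` is a placeholder.
%cd /home/aistudio/PaddleVideo
!python tools/predict.py --input_file data/example_day.png \
                         --config configs/estimation/adds/adds.yaml \
                         --model_file inference/ADDS/ADDS.pdmodel \
                         --params_file inference/ADDS/ADDS.pdiparams \
                         --use_gpu=True

The test images and the corresponding predicted depth maps are shown below: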

[Figure: test images and their predicted depth maps]

Judging from the test results, I personally think the depth maps look better during the day and somewhat weaker at night, though that may also be due to the lower quality of the night photos and the dark environment. Still, considering this is purely self-supervised, the results are quite good.

05 Summary

At the end of the paper, the authors present some interesting visualization results, shown here:

Figure 3 Visualization of convolutional-layer feature maps [2]

This figure visualizes the feature maps of the convolutional layers. From top to bottom: (a) daytime private features; (b) nighttime private features; (c) daytime shared (invariant) features; (d) nighttime shared (invariant) features. The first column shows the corresponding input images, and the remaining columns, from left to right, are the ten most informative feature maps. Look at the inputs first: the images shown here, whether taken during the day or at night, are fairly bright. In other words, images used for depth estimation need to be clear, and the imperfect test results demonstrated above may well stem from unclear photos. Visualizing the feature maps also reveals, to some extent, how the different parts of the model "divide the work". The black regions in the visualizations indicate an absence of useful information. Interestingly, for the daytime and nighttime private features, the shallow feature maps are relatively sharp and become blurrier with depth, and they are sensitive to objects along the roadside (such as vehicles parked there or passing by); for the daytime and nighttime shared features, the two edges of the drivable area appear bright while the regions on either side of the road are relatively dark. This reflects that the private features and shared features are indeed complementary.

That concludes this initial walkthrough of the paper on domain-separation-based self-supervised monocular depth estimation from all-day images. Feel free to visit my AI Studio page to follow each other and exchange ideas: aistudio.baidu.com/aistudio/pe…

In addition, to explore more topics around autonomous driving, such as monocular and stereo depth algorithms, monocular 3D perception, 3D point cloud perception, and BEV perception, you can head to:

References

[1] Xingshuai Dong, Matthew A. Garratt, Sreenatha G. Anavatti, and Hussein A. Abbass (2023). Towards Real-Time Monocular Depth Estimation for Robotics: A Survey.
[2] Lina Liu, Xibin Song, Mengmeng Wang, Yong Liu, and Liangjun Zhang (2021). Self-supervised Monocular Depth Estimation for All Day Images using Domain Separation. International Conference on Computer Vision (ICCV).

Origin: juejin.im/post/7215100383609864247