Low Cost, Full Pipeline: A 3D Vision Application Solution Based on PaddleDepth and Paddle3D

Many real-world applications require 3D information. Because the application scenarios of 3D vision are complex and diverse, 3D perception tasks are numerous, and development pipelines are complicated, PaddlePaddle provides developers with PaddleDepth, a low-cost depth information acquisition solution, and Paddle3D, a full-pipeline development kit for 3D perception in autonomous driving.

3D Vision Technology Application Scenarios

3D vision has become a very popular topic in recent years. It aims to make computers mimic the human brain in understanding and analyzing the data collected by sensors. Earlier 2D vision tasks focused on understanding and analyzing the color images captured by cameras, but many real-world scenarios require 3D information, which is why 3D vision tasks emerged.

3D vision technology has great application value in fields such as intelligent manufacturing, intelligent unmanned systems, and intelligent healthcare.

Different application scenarios call for different problem formulations. For example, in intelligent sports scenarios, 3D pose estimation is needed to quantitatively analyze an athlete's posture. In autonomous driving scenarios, 3D object detection and tracking are needed to detect the vehicles around the self-driving vehicle in real time.

In summary, the scenarios that 3D vision tasks face are complex and varied. We need to select appropriate sensors for each business scenario and decide how to model the task based on the collected data. This reveals two core problems that must be solved to put 3D vision technology into practice: acquiring 3D data and applying 3D information.

3D Vision Application Difficulties and Solutions

In terms of data acquisition, existing 3D data acquisition devices are expensive, and the data they collect is sparse or low in resolution. To address these problems, we propose PaddleDepth, the PaddlePaddle depth enhancement development kit. Commonly used depth acquisition devices fall into two categories: lidar and ToF (Time of Flight) devices. Lidar is mostly used outdoors; the depth it collects is sparse and cannot be used directly for dense 3D reconstruction, so depth completion is required. ToF devices are mostly used indoors; the depth they collect is generally stored as low-resolution images, so depth super-resolution is required. In addition, existing depth devices are expensive, which greatly limits their use in real scenarios. We therefore also consider estimating the depth of a scene directly from color images, that is, depth estimation, which greatly reduces the cost of obtaining depth information. We have open-sourced these depth enhancement techniques in PaddleDepth, which provides a low-cost depth information acquisition solution.

In 3D perception, the end-to-end pipeline from training and evaluation to deployment is very complicated. To address this, PaddlePaddle offers Paddle3D, which focuses on 3D perception, covers a large number of 3D perception models, and provides full-pipeline tutorials from training and evaluation to deployment, reducing development costs for users.

PaddlePaddle Depth Enhancement Development Kit: PaddleDepth

PaddleDepth aims to provide a low-cost depth information acquisition solution that covers all three types of depth enhancement: depth completion, depth super-resolution, and depth estimation. PaddleDepth currently includes more than 10 cutting-edge models and more than 4 self-developed algorithms that are open-sourced for the first time.

In terms of technical impact, PaddleDepth's self-developed algorithms for depth completion, depth super-resolution, and monocular/binocular depth estimation have achieved state-of-the-art (SOTA) performance on several public datasets.

On the open KITTI dataset, PaddleDepth ranks first in self-supervised monocular depth estimation, supervised binocular depth estimation, and depth completion. On the Middlebury dataset, it ranks first in depth super-resolution, and it won the stereo matching track of the ECCV 2020 Robust Vision Challenge. Its depth enhancement technology is among the leaders in the industry.

Example results are shown below.

Depth completion result display

Compared with the sparse depth map obtained directly from lidar, depth completion gives users dense depth estimates, which enable better 3D reconstruction.
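
To make the input and output of depth completion concrete, here is a deliberately naive sketch. It fills each missing pixel of a sparse depth map with the nearest valid measurement in the same row. This is an illustrative baseline only and is not the method used by PaddleDepth; the function names `fill_sparse_depth_rows` and `density`, and the convention that 0.0 marks a missing measurement, are assumptions made for this example.

```python
def fill_sparse_depth_rows(sparse, missing=0.0):
    """Naive baseline: copy the nearest valid depth within the same row."""
    dense = []
    for row in sparse:
        # Collect (column, depth) pairs for pixels that have a measurement.
        valid = [(x, d) for x, d in enumerate(row) if d != missing]
        if not valid:
            # A row without any measurement cannot be filled by this baseline.
            dense.append(list(row))
            continue
        filled = []
        for x in range(len(row)):
            # Pick the valid sample with the smallest horizontal distance.
            _, depth = min(valid, key=lambda sample: abs(sample[0] - x))
            filled.append(depth)
        dense.append(filled)
    return dense


def density(depth_map, missing=0.0):
    """Fraction of pixels that carry a depth value."""
    total = sum(len(row) for row in depth_map)
    valid = sum(1 for row in depth_map for d in row if d != missing)
    return valid / total


# Toy lidar-like depth map (metres); 0.0 means "no measurement".
sparse = [
    [0.0, 5.0, 0.0, 0.0, 9.0],
    [0.0, 0.0, 0.0, 0.0, 0.0],
    [4.0, 0.0, 0.0, 0.0, 0.0],
]
dense = fill_sparse_depth_rows(sparse)
print("density before:", density(sparse), "after:", density(dense))
```

The empty middle row stays empty, which shows why learned completion models that also use the color image are preferred over simple interpolation.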

Depth Completion Results

Point cloud reconstruction result after completion

Depth map super-resolution result display

Through depth image super-resolution, users can obtain denser 3D reconstruction results.

Left: super-resolution result Right: original point cloud result

Monocular depth estimation result display

Through monocular depth estimation, users can reconstruct the 3D information of the original object from a single image.
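
Turning a depth map into a point cloud is commonly done by back-projecting every pixel through the pinhole camera model: X = (u - cx) * Z / fx and Y = (v - cy) * Z / fy. The sketch below illustrates that step only. The function name `depth_to_point_cloud` and the toy intrinsics are hypothetical, and the code is not taken from PaddleDepth.

```python
def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map (rows of metres) into camera-frame 3D points."""
    points = []
    for v, row in enumerate(depth):        # v: pixel row index
        for u, z in enumerate(row):        # u: pixel column index
            if z <= 0:
                continue                   # skip pixels without a valid depth
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


# Toy 2x3 depth map and made-up intrinsics, for illustration only.
depth = [
    [2.0, 2.0, 0.0],
    [2.0, 4.0, 4.0],
]
points = depth_to_point_cloud(depth, fx=2.0, fy=2.0, cx=1.0, cy=0.5)
for point in points:
    print(point)
```

A denser or more accurate depth map therefore translates directly into a denser or more accurate point cloud, which is why the reconstruction results are used to compare the methods.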

Monocular Depth Estimation Results

Monocular depth estimation point cloud reconstruction results

Binocular depth estimation result display

Using the principle of binocular (stereo) ranging, users can reconstruct the 3D information of the original object more accurately.
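
The binocular ranging principle can be summarized for a rectified stereo pair as Z = f * B / d, where f is the focal length in pixels, B is the baseline between the two cameras, and d is the disparity in pixels. The following is a minimal sketch of that relation; the function name `disparity_to_depth` and the numeric values are illustrative assumptions, not parameters of any particular dataset or of PaddleDepth.

```python
def disparity_to_depth(disparity_px, focal_px, baseline_m):
    """Depth in metres from disparity for a rectified stereo pair."""
    if disparity_px <= 0:
        # Zero disparity corresponds to a point at infinity.
        return float("inf")
    return focal_px * baseline_m / disparity_px


# Made-up camera parameters for illustration.
focal_px = 700.0
baseline_m = 0.5
disparities = [70.0, 35.0, 0.0]
depths = [disparity_to_depth(d, focal_px, baseline_m) for d in disparities]
print(depths)
```

Because depth is inversely proportional to disparity, a small disparity error matters much more for distant objects than for nearby ones.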

Binocular Depth Estimation Results

Binocular Depth Estimation Point Cloud Reconstruction Results

Comparing the 3D reconstruction results shows that all of the above methods produce reasonable 3D reconstructions. Among them, depth completion and binocular depth estimation are more accurate, thanks to the lidar input and the stereo geometry constraint, respectively.

PaddleDepth point cloud reconstruction results

In summary, to address the limitations of existing 3D information acquisition devices, we propose PaddleDepth as a low-cost depth information acquisition solution:

  • Depth map super-resolution addresses the low resolution of captured depth images;

  • Depth completion addresses the sparsity of captured depth images;

  • Depth estimation directly from input color images further reduces the cost of 3D information acquisition.

PaddlePaddle 3D Perception Development Kit: Paddle3D

As mentioned earlier, one difficulty in developing 3D perception applications is the large number of tasks and the complexity of the pipeline. To address this, we designed and developed Paddle3D, a 3D perception development kit.

The overall architecture of Paddle3D is layered. The bottom layer is the framework layer, built on the PaddlePaddle core framework. On top of the framework we provide basic tools, including integration of common datasets and operators specific to the 3D domain. Above that is the algorithm layer, which contains different types of algorithms. The top layer is the tool layer, which integrates other PaddlePaddle tools.

Paddle3D has four key features: a rich model library, flexible framework design, end-to-end full-pipeline coverage, and seamless integration with Apollo at deployment.

Rich model library

Paddle3D covers classic and cutting-edge models in many different directions. For monocular 3D detection based on a single camera, it includes classic models such as SMOKE and CaDDN; the advantage of this approach is that cameras are cheap, so the cost is controllable. Paddle3D also integrates lidar-based object detection models, that is, point cloud detection models such as PointPillars and IA-SSD; because point cloud data carries 3D information, point-cloud-based 3D object detection is more accurate than monocular 3D detection. Paddle3D supports multi-modal models as well, which fuse data from different modalities and are therefore more robust. In addition, Paddle3D supports currently popular multi-view detection models such as BEVFormer and PETR. Users can choose the model that suits their scenario for validation.

A common problem in point-cloud-based 3D detection is that GPU memory usage and computation are very large. To avoid this, many methods change the model structure and map features from 3D space to 2D space to reduce memory consumption, but this lowers model accuracy. PaddlePaddle's solution is sparse convolution (SparseConv), which uses a rulebook to skip invalid computation and thereby reduces both memory usage and computation.
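
The rulebook idea can be illustrated with a small, framework-free sketch. It is a simplified, single-channel, 2D, submanifold-style variant (outputs are produced only at active input sites) written for illustration; it is not PaddlePaddle's implementation, and the names `build_rulebook` and `sparse_conv` are hypothetical.

```python
def build_rulebook(active_coords, kernel_offsets):
    """For each kernel offset, list the (input_idx, output_idx) pairs to compute."""
    index_of = {coord: i for i, coord in enumerate(active_coords)}
    rulebook = {}
    for offset in kernel_offsets:
        pairs = []
        for out_idx, (y, x) in enumerate(active_coords):
            neighbor = (y + offset[0], x + offset[1])
            in_idx = index_of.get(neighbor)
            if in_idx is not None:          # only active neighbors contribute
                pairs.append((in_idx, out_idx))
        rulebook[offset] = pairs
    return rulebook


def sparse_conv(features, rulebook, weights):
    """Accumulate weight * input only for the pairs listed in the rulebook."""
    output = [0.0] * len(features)
    for offset, pairs in rulebook.items():
        w = weights[offset]
        for in_idx, out_idx in pairs:
            output[out_idx] += w * features[in_idx]
    return output


# Three active sites on an otherwise empty 6x6 grid.
active_coords = [(0, 0), (0, 1), (5, 5)]
features = [1.0, 2.0, 3.0]
kernel_offsets = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
weights = {offset: 1.0 for offset in kernel_offsets}   # all-ones 3x3 kernel

rulebook = build_rulebook(active_coords, kernel_offsets)
output = sparse_conv(features, rulebook, weights)
num_macs = sum(len(pairs) for pairs in rulebook.values())
print(output, num_macs)
```

In this toy case the rulebook lists only 5 multiply-accumulate operations, whereas a dense 3x3 convolution over the 6x6 grid would evaluate 36 x 9 = 324 kernel taps, most of them on empty cells.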

PaddlePaddle framework version 2.4 provides this capability, and Paddle3D integrates several cutting-edge models that use SparseConv, such as PV-RCNN and Voxel R-CNN.

According to the accuracy and speed metrics reported for these models, the results are very good.

Flexible framework design

The framework design of Paddle3D meets the needs of different users. Users who need to integrate Paddle3D into specific tasks can quickly carry out secondary development based on the APIs that PaddlePaddle provides.

Taking model training as an example, six API calls are enough to build the model, load the dataset, define the optimizer, and so on, and then start training. Users who do not need secondary development can configure the components through the provided configuration files and then launch training with a single command using the command-line tool.

1. Six API calls complete model training, for secondary development or integration

  • Specify the training dataset

# Import paths are assumed from the Paddle3D package layout;
# adjust them to match your installed version.
import paddle
import paddle3d.transforms as T
from paddle3d.apis.trainer import Trainer
from paddle3d.datasets import KittiMonoDataset
from paddle3d.models import DLA34, SMOKE, SMOKEPredictor

train_dataset = KittiMonoDataset(
    dataset_root='datasets/KITTI',
    mode='train',
    transforms=[
        T.LoadImage(reader='pillow', to_chw=False),
        T.Gt2SmokeTarget(mode='train', num_classes=3),
        T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
    ])

  • Define the model

model = SMOKE(
    backbone=DLA34(),
    head=SMOKEPredictor(num_classes=3),
    depth_ref=[28.01, 16.32],
    dim_ref=[[3.88, 1.63, 1.53], [1.78, 1.70, 0.58], [0.88, 1.73, 0.67]])

  • Define the learning rate schedule

lr_scheduler = paddle.optimizer.lr.MultiStepDecay(
    milestones=[36000, 55000],
    learning_rate=1.25e-4)

  • Define the optimizer

optimizer = paddle.optimizer.Adam(
    learning_rate=lr_scheduler,
    parameters=model.parameters())

  • Specify the trainer

trainer = Trainer(
    model=model,
    optimizer=optimizer,
    iters=20,
    train_dataset=train_dataset)

  • Start training

trainer.train()
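
To see what the learning rate schedule in the steps above does, the sketch below reproduces the multi-step decay rule in plain Python: the base rate is multiplied by a decay factor each time a milestone is reached. It assumes the decay factor `gamma` keeps its default value of 0.1 and that the scheduler is stepped once per training iteration; the helper name `multistep_lr` is hypothetical and is not part of PaddlePaddle.

```python
from bisect import bisect_right


def multistep_lr(step, base_lr=1.25e-4, milestones=(36000, 55000), gamma=0.1):
    """Learning rate after `step` scheduler steps under multi-step decay."""
    # Number of milestones that have already been reached at this step.
    num_decays = bisect_right(milestones, step)
    return base_lr * gamma ** num_decays


for step in (0, 35999, 36000, 55000, 69999):
    print(step, multistep_lr(step))
```

With only 20 iterations, as in the trainer example, no milestone is reached and the rate stays at 1.25e-4; both decays take effect only in a full-length run such as the 70000-iteration configuration.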

2. Configuration-driven training at low cost: one command starts training

batch_size: 8
iters: 70000

train_dataset:
   type: KittiMonoDataset
   dataset_root: datasets/KITTI
   transforms:
     - type: LoadImage
       reader: pillow
       to_chw: False
     - type: Normalize
       mean: [0.485, 0.456, 0.406]
       std: [0.229, 0.224, 0.225]

lr_scheduler:
   type: MultiStepDecay
   milestones: [36000, 55000]
   learning_rate: 1.25e-4

optimizer:
   type: Adam

Start training from the command line:

python tools/train.py --config configs/smoke/smoke_dla34_no_dcn_kitti.yml --iters 20 --log_interval 1 --num_worker 5
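
A configuration file of this shape, where each component names its class under a `type` key, is commonly turned into objects with a registry pattern. The sketch below shows that general idea only; it is not Paddle3D's actual config loader. `REGISTRY`, `register`, `build`, and `ToyDataset` are hypothetical names, the two transform classes are toy stand-ins rather than the real Paddle3D transforms, and the config is written as a Python dict to avoid depending on a YAML parser.

```python
REGISTRY = {}


def register(cls):
    """Class decorator: make a class buildable by its name."""
    REGISTRY[cls.__name__] = cls
    return cls


def build(cfg):
    """Recursively replace every dict that has a 'type' key with an instance."""
    if isinstance(cfg, list):
        return [build(item) for item in cfg]
    if isinstance(cfg, dict):
        params = {key: build(value) for key, value in cfg.items() if key != "type"}
        if "type" in cfg:
            return REGISTRY[cfg["type"]](**params)
        return params
    return cfg  # plain values (numbers, strings, booleans) pass through


@register
class LoadImage:  # toy stand-in, not the Paddle3D transform
    def __init__(self, reader, to_chw):
        self.reader, self.to_chw = reader, to_chw


@register
class Normalize:  # toy stand-in, not the Paddle3D transform
    def __init__(self, mean, std):
        self.mean, self.std = mean, std


@register
class ToyDataset:  # hypothetical dataset used only for this example
    def __init__(self, dataset_root, transforms):
        self.dataset_root, self.transforms = dataset_root, transforms


config = {
    "batch_size": 8,
    "train_dataset": {
        "type": "ToyDataset",
        "dataset_root": "datasets/KITTI",
        "transforms": [
            {"type": "LoadImage", "reader": "pillow", "to_chw": False},
            {"type": "Normalize", "mean": [0.485, 0.456, 0.406],
             "std": [0.229, 0.224, 0.225]},
        ],
    },
}

components = build(config)
dataset = components["train_dataset"]
print(type(dataset).__name__, [type(t).__name__ for t in dataset.transforms])
```

The benefit of this design is that swapping a dataset, transform, or optimizer becomes a one-line change in the configuration file instead of a code change.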

End-to-end full process coverage

Starting from data preparation, PaddlePaddle provides scripts for generating the database for point cloud data. During training, VisualDL is integrated so that metrics can be viewed in real time. For model deployment, complete and detailed tutorials and deployment scripts are provided, along with heavily optimized model inference performance.

Seamless connection with Apollo

The perception model development process based on Paddle3D is as follows: after training a model with Paddle3D, put the model into the Apollo project, replace the original perception model, call the relevant perception interface, and then start DreamView, the autonomous driving front-end software, to view the model's predictions.

This supports rapid verification of model performance and high-performance fusion of multi-modal models, enabling efficient construction of full-stack technical solutions for autonomous driving.

Figure: Perception model development process based on Paddle3D

In summary, PaddlePaddle addresses two difficulties in 3D perception tasks.

  • 3D data collection. Data acquisition devices are expensive, the data they capture is low in resolution, and the depth maps collected by lidar are sparse. PaddleDepth provides developers with a low-cost depth information acquisition solution.

  • 3D information application. The difficulties here include the many ways to model a task and the high barrier to entry. Paddle3D provides developers with a full-pipeline development solution for 3D perception, covering a large number of 3D perception models and providing full-pipeline tutorials from training and evaluation to deployment, so users can quickly verify results.

Origin blog.csdn.net/PaddlePaddle/article/details/130143376