DETR3D: 3D Object Detection from Multi-view Images via 3D-to-2D Queries

Programming Languages 2023-07-18 17:10:21

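Purpose

This paper proposes a 3D object detection network that uses only 2D image information and does not rely on dense depth prediction or 3D reconstruction. The network uses a transformer decoder similar to DETR's, so it needs no post-processing such as NMS.

3D object detection has long been a challenge, and it is harder with only 2D image information (RGB images) than with 3D information (LiDAR).

Some classic approaches:

Use a 2D object detection pipeline (CenterNet, FCOS, etc.) to predict 3D information (object pose, velocity) without considering the 3D scene structure or the sensor configuration. These methods need post-processing to fuse information from multiple cameras and to remove redundant boxes.
As an alternative to these 2D-based methods, some approaches bring 3D computation into the pipeline by generating pseudo-LiDAR or range (distance) data for the scene from the images. They then run a 3D object detection method on this data as if it were directly acquired 3D data. The problem is that inaccurate depth estimation degrades the 3D object detection.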
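
To make the depth-error problem concrete, here is a minimal sketch of how a pseudo-LiDAR point could be lifted from a pixel with a pinhole camera model. The function name `backproject`, the intrinsics, and the pixel and depth values are all made up for illustration; they do not come from the paper.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with an estimated depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)

# Hypothetical intrinsics and pixel, chosen only for illustration.
fx = fy = 1000.0
cx, cy = 800.0, 450.0
u, v = 1200.0, 500.0

true_point = backproject(u, v, 40.0, fx, fy, cx, cy)
noisy_point = backproject(u, v, 44.0, fx, fy, cx, cy)  # depth overestimated by 10%

# Euclidean distance between the two lifted points, in the same unit as depth.
error = sum((a - b) ** 2 for a, b in zip(true_point, noisy_point)) ** 0.5
print(true_point, noisy_point, round(error, 2))
```

Under these assumed numbers, a 10% depth error moves the lifted point by several metres, and a downstream 3D detector would inherit that offset.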

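This paper proposes a more elegant transition from 2D observations to 3D predictions for autonomous driving, one that does not depend on a dense depth prediction module.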

Method

Network architecture

![[attachments/9d61c4fc84ee4502b9076578e658b578_2_Figure_1.png]]

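Architecture overview:

A shared ResNet backbone and an FPN extract features.
A detection head connects the 2D features to the 3D bbox predictions in a geometry-aware manner. Each layer of the detection head takes as input a sparse set of object queries learned from data. Each object query encodes a 3D position; these positions are projected onto the camera planes and used to gather image features.
As in DETR, multi-head attention refines the object queries; this layer is repeated several times.
At the end of the decoder, an FFN produces the final result.
Finally, the network is trained with a set-to-set loss.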
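
The 3D-to-2D link in the detection head can be sketched in plain Python. This is a rough illustration rather than the authors' implementation: it assumes a query is decoded into a 3D point by a linear layer followed by a sigmoid and a rescale to a scene range, and that a single 3x4 matrix maps scene coordinates directly to pixels. The names `decode_reference_point` and `project_to_image`, the weights, the ranges, and the camera matrix are all hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode_reference_point(query, weight, bias, pc_range):
    """Map a query vector to a 3D point: linear layer, sigmoid, rescale to the scene range."""
    point = []
    for axis in range(3):
        logit = sum(w * q for w, q in zip(weight[axis], query)) + bias[axis]
        lo, hi = pc_range[axis]
        point.append(lo + sigmoid(logit) * (hi - lo))
    return point

def project_to_image(point, proj, width, height):
    """Project a 3D point with a 3x4 matrix; return (u, v, valid)."""
    homo = list(point) + [1.0]
    x, y, z = (sum(p * h for p, h in zip(row, homo)) for row in proj)
    if z <= 1e-5:  # the point lies behind the camera
        return None, None, False
    u, v = x / z, y / z
    valid = 0.0 <= u <= width - 1 and 0.0 <= v <= height - 1
    return u, v, valid

# Hypothetical 2-dimensional query, layer parameters, and scene range.
query = [0.2, -0.4]
weight = [[1.0, 0.5], [0.3, -0.2], [0.0, 1.0]]
bias = [0.0, 0.0, 0.0]
pc_range = [(-10.0, 10.0), (-2.0, 2.0), (0.0, 60.0)]

# Hypothetical camera: intrinsics only, z axis pointing forward.
proj = [[1000.0, 0.0, 800.0, 0.0],
        [0.0, 1000.0, 450.0, 0.0],
        [0.0, 0.0, 1.0, 0.0]]

ref_point = decode_reference_point(query, weight, bias, pc_range)
u, v, valid = project_to_image(ref_point, proj, width=1600, height=900)
print(ref_point, u, v, valid)
```

A point that projects outside the image or falls behind a camera is flagged as invalid, so that camera would contribute nothing to the query.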

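Processing steps in each decoder layer:

Predict a set of bounding box centers associated with the object queries;
Project these centers into all feature maps using the camera transformation matrices;
Sample features by bilinear interpolation and merge them into the object queries;
Use multi-head attention to model interactions between objects.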
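
The sampling and merging step could look roughly like the sketch below. It assumes the projection has already produced pixel coordinates and a validity flag per camera, that feature maps are nested lists indexed as `fmap[row][col][channel]`, and that features from valid cameras are averaged and added to the query. The helper names and the toy feature maps are illustrative, and the multi-head attention step is not shown.

```python
def bilinear_sample(fmap, u, v):
    """Bilinearly interpolate fmap[row][col] -> channel list at pixel (u, v).

    Assumes 0 <= u <= width - 1 and 0 <= v <= height - 1."""
    h, w = len(fmap), len(fmap[0])
    x0, y0 = int(u), int(v)
    x1, y1 = min(x0 + 1, w - 1), min(y0 + 1, h - 1)
    ax, ay = u - x0, v - y0
    return [
        (1 - ax) * (1 - ay) * fmap[y0][x0][c]
        + ax * (1 - ay) * fmap[y0][x1][c]
        + (1 - ax) * ay * fmap[y1][x0][c]
        + ax * ay * fmap[y1][x1][c]
        for c in range(len(fmap[0][0]))
    ]

def update_query(query, samples):
    """Average sampled features over valid cameras and add them to the query.

    `samples` holds one (fmap, u, v, valid) tuple per camera."""
    feats = [bilinear_sample(fmap, u, v) for fmap, u, v, valid in samples if valid]
    if not feats:
        return list(query)  # the point is not visible in any camera
    mean = [sum(col) / len(feats) for col in zip(*feats)]
    return [q + m for q, m in zip(query, mean)]

# Two toy 2x2 feature maps with a single channel each.
fmap_a = [[[0.0], [10.0]], [[20.0], [30.0]]]
fmap_b = [[[1.0], [1.0]], [[1.0], [1.0]]]

samples = [
    (fmap_a, 0.5, 0.5, True),   # visible in camera A
    (fmap_b, 0.0, 0.0, True),   # visible in camera B
    (fmap_a, 0.0, 0.0, False),  # outside the third camera's view, ignored
]
sampled = bilinear_sample(fmap_a, 0.5, 0.5)
new_query = update_query([1.0], samples)
print(sampled, new_query)
```

Averaging only over valid cameras keeps a query from being diluted by views in which its reference point is not visible.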

Loss

As in DETR, a set-to-set loss is used, and it is computed after every decoder layer.
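
A set-to-set loss first needs a one-to-one assignment between predictions and ground-truth objects. The sketch below illustrates that matching with a deliberately simplified cost (L1 distance between centers minus the predicted probability of the ground-truth class) and a brute-force search over permutations as a stand-in for the Hungarian algorithm. The cost terms, names, and data are assumptions for illustration, not the paper's exact formulation.

```python
from itertools import permutations

def match_cost(pred, gt):
    """Simplified pairwise cost: center L1 distance minus probability of the GT class."""
    l1 = sum(abs(p - g) for p, g in zip(pred["center"], gt["center"]))
    return l1 - pred["probs"][gt["label"]]

def match(preds, gts):
    """Return, for each GT, the index of its matched prediction at minimal total cost.

    Brute force over permutations, so only usable for tiny sets."""
    best_cost, best = float("inf"), None
    for perm in permutations(range(len(preds)), len(gts)):
        cost = sum(match_cost(preds[p], gts[g]) for g, p in enumerate(perm))
        if cost < best_cost:
            best_cost, best = cost, perm
    return best

# Three hypothetical predictions and two ground-truth objects.
preds = [
    {"center": (10.0, 0.0, 0.0), "probs": [0.1, 0.8]},
    {"center": (0.0, 5.0, 0.0), "probs": [0.9, 0.05]},
    {"center": (30.0, 30.0, 0.0), "probs": [0.2, 0.2]},
]
gts = [
    {"center": (0.0, 5.0, 0.5), "label": 0},
    {"center": (10.5, 0.0, 0.0), "label": 1},
]

assignment = match(preds, gts)
unmatched = sorted(set(range(len(preds))) - set(assignment))
print(assignment, unmatched)
```

Matched pairs would then receive classification and box regression losses, while unmatched predictions are trained toward "no object". Repeating this after every decoder layer and summing the per-layer losses gives the deep supervision described above.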

Related material

Camera-only object detection in BEV: DETR3D, an article by Tsinghua MARS Lab on Zhihu https://zhuanlan.zhihu.com/p/499795161

Reposted from blog.csdn.net/SugerOO/article/details/131737609
