Autonomous Driving Study Handbook

Table of Contents

Papers

Overall

Self-Driving Cars: A Survey [Paper]
- Claudine Badue, Rânik Guidolini, Raphael Vivacqua Carneiro, Pedro Azevedo, Vinicius Brito Cardoso, Avelino Forechi, Luan Ferreira Reis Jesus, Rodrigo Ferreira Berriel, Thiago Meireles Paixão, Filipe Mutz, Thiago Oliveira-Santos, Alberto Ferreira De Souza
MIT Autonomous Vehicle Technology Study: Large-Scale Deep Learning Based Analysis of Driver Behavior and Interaction with Automation [Paper]
- Lex Fridman, Daniel E. Brown, Michael Glazer, William Angell, Spencer Dodd, Benedikt Jenik, Jack Terwilliger, Julia Kindelsberger, Li Ding, Sean Seaman, Hillary Abraham, Alea Mehler, Andrew Sipperley, Anthony Pettinato, Bobbie Seppelt, Linda Angell, Bruce Mehler, Bryan Reimer

Classification

Densely Connected Convolutional Networks [Paper]
- Gao Huang, Zhuang Liu, Laurens van der Maaten, Kilian Q. Weinberger, arXiv:1608.06993.
Microsoft (Deep Residual Learning) [Paper][Slide]
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition, arXiv:1512.03385.
Microsoft (PReLU/Weight Initialization) [Paper]
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification, arXiv:1502.01852.
Batch Normalization [Paper]
- Sergey Ioffe, Christian Szegedy, Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift, arXiv:1502.03167.
Differentiable Learning-to-Normalize via Switchable Normalization [Paper] [Code]
- Ping Luo, Jiamin Ren, Zhanglin Peng, arXiv:1806.10779.
GoogLeNet [Paper]
- Christian Szegedy, Wei Liu, Yangqing Jia, Pierre Sermanet, Scott Reed, Dragomir Anguelov, Dumitru Erhan, Vincent Vanhoucke, Andrew Rabinovich, Going Deeper with Convolutions, CVPR, 2015.
VGG-Net [Web] [Paper]
- Karen Simonyan and Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR, 2015.
AlexNet [Paper]
- Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton, ImageNet Classification with Deep Convolutional Neural Networks, NIPS, 2012.

2D Object Detection

PVANET [Paper] [Code]
- Kye-Hyeon Kim, Sanghoon Hong, Byungseok Roh, Yeongjae Cheon, Minje Park, PVANET: Deep but Lightweight Neural Networks for Real-time Object Detection, arXiv:1608.08021
OverFeat, NYU [Paper]
- OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks, ICLR, 2014.
R-CNN, UC Berkeley [Paper-CVPR14] [Paper-arXiv14]
- Ross Girshick, Jeff Donahue, Trevor Darrell, Jitendra Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, CVPR, 2014.
SPP, Microsoft Research [Paper]
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition, ECCV, 2014.
Fast R-CNN, Microsoft Research [Paper]
- Ross Girshick, Fast R-CNN, arXiv:1504.08083.
Faster R-CNN, Microsoft Research [Paper]
- Shaoqing Ren, Kaiming He, Ross Girshick, Jian Sun, Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks, arXiv:1506.01497.
R-CNN minus R, Oxford [Paper]
- Karel Lenc, Andrea Vedaldi, R-CNN minus R, arXiv:1506.06981.
End-to-end people detection in crowded scenes [Paper]
- Russell Stewart, Mykhaylo Andriluka, End-to-end people detection in crowded scenes, arXiv:1506.04878.
You Only Look Once: Unified, Real-Time Object Detection [Paper], [Paper Version 2], [C Code], [Tensorflow Code]
- Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi, You Only Look Once: Unified, Real-Time Object Detection, arXiv:1506.02640
- Joseph Redmon, Ali Farhadi (Version 2)
Inside-Outside Net [Paper]
- Sean Bell, C. Lawrence Zitnick, Kavita Bala, Ross Girshick, Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks
Deep Residual Network (Current State-of-the-Art) [Paper]
- Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun, Deep Residual Learning for Image Recognition
Weakly Supervised Object Localization with Multi-fold Multiple Instance Learning [Paper]
R-FCN [Paper] [Code]
- Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: Object Detection via Region-based Fully Convolutional Networks
SSD [Paper] [Code]
- Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector, arXiv:1512.02325
Speed/accuracy trade-offs for modern convolutional object detectors [Paper]
- Jonathan Huang, Vivek Rathod, Chen Sun, Menglong Zhu, Anoop Korattikara, Alireza Fathi, Ian Fischer, Zbigniew Wojna, Yang Song, Sergio Guadarrama, Kevin Murphy, Google Research, arXiv:1611.10012
A General Pipeline for 3D Detection of Vehicles [Paper]
- Xinxin Du, Marcelo H. Ang Jr., Sertac Karaman and Daniela Rus, arXiv:1803.00387
Multi-Task Vehicle Detection With Region-of-Interest Voting [Paper]
- Wenqing Chu, Yao Liu, Chen Shen, Deng Cai, and Xian-Sheng Hua
Car Detection for Autonomous Vehicle: LIDAR and Vision Fusion Approach Through Deep Learning Framework [Paper]
- Xinxin Du, Marcelo H. Ang Jr. and Daniela Rus
A Unified Multi-scale Deep Convolutional Neural Network for Fast Object Detection [Paper]
- Zhaowei Cai, Quanfu Fan, Rogerio S. Feris, and Nuno Vasconcelos, UC San Diego, IBM Watson Research, arXiv:1607.07155
Complex-YOLO: An Euler-Region-Proposal for Real-time 3D Object Detection on Point Clouds [Paper]
- Martin Simon, Stefan Milz, Karl Amende, Horst-Michael Gross, Valeo Schalter und Sensoren GmbH, Ilmenau University of Technology, arXiv:1803.06199

3D Object Detection

PIXOR: Real-time 3D Object Detection from Point Clouds [Paper]
- Bin Yang, Wenjie Luo, Raquel Urtasun, Uber Advanced Technologies Group, University of Toronto
Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [Paper]
- Wenjie Luo, Bin Yang and Raquel Urtasun, Uber Advanced Technologies Group, University of Toronto
Joint 3D Proposal Generation and Object Detection from View Aggregation [Paper] [Code]
- Jason Ku, Melissa Mozifian, Jungwook Lee, Ali Harakeh, and Steven L. Waslander
Frustum PointNets for 3D Object Detection from RGB-D Data [Paper] [Code]
- Charles R. Qi, Wei Liu, Chenxia Wu, Hao Su, Leonidas J. Guibas
PointNet: Deep Learning on Point Sets for 3D Classification and Segmentation [Paper] [Code]
- Charles R. Qi, Hao Su, Kaichun Mo, Leonidas J. Guibas, Stanford University
PointNet++: Deep Hierarchical Feature Learning on Point Sets in a Metric Space [Paper] [Code]
- Charles R. Qi, Li Yi, Hao Su, Leonidas J. Guibas, Stanford University
Deep MANTA: A Coarse-to-fine Many-Task Network for joint 2D and 3D vehicle analysis from monocular image [Paper]
- Florian Chabot, Mohamed Chaouch, Jaonary Rabarisoa, Celine Teuliere, Thierry Chateau
Monocular 3D Object Detection for Autonomous Driving [Paper]
- Xiaozhi Chen, Kaustav Kundu, Ziyu Zhang, Huimin Ma, Sanja Fidler, Raquel Urtasun, Tsinghua University, University of Toronto
Subcategory-Aware Convolutional Neural Networks for Object Proposals and Detection [Paper]
- Yu Xiang, Wongun Choi, Yuanqing Lin, and Silvio Savarese, University of Washington, NEC Lab, Baidu, Stanford University

Object Tracking

Beyond Pixels: Leveraging Geometry and Shape Cues for Online Multi-Object Tracking [Paper]
- Sarthak Sharma, Junaid Ahmed Ansari, J. Krishna Murthy and K. Madhava Krishna
Tracking the Untrackable: Learning to Track Multiple Cues with Long-Term Dependencies [Paper]
- Amir Sadeghian, Alexandre Alahi, Silvio Savarese, Stanford University, Ecole Polytechnique Federale de Lausanne (EPFL), Switzerland
Near-Online Multi-target Tracking with Aggregated Local Flow Descriptor [Paper]
- Wongun Choi, NEC Lab
Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net [Paper]
- Wenjie Luo, Bin Yang and Raquel Urtasun, Uber Advanced Technologies Group University of Toronto

Semantic Segmentation

PASCAL VOC2012 Challenge Leaderboard (01 Sep. 2016) (from PASCAL VOC2012 leaderboards)
Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs
- Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, Alan L. Yuille (+ equal contribution).
  [Paper]. In ICLR, 2015.
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs
- Liang-Chieh Chen+, George Papandreou+, Iasonas Kokkinos, Kevin Murphy, and Alan L Yuille (+ equal contribution).
  [Paper]. TPAMI 2017.
Rethinking Atrous Convolution for Semantic Image Segmentation
- Liang-Chieh Chen, George Papandreou, Florian Schroff, Hartwig Adam.
  [Paper]. arXiv: 1706.05587, 2017.
Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation
- Liang-Chieh Chen, Yukun Zhu, George Papandreou, Florian Schroff, Hartwig Adam.
  [Paper]. arXiv: 1802.02611, 2018.
ParseNet: Looking Wider to See Better
- Wei Liu, Andrew Rabinovich, Alexander C Berg [Paper]. arXiv:1506.04579, 2015.
Pyramid Scene Parsing Network
- Hengshuang Zhao, Jianping Shi, Xiaojuan Qi, Xiaogang Wang, Jiaya Jia [Paper]. In CVPR, 2017.
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
- Sergey Ioffe, Christian Szegedy [Paper]. In ICML, 2015.
Inverted Residuals and Linear Bottlenecks: Mobile Networks for Classification, Detection and Segmentation
- Mark Sandler, Andrew Howard, Menglong Zhu, Andrey Zhmoginov, Liang-Chieh Chen [Paper]. arXiv:1801.04381, 2018.
Xception: Deep Learning with Depthwise Separable Convolutions
- François Chollet [Paper]. In CVPR, 2017.
Deformable Convolutional Networks -- COCO Detection and Segmentation Challenge 2017 Entry
- Haozhi Qi, Zheng Zhang, Bin Xiao, Han Hu, Bowen Cheng, Yichen Wei, Jifeng Dai [Paper]. ICCV COCO Challenge Workshop, 2017.
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
- M. Abadi, A. Agarwal, et al. [Paper]. arXiv:1603.04467, 2016.
The Pascal Visual Object Classes Challenge – A Retrospective
- Mark Everingham, S. M. Ali Eslami, Luc Van Gool, Christopher K. I. Williams, John Winn, and Andrew Zisserman. [Paper]. IJCV, 2014.
The Cityscapes Dataset for Semantic Urban Scene Understanding
- Marius Cordts, Mohamed Omran, Sebastian Ramos, Timo Rehfeld, Markus Enzweiler, Rodrigo Benenson, Uwe Franke, Stefan Roth, Bernt Schiele. [Paper]. In CVPR, 2016.
SEC: Seed, Expand and Constrain
- Alexander Kolesnikov, Christoph Lampert, Seed, Expand and Constrain: Three Principles for Weakly-Supervised Image Segmentation, ECCV, 2016. [Paper] [Code]
Adelaide
- Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Efficient piecewise training of deep structured models for semantic segmentation, arXiv:1504.01013. [Paper] (1st ranked in VOC2012)
- Guosheng Lin, Chunhua Shen, Ian Reid, Anton van den Hengel, Deeply Learning the Messages in Message Passing Inference, arXiv:1508.02108. [Paper] (4th ranked in VOC2012)
Deep Parsing Network (DPN)
- Ziwei Liu, Xiaoxiao Li, Ping Luo, Chen Change Loy, Xiaoou Tang, Semantic Image Segmentation via Deep Parsing Network, arXiv:1509.02634 / ICCV 2015 [Paper] (2nd ranked in VOC 2012)
CentraleSuperBoundaries, INRIA [Paper]
- Iasonas Kokkinos, Surpassing Humans in Boundary Detection using Deep Learning, arXiv:1411.07386 (4th ranked in VOC 2012)
BoxSup [Paper]
- Jifeng Dai, Kaiming He, Jian Sun, BoxSup: Exploiting Bounding Boxes to Supervise Convolutional Networks for Semantic Segmentation, arXiv:1503.01640. (6th ranked in VOC2012)
POSTECH
- Hyeonwoo Noh, Seunghoon Hong, Bohyung Han, Learning Deconvolution Network for Semantic Segmentation, arXiv:1505.04366. [Paper] (7th ranked in VOC2012)
- Seunghoon Hong, Hyeonwoo Noh, Bohyung Han, Decoupled Deep Neural Network for Semi-supervised Semantic Segmentation, arXiv:1506.04924. [Paper]
- Seunghoon Hong, Junhyuk Oh, Bohyung Han, and Honglak Lee, Learning Transferrable Knowledge for Semantic Segmentation with Deep Convolutional Neural Network, arXiv:1512.07928 [Paper] [Project Page]
Conditional Random Fields as Recurrent Neural Networks [Paper]
- Shuai Zheng, Sadeep Jayasumana, Bernardino Romera-Paredes, Vibhav Vineet, Zhizhong Su, Dalong Du, Chang Huang, Philip H. S. Torr, Conditional Random Fields as Recurrent Neural Networks, arXiv:1502.03240. (8th ranked in VOC2012)
DeepLab
- Liang-Chieh Chen, George Papandreou, Kevin Murphy, Alan L. Yuille, Weakly- and semi-supervised learning of a DCNN for semantic image segmentation, arXiv:1502.02734. [Paper] (9th ranked in VOC2012)
Zoom-out [Paper]
- Mohammadreza Mostajabi, Payman Yadollahpour, Gregory Shakhnarovich, Feedforward Semantic Segmentation With Zoom-Out Features, CVPR, 2015
Joint Calibration [Paper]
- Holger Caesar, Jasper Uijlings, Vittorio Ferrari, Joint Calibration for Semantic Segmentation, arXiv:1507.01581.
Fully Convolutional Networks for Semantic Segmentation [Paper-CVPR15] [Paper-arXiv15]
- Jonathan Long, Evan Shelhamer, Trevor Darrell, Fully Convolutional Networks for Semantic Segmentation, CVPR, 2015.
Hypercolumn [Paper]
- Bharath Hariharan, Pablo Arbelaez, Ross Girshick, Jitendra Malik, Hypercolumns for Object Segmentation and Fine-Grained Localization, CVPR, 2015.
Deep Hierarchical Parsing
- Abhishek Sharma, Oncel Tuzel, David W. Jacobs, Deep Hierarchical Parsing for Semantic Segmentation, CVPR, 2015. [Paper]
Learning Hierarchical Features for Scene Labeling [Paper-ICML12] [Paper-PAMI13]
- Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Scene Parsing with Multiscale Feature Learning, Purity Trees, and Optimal Covers, ICML, 2012.
- Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun, Learning Hierarchical Features for Scene Labeling, PAMI, 2013.
University of Cambridge [Web]
- Vijay Badrinarayanan, Alex Kendall and Roberto Cipolla "SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation." arXiv preprint arXiv:1511.00561, 2015. [Paper]
- Alex Kendall, Vijay Badrinarayanan and Roberto Cipolla "Bayesian SegNet: Model Uncertainty in Deep Convolutional Encoder-Decoder Architectures for Scene Understanding." arXiv preprint arXiv:1511.02680, 2015. [Paper]
Princeton
- Fisher Yu, Vladlen Koltun, "Multi-Scale Context Aggregation by Dilated Convolutions", ICLR 2016, [Paper]
Univ. of Washington, Allen AI
- Hamid Izadinia, Fereshteh Sadeghi, Santosh Kumar Divvala, Yejin Choi, Ali Farhadi, "Segment-Phrase Table for Semantic Segmentation, Visual Entailment and Paraphrasing", ICCV, 2015, [Paper]
INRIA
- Iasonas Kokkinos, "Pushing the Boundaries of Boundary Detection Using Deep Learning", ICLR 2016, [Paper]
UCSB
- Niloufar Pourian, S. Karthikeyan, and B.S. Manjunath, "Weakly supervised graph based semantic segmentation by learning communities of image-parts", ICCV, 2015, [Paper]

Depth Estimation

Unsupervised Monocular Depth Estimation with Left-Right Consistency [Paper] [Code]
- Clement Godard, Oisin Mac Aodha, Gabriel J. Brostow, University College London
Deep Ordinal Regression Network for Monocular Depth Estimation [Paper] [Code]
- Huan Fu, Mingming Gong, Chaohui Wang, Kayhan Batmanghelich, Dacheng Tao
GeoNet: Unsupervised Learning of Dense Depth, Optical Flow and Camera Pose [Paper] [Code]
- Zhichao Yin, Jianping Shi

Localization and Mapping

Visual SLAM algorithms: a survey from 2010 to 2016 [Paper]
- Takafumi Taketomi, Hideaki Uchiyama and Sei Ikeda
Visual map matching and localization using a global feature map [Paper]
- Oliver Pink
Map-based precision vehicle localization in urban environments [Paper]
- Jesse Levinson, Michael Montemerlo, Sebastian Thrun
Simultaneous Localization And Mapping: A Survey of Current Trends in Autonomous Driving [Paper]
- Guillaume Bresson, Zayed Alsayed, Li Yu, Sébastien Glaser
Past, Present, and Future of Simultaneous Localization and Mapping: Toward the Robust-Perception Age [Paper]
- Cesar Cadena, Luca Carlone, Henry Carrillo, Yasir Latif, Davide Scaramuzza, Jose Neira, Ian Reid, and John J. Leonard
The GraphSLAM Algorithm with Applications to Large-Scale Mapping of Urban Structures [Paper]
- Sebastian Thrun, Michael Montemerlo
Large-scale mapping in complex field scenarios using an autonomous car [Paper]
- Filipe Mutz, Lucas P. Veronese, Thiago Oliveira-Santos, Edilson de Aguiar, Fernando A. Auat Cheein, Alberto Ferreira De Souza
Road-SLAM: Road Marking based SLAM with Lane-level Accuracy [Paper]
- Jinyong Jeong, Younggun Cho, and Ayoung Kim
LIMO: Lidar-Monocular Visual Odometry [Paper]
- Johannes Graeter, Alexander Wilczynski, Martin Lauer
Visual-lidar Odometry and Mapping: Low-drift, Robust, and Fast [Paper]
- Ji Zhang and Sanjiv Singh
LOAM: Lidar Odometry and Mapping in Real-time [Paper]
- Ji Zhang and Sanjiv Singh
Efficient Surfel-Based SLAM using 3D Laser Range Data in Urban Environments [Paper]
- Jens Behley and Cyrill Stachniss
SOFT-SLAM: Computationally Efficient Stereo Visual SLAM for Autonomous UAVs [Paper]
- Igor Cvišić, Josip Ćesić, Ivan Marković, Ivan Petrović
Visual SLAM for Automated Driving: Exploring the Applications of Deep Learning [Paper]
- Stefan Milz, Georg Arbeiter, Christian Witt, Bassam Abdallah
SegMap: 3D Segment Mapping using Data-Driven Descriptors [Paper]
- Renaud Dubé, Andrei Cramariuc, Daniel Dugas, Juan Nieto, Roland Siegwart, and Cesar Cadena
SegMatch: Segment Based Place Recognition in 3D Point Clouds [Paper]
- Renaud Dubé, Daniel Dugas, Elena Stumm, Juan Nieto, Roland Siegwart, Cesar Cadena
Incremental Segment-Based Localization in 3D Point Clouds [Paper]
- Renaud Dubé, Mattia G. Gollub, Hannes Sommer, Igor Gilitschenski
Direct Visual SLAM using Sparse Depth for Camera-LiDAR System [Paper]
- Young-Sik Shin, Yeong Sang Park and Ayoung Kim

Visual Odometry

Review of visual odometry: types, approaches, challenges, and applications [Paper]
- Mohammad O. A. Aqel, Mohammad H. Marhaban, M. Iqbal Saripan and Napsiah Bt. Ismail
Semantic segmentation-aided visual odometry for urban autonomous driving [Paper]
- Lifeng An, Xinyu Zhang, Hongbo Gao and Yuchao Liu
Vision-based ACC with a Single Camera: Bounds on Range and Range Rate Accuracy [Paper]
- Gideon P. Stein, Ofer Mano, Amnon Shashua, Mobileye

Lane Detection

Towards End-to-End Lane Detection: an Instance Segmentation Approach [Paper]
- Davy Neven, Bert De Brabandere, Stamatios Georgoulis, Marc Proesmans, Luc Van Gool
Vision-Based Lane Analysis: Exploration of Issues and Approaches for Embedded Realization [Paper]
- Ravi Kumar Satzoda and Mohan Manubhai Trivedi
Drive Analysis Using Vehicle Dynamics and Vision-Based Lane Semantics [Paper]
- Ravi Kumar Satzoda and Mohan Manubhai Trivedi

Decision Making

Planning and Decision-Making for Autonomous Vehicles [Paper]
- Wilko Schwarting, Javier Alonso-Mora and Daniela Rus
Perception, planning, control, and coordination for autonomous vehicles [Paper]
- Scott Drew Pendleton, Hans Andersen, Xinxin Du, Xiaotong Shen, Malika Meghjani, You Hong Eng, Daniela Rus and Marcelo H. Ang
A survey of motion planning and control techniques for self-driving urban vehicles [Paper]
- Brian Paden, Michal Cap, Sze Zheng Yong, Dmitry Yershov, Emilio Frazzoli
Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions [Paper]
- Christos Katrakazas, Mohammed Quddus, Wen-Hua Chen, Lipika Deka
Behavior and path planning algorithm of autonomous vehicle A1 in structured environments [Paper]
- Junsoo Kim, Kichun Jo, Dongchul Kim, Keonyup Chu, Myoungho Sunwoo
How Does Path Planning for Autonomous Vehicles Work [Paper]
Towards full automated drive in urban environments: A demonstration in GoMentum Station, California [Paper]
- Akansel Cosgun, Lichao Ma, Jimmy Chiu, Jiawei Huang, Mahmut Demir, Alexandre Miranda Anon, Thang Lian, Hasan Tafish, Samir Al-Stouhi
A behavioral planning framework for autonomous driving [Paper]
- Junqing Wei, Jarrod M. Snider, Tianyu Gu, John M. Dolan, Bakhtiar Litkouhi
Towards a functional system architecture for automated vehicles [Paper]
- Simon Ulbrich, Andreas Reschka, Jens Rieken, Susanne Ernst, Gerrit Bagschik, Frank Dierkes, Marcus Nolte, Markus Maurer
Autonomous Driving: Planning, Control & Other Topics. (UNC presentation slides) [Slide]
- Simon Ulbrich, Andreas Reschka, Jens Rieken, Susanne Ernst, Gerrit Bagschik, Frank Dierkes, Marcus Nolte, Markus Maurer
Udacity Self-Driving Car Nano Degree program description [Web]

Planning

Optimal trajectory generation for dynamic street scenarios in a Frenet frame [Paper]
- Moritz Werling, Julius Ziegler, Soren Kammel, Sebastian Thrun
Path planning for autonomous vehicles in unknown semi-structured environments [Paper]
- Dolgov, Dmitry, et al.
Local path planning for off-road autonomous driving with avoidance of static obstacles [Paper]
- Chu, Keonyup, Minchae Lee, and Myoungho Sunwoo
Trajectory planning for Bertha - A local, continuous method [Paper]
- Ziegler, Julius, et al.
Efficient sampling-based motion planning for on-road autonomous driving [Paper]
- Ma, Liang, et al.
Real-time motion planning methods for autonomous on-road driving: State-of-the-art and future research directions [Paper]
- Katrakazas, Christos, et al.
A Review of Motion Planning Techniques for Automated Vehicles [Paper]
- González, David, et al.
A survey of motion planning and control techniques for self-driving urban vehicles [Paper]
- Paden, Brian, et al.
Real-time trajectory planning for autonomous urban driving: Framework, algorithms, and verifications [Paper]
- Li, Xiaohui, et al.
Dynamic path planning for autonomous driving on various roads with avoidance of static and moving obstacles [Paper]
- Hu, Xuemin, et al.
Hybrid Trajectory Planning for Autonomous Driving in Highly Constrained Environments [Paper]
- Zhang, Yu, et al.
Vehicle path planning in various driving situations based on the elastic band theory for highway collision avoidance [Paper]
- Song, Xiaolin, Haotian Cao, and Jiang Huang

Control

RL in Autonomous Driving

Dataset

KITTI Benchmark - Tasks of interest are: stereo, optical flow, visual odometry, 3D object detection and 3D tracking. For this purpose, we equipped a standard station wagon with two high-resolution color and grayscale video cameras. Accurate ground truth is provided by a Velodyne laser scanner and a GPS localization system.
Cityscapes Dataset - Large-scale dataset that contains a diverse set of stereo video sequences recorded in street scenes from 50 different cities, with high quality pixel-level annotations of 5,000 frames in addition to a larger set of 20,000 weakly annotated frames. Focused on pixel-level classification and instance-wise segmentation.
Mapillary Vistas Dataset - A diverse street-level imagery dataset with pixel-accurate and instance-specific human annotations for understanding street scenes around the world. 25,000 high-resolution images, 152 object categories, 100 instance-specifically annotated categories, global reach covering 6 continents, and a variety of weather, season, time of day, camera, and viewpoint.
ApolloScape - Scene parsing, car instance, lane segmentation, self localization, trajectory.
SYNTHetic collection of Imagery and Annotations (SYNTHIA) - SYNTHIA, The SYNTHetic collection of Imagery and Annotations, is a dataset that has been generated with the purpose of aiding semantic segmentation and related scene understanding problems in the context of driving scenarios. SYNTHIA consists of a collection of photo-realistic frames rendered from a virtual city and comes with precise pixel-level semantic annotations.
nuScenes - The nuScenes dataset (pronounced /nuːsiːnz/) is a public large-scale dataset for autonomous driving provided by nuTonomy-Aptiv. By releasing a subset of our data to the public, we aim to support public research into computer vision and autonomous driving. For this purpose we collected 1000 driving scenes in Boston and Singapore, two cities that are known for their dense traffic and highly challenging driving situations. The scenes of 20 second length are manually selected to show a diverse and interesting set of driving maneuvers, traffic situations and unexpected behaviors. The rich complexity of nuScenes will encourage development of methods that enable safe driving in urban areas with dozens of objects per scene. Gathering data on different continents further allows us to study the generalization of computer vision algorithms across different locations, weather conditions, vehicle types, vegetation, road markings and left versus right hand traffic.
Daimler Urban Segmentation Dataset - The Daimler Urban Segmentation Dataset consists of video sequences recorded in urban traffic. The dataset consists of 5000 rectified stereo image pairs with a resolution of 1024x440. 500 frames (every 10th frame of the sequence) come with pixel-level semantic class annotations into 5 classes: ground, building, vehicle, pedestrian, sky. Dense disparity maps are provided as a reference; however, these are not manually annotated but computed using semi-global matching (SGM).
GTSRB, GTSDB - Dataset for Traffic Sign Classification, Traffic Sign Detection.
LaRA - Traffic Lights Recognition (TLR) public benchmarks
CALTECH Pedestrian Detection Benchmark - The Caltech Pedestrian Dataset consists of approximately 10 hours of 640x480 30Hz video taken from a vehicle driving through regular traffic in an urban environment. About 250,000 frames (in 137 approximately minute long segments) with a total of 350,000 bounding boxes and 2300 unique pedestrians were annotated. The annotation includes temporal correspondence between bounding boxes and detailed occlusion labels.
Caltech Lanes Dataset - Caltech Lanes dataset includes four clips taken around streets in Pasadena, CA at different times of day. The archive includes 1225 individual frames as taken from a camera mounted on Alice in addition to the labeled lanes. The dataset is divided into four individual clips: cordova1 with 250 frames, cordova2 with 406 frames, washington1 with 337 frames, and washington2 with 232 frames.
Udacity - ROSBAG training data. (~80 GB).
KAIST, Complex Urban Dataset - This data set provides Light Detection and Ranging (LiDAR) data with various position sensors targeting a highly complex urban environment. The presented data set captures features in urban environments (e.g. metropolis areas, complex buildings and residential areas). The data of 2D and 3D LiDAR are provided, which are typical types of LiDAR sensors. Raw sensor data for vehicle navigation is presented in a file format. For convenience, development tools are provided in the Robot Operating System (ROS) environment.
Oxford RobotCar - The Oxford RobotCar Dataset contains over 100 repetitions of a consistent route through Oxford, UK, captured over a period of over a year. The dataset captures many different combinations of weather, traffic and pedestrians, along with longer term changes such as construction and roadworks.
Velodyne SLAM Dataset from Karlsruhe Institute of Technology - Here, you can find two challenging datasets recorded with the Velodyne HDL64E-S2 scanner in the city of Karlsruhe, Germany.
University of Michigan North Campus Long-Term Vision and LIDAR Dataset - Long-term autonomy dataset for robotics research collected on the University of Michigan's North Campus. The dataset consists of omnidirectional imagery, 3D lidar, planar lidar, GPS, and proprioceptive sensors for odometry collected using a Segway robot. The dataset was collected to facilitate research focusing on long-term autonomous operation in changing environments. The dataset is comprised of 27 sessions spaced approximately biweekly over the course of 15 months. The sessions repeatedly explore the campus, both indoors and outdoors, on varying trajectories, and at different times of the day across all four seasons. This allows the dataset to capture many challenging elements including: moving obstacles (e.g., pedestrians, bicyclists, and cars), changing lighting, varying viewpoint, seasonal and weather changes (e.g., falling leaves and snow), and long-term structural changes caused by construction projects.
University of Michigan Ford Campus Vision and Lidar Data Set - Dataset collected by an autonomous ground vehicle testbed, based upon a modified Ford F-250 pickup truck. The vehicle is outfitted with a professional (Applanix POS LV) and consumer (Xsens MTI-G) Inertial Measuring Unit (IMU), a Velodyne 3D-lidar scanner, two push-broom forward looking Riegl lidars, and a Point Grey Ladybug3 omnidirectional camera system. Here we present the time-registered data from these sensors mounted on the vehicle, collected while driving the vehicle around the Ford Research campus and downtown Dearborn, Michigan during November-December 2009. The vehicle path trajectory in these datasets contains several large and small-scale loop closures, which should be useful for testing various state of the art computer vision and SLAM (Simultaneous Localization and Mapping) algorithms. The dataset is large (~100 GB), so make sure that you have sufficient bandwidth before downloading it.
DIPLECS Autonomous Driving Datasets (2015) - The dataset was recorded by placing an HD camera in a car driving around the Surrey countryside. The dataset contains about 30 minutes of driving. The video is 1920x1080 in colour, encoded using the H.264 codec. Steering is estimated by tracking markers on the steering wheel. The car's speed is estimated by OCR of the car's speedometer (but the accuracy of the method is not guaranteed).
Comma.ai - 7 and a quarter hours of largely highway driving. Consists of 10 video clips of variable size recorded at 20 Hz with a camera mounted on the windshield of an Acura ILX 2016. In parallel to the videos, some measurements were also recorded, such as the car's speed, acceleration, steering angle, GPS coordinates, and gyroscope angles. These measurements are transformed into a uniform 100 Hz time base.
Automated Synchronization of Driving Data: Video, Audio, Telemetry, and Accelerometer - 1,000+ hours of multi-sensor driving datasets collected at AgeLab (Lex Fridman).
Traffic Sign Recognition - A large dataset with traffic sign annotations, thousands of physically distinct traffic signs.
LISA: Laboratory for Intelligent & Safe Automobiles, UC San Diego Datasets - traffic sign, vehicles detection, traffic lights, trajectory patterns.
BDD100K: A Diverse Driving Video Database with Scalable Annotation Tooling - Datasets drive vision progress and autonomous driving is a critical vision application, yet existing driving datasets are impoverished in terms of visual content. Driving imagery is becoming plentiful, but annotation is slow and expensive, as annotation tools have not kept pace with the flood of data. Our first contribution is the design and implementation of a scalable annotation system that can provide a comprehensive set of image labels for large-scale driving datasets. Our second contribution is a new driving dataset, facilitated by our tooling, which is an order of magnitude larger than previous efforts, and is comprised of over 100K videos with diverse kinds of annotations including image level tagging, object bounding boxes, drivable areas, lane markings, and full-frame instance segmentation. The dataset possesses geographic, environmental, and weather diversity, which is useful for training models so that they are less likely to be surprised by new conditions.
Virtual KITTI - Virtual KITTI is a photo-realistic synthetic video dataset designed to learn and evaluate computer vision models for several video understanding tasks: object detection and multi-object tracking, scene-level and instance-level semantic segmentation, optical flow, and depth estimation. Virtual KITTI contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions. These worlds were created using the Unity game engine and a novel real-to-virtual cloning method. These photo-realistic synthetic videos are automatically, exactly, and fully annotated for 2D and 3D multi-object tracking and at the pixel level with category, instance, flow, and depth labels.
Bosch Small Traffic Lights Dataset - An accurate dataset for vision-based traffic light detection. Vision-only traffic light detection and tracking is a vital step on the way to fully automated driving in urban environments. We hope that this dataset allows for easy testing of object detection approaches, especially for small objects in larger images.
Belgium Traffic Sign Dataset - Dataset for Belgian traffic sign classification and detection.
Traffic Light in South Korea - In contrast to Europe and the USA, most traffic lights (TLs) for vehicles at intersections in South Korea have a horizontal layout and are installed as side-pillar horizontal types. A TL can have three or four signals, and each signal consists of a 355 mm x 355 mm black box with a colored bulb. The diameter of the bulb is 300 mm. There are two types of bulbs: a circle and an arrow. The circle bulb indicates green, red, or yellow, whereas the arrow bulb represents a left turn. There are two combinations for the three-bulb TL, and there is one type for the four-bulb TL. The TL status can be green, yellow, red, green + left turn, or red + left turn.
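As a rough illustration of the status set described for the South Korean traffic lights above, the five statuses could be encoded as combinations of lit bulbs. This is only a minimal sketch: the names `Bulb`, `VALID_STATUSES`, and `is_valid_status` are hypothetical and are not part of the dataset or its tooling.

```python
from enum import Flag, auto


class Bulb(Flag):
    """Individual bulbs that can be lit (hypothetical naming)."""
    RED = auto()
    YELLOW = auto()
    GREEN = auto()
    LEFT_ARROW = auto()


# The five statuses listed in the dataset description,
# expressed as the set of bulbs lit at the same time.
VALID_STATUSES = {
    "green": Bulb.GREEN,
    "yellow": Bulb.YELLOW,
    "red": Bulb.RED,
    "green + left turn": Bulb.GREEN | Bulb.LEFT_ARROW,
    "red + left turn": Bulb.RED | Bulb.LEFT_ARROW,
}


def is_valid_status(lit: Bulb) -> bool:
    """Return True if the lit-bulb combination is one of the five listed statuses."""
    return lit in VALID_STATUSES.values()
```

A detector's per-bulb predictions could be combined with `|` and checked with `is_valid_status` to reject combinations that the description does not list, such as red and green lit together.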

Books

Free Online Books

Videos 视频教程

Talks 演讲
- Deep Learning, Self-Taught Learning and Unsupervised Feature Learning by Andrew Ng
- Recent Developments in Deep Learning by Geoff Hinton
- The Unreasonable Effectiveness of Deep Learning by Yann LeCun
- Deep Learning of Representations by Yoshua Bengio
- ComputerVisionFoundation Video
Computer Vision Lecture 计算机视觉
Vision Based on Deep Learning 基于深度学习的计算机视觉
- [Stanford] CS231n: Convolutional Neural Networks for Visual Recognition
- [CUHK] ELEG 5040: Advanced Topics in Signal Processing (Introduction to Deep Learning)
More Deep Learning 更多
- [Oxford] Deep Learning by Prof. Nando de Freitas
- [NYU] Deep Learning by Prof. Yann LeCun

Software

ROS 机器人操作系统

ROS: The Robot Operating System (ROS) is a set of software libraries and tools that help you build robot applications. From drivers to state-of-the-art algorithms, and with powerful developer tools, ROS has what you need for your next robotics project. And it's all open source. [Web]

Framework 深度学习框架

TensorFlow: An open source software library for numerical computation using data flow graphs, by Google [Web]
PyTorch: Deep learning library in Python, used by Facebook [Web]
Torch7: Deep learning library in Lua, used by Facebook and Google Deepmind [Web]
- Torch-based deep learning libraries: [torchnet]
Caffe: Deep learning framework by the BVLC [Web]
Caffe2: Deep learning framework by Facebook [Web]
MXNet: A flexible and efficient deep learning library for heterogeneous distributed systems with multi-language support [Web]
Keras: The Python Deep Learning library [Web]
CNTK: The Microsoft Cognitive Toolkit [Web]
Chainer: Python-based deep learning framework for neural networks, built around the define-by-run strategy [Web]
Theano: Mathematical library in Python, maintained by LISA lab [Web]
- Theano-based deep learning libraries: [Pylearn2], [Blocks], [Keras], [Lasagne]
MatConvNet: CNNs for MATLAB [Web]

Conference 会议

[CVPR 2018 Main Conference][Web]
[CVPR 2018 Tutorial][Web]
[CVPR 2018 Workshop][Web]
[ICML IJCAI 2018][Web]
