Development document for traffic light detection based on Faster-RCNN

0 Preface

This is a small traffic light detection exercise I completed some time ago, now written up in the form of a software engineering document. It describes only the overall process rather than every detail; friends with questions are welcome to get in touch, and I would also appreciate pointers from the experts.

1 Requirements Specifications

In unmanned driving, recognizing the traffic lights at an intersection is critical: the autonomous vehicle takes different actions depending on the recognition result. For example, if a red light is detected, the vehicle waits at the intersection until the light turns green and then continues driving; if a green light is detected, it drives straight through the intersection. The ability to accurately identify the state of traffic lights therefore determines the safety of a driverless car.
In view of these requirements, a traffic light recognition solution needs to be developed: collect images with a camera, design an algorithm that processes the images and identifies their category, and finally send the classification result to the decision-making module, which in turn controls the behavior of the driverless car.

2 System Design

2.1 Development environment

Hardware Development Platform: NVIDIA Jetson TX2
Industrial Camera: AVT GigE
Operating System: Ubuntu 16.04
Development Platform: ROS
Programming Language: Python

2.2 Overall Design

The traffic light recognition system mainly consists of three parts: camera data collection, traffic light recognition, and sending of the recognition results. The overall framework is shown in the figure below.

2.3 Data collection

First, install the Ubuntu 16.04 operating system on the NVIDIA Jetson TX2 hardware platform, then install ROS (Robot Operating System), and finally install the driver package for the AVT GigE camera, after which the images captured by the camera can be read.
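Once the driver is running, frames can be read from ROS in Python. Below is a minimal sketch of such a subscriber; the topic name /camera/image_raw is an assumption and depends on how the driver is configured.

```python
# Minimal ROS image subscriber; the topic name /camera/image_raw is an
# assumption and depends on how the AVT GigE driver is configured.
import rospy
from sensor_msgs.msg import Image
from cv_bridge import CvBridge

bridge = CvBridge()

def image_callback(msg):
    # Convert the ROS image message into an OpenCV-compatible BGR array
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
    rospy.loginfo('received frame %dx%d', frame.shape[1], frame.shape[0])

rospy.init_node('camera_reader')
rospy.Subscriber('/camera/image_raw', Image, image_callback)
rospy.spin()
```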

2.4 Traffic light detection

Deep learning has achieved great success in the field of computer vision, and among detection frameworks Faster-RCNN offers high accuracy, so this design uses the Faster-RCNN framework to recognize images.
The Faster-RCNN framework is shown in the figure below. The main steps are: image input, candidate region generation, feature extraction, classification, and location refinement.


The traffic light detection work consists of four parts: building the dataset, labeling, training the model, and testing. The flow chart is shown in the figure below.

First, a dataset must be built. Its sources are a large number of images collected at the intersection at different times and under different weather conditions, together with offline data provided by the competition. The figure below shows one of the images collected at the intersection.

Then label the dataset with an annotation tool; the labels fall into three categories: background, red light, and green light.
Configure the deep learning framework Caffe on the NVIDIA Jetson TX2, then set up the Faster-RCNN framework, modify the parameters in the relevant configuration files, and train the model on the self-built dataset.
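As a rough illustration of the kind of edit involved: the py-faster-rcnn reference implementation keeps its class list in lib/datasets/pascal_voc.py, and the prototxt output layers must be resized to match. The sketch below assumes that implementation; other forks differ.

```python
# Hypothetical class list for this task, following the py-faster-rcnn
# convention that index 0 is always the background class.
CLASSES = ('__background__', 'red', 'green')

num_classes = len(CLASSES)          # 3 -> num_output of the cls_score layer
num_bbox_outputs = 4 * num_classes  # 12 -> num_output of the bbox_pred layer
print(num_classes, num_bbox_outputs)
```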
After training is complete, the image shown above is fed in for testing. The test result is shown in the figure below: the program detects the green light along with its position and probability.
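A minimal inference sketch in the style of py-faster-rcnn's demo script is shown below; the prototxt/caffemodel paths and the thresholds are placeholders, not values from the original project.

```python
# Sketch of Faster-RCNN inference with Caffe, modeled on py-faster-rcnn's
# demo; file paths and thresholds are illustrative assumptions.
import caffe
import cv2
import numpy as np
from fast_rcnn.test import im_detect
from fast_rcnn.nms_wrapper import nms

CLASSES = ('__background__', 'red', 'green')
CONF_THRESH, NMS_THRESH = 0.8, 0.3

caffe.set_mode_gpu()
net = caffe.Net('test.prototxt', 'traffic_light.caffemodel', caffe.TEST)

im = cv2.imread('intersection.jpg')
scores, boxes = im_detect(net, im)  # scores: (N, K), boxes: (N, 4K)

for cls_ind, cls in enumerate(CLASSES[1:], start=1):
    cls_boxes = boxes[:, 4 * cls_ind:4 * (cls_ind + 1)]
    cls_scores = scores[:, cls_ind]
    dets = np.hstack((cls_boxes, cls_scores[:, np.newaxis])).astype(np.float32)
    keep = nms(dets, NMS_THRESH)  # suppress overlapping boxes
    for x1, y1, x2, y2, score in dets[keep]:
        if score >= CONF_THRESH:
            print('%s: %.2f at (%d, %d, %d, %d)' % (cls, score, x1, y1, x2, y2))
```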

2.5 Sending the recognition result

In this design, communication between the modules is handled by ROS. The traffic light recognition module publishes its detection result on a topic, and the decision-making module subscribes to that topic, so the decision-making module obtains the detection results of the traffic light recognition module. The communication between the modules is shown in the figure below.


Here, talker is the node of the traffic light recognition module, listener is the node of the decision-making module, light_state is the topic published by talker, and listener subscribes to light_state.
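A minimal sketch of this pair of nodes follows. The topic name light_state comes from the text; the message type (std_msgs/String carrying e.g. 'red' or 'green') is an assumption.

```python
# talker.py -- traffic light recognition node (publisher); in practice the
# published string would come from the detector rather than a constant.
import rospy
from std_msgs.msg import String

rospy.init_node('talker')
pub = rospy.Publisher('light_state', String, queue_size=10)
rate = rospy.Rate(10)
while not rospy.is_shutdown():
    pub.publish(String(data='green'))
    rate.sleep()
```

```python
# listener.py -- decision-making node (subscriber)
import rospy
from std_msgs.msg import String

def on_light_state(msg):
    rospy.loginfo('light state: %s', msg.data)

rospy.init_node('listener')
rospy.Subscriber('light_state', String, on_light_state)
rospy.spin()
```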

3 Programming Implementation

Since the images collected by the camera must be processed in real time, ROS or OpenCV is used to grab each frame of the input; the frame is then fed to the traffic light recognition program, which outputs the recognition result. Finally, the result is displayed in a window and saved both as video and as individual images, which is convenient for debugging; OpenCV is used here for the file operations.
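The debugging output described above might look like the following sketch; the file names, codec, and 640x480 frame size are assumptions, and cv2.VideoWriter_fourcc is the OpenCV 3 spelling.

```python
# Sketch of saving annotated frames as both images and video for debugging;
# names, codec, and frame size are illustrative assumptions.
import cv2

writer = cv2.VideoWriter('debug.avi', cv2.VideoWriter_fourcc(*'XVID'),
                         20.0, (640, 480))  # frames must match this size

def save_debug(frame, label, score, box, frame_idx):
    x1, y1, x2, y2 = box
    # Draw the detection and its confidence on the frame
    cv2.rectangle(frame, (x1, y1), (x2, y2), (0, 255, 0), 2)
    cv2.putText(frame, '%s %.2f' % (label, score), (x1, y1 - 5),
                cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
    cv2.imwrite('frame_%06d.jpg' % frame_idx, frame)  # per-frame image
    writer.write(frame)                               # append to the video
```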
The entire traffic light recognition module is encapsulated as a single ROS node, and the detection result is published on a topic for other nodes to subscribe to.
The program flow is shown in the figure below.
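Putting the pieces together, the encapsulated node might be structured as follows; detect() stands in for the Faster-RCNN inference sketched earlier, and the camera topic name is again an assumption.

```python
# Skeleton of the recognition node: subscribe to camera frames, run the
# detector, and publish the result on light_state. detect() is a placeholder.
import rospy
from sensor_msgs.msg import Image
from std_msgs.msg import String
from cv_bridge import CvBridge

bridge = CvBridge()

def detect(frame):
    # Placeholder for Faster-RCNN inference; returns 'red', 'green',
    # or 'background' for the given frame.
    return 'green'

def on_frame(msg, pub):
    frame = bridge.imgmsg_to_cv2(msg, desired_encoding='bgr8')
    pub.publish(String(data=detect(frame)))

rospy.init_node('talker')
pub = rospy.Publisher('light_state', String, queue_size=10)
rospy.Subscriber('/camera/image_raw', Image, on_frame, callback_args=pub)
rospy.spin()
```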

4 Integration

Integration includes the integration of internal modules and integration with other external modules.
The integration of internal modules covers acquiring images in real time, feeding them to the detection module for processing, saving the results for debugging, and finally sending the detection results.
Integration with the external module, namely the decision module, is done through ROS: the traffic light recognition module and the decision module are two separate nodes; the recognition node publishes a topic carrying the detection results, and the decision node subscribes to this topic to obtain them.

5 Testing

The test is divided into offline test and online test.
The offline test covers building the environment, training the model, implementing the program, and checking the results; it is mainly the preparation for the online test.
The online test requires real-vehicle testing in real scenarios to verify the effectiveness of the solution.
By the time this post was written, the hardware had already been handed in and no screenshots were kept at the start, so I can only post screenshots from runs on my laptop.
(A reminder to myself to record experiments promptly.)


I wanted to embed a video to show the effect intuitively, but after trying many methods I could not embed one in Markdown, so a link is attached instead; advice from anyone who knows how to embed videos in Markdown is very welcome.
http://player.youku.com/player.php/sid/XMzU0MjQ4MzA0NA==/v.swf

6 Maintenance

This solution mainly targets the recognition of traffic lights at intersections in specific locations; if the scene changes, the dataset must be re-collected and the model retrained.
The current version depends heavily on the GPU, and its real-time performance is limited. Later, the model will be optimized, and a better-performing framework will be used or developed for training.

Appendix 1 Program source code

For certain reasons, the source code is not open for the time being. Friends with particular needs are welcome to leave a comment and discuss.

Appendix 2 Hardware Equipment Diagram


Since I no longer have the AVT GigE camera on hand, a miniature camera stands in for the photo.
