ADS Software and Hardware Analysis and the Application of DRL in TORCS


 

Deep learning has achieved outstanding results in areas such as image recognition and semantic analysis, and has driven progress in many other fields. By using neural networks as function approximators, reinforcement learning has grown into a new field: deep reinforcement learning (DRL). Building on the introductions to RL and DRL in the previous two articles, this article focuses on application scenarios of deep reinforcement learning in autonomous driving simulation systems.

1. Automotive platform and principle analysis

The control platform is the core component of an unmanned vehicle. It manages the vehicle's various control systems, including the anti-lock braking system (ABS), acceleration slip regulation (ASR), electronic stability program (ESP), Sensotronic Brake Control (SBC), electronic brake-force distribution (EBD), brake assist system (BAS), airbags (SRS), automotive anti-collision radar, electronically controlled automatic transmission (EAT), continuously variable transmission (CVT), cruise control system (CCS), electronically controlled suspension (ECS), electric power steering (EPS) and more. The control platform mainly consists of two parts: the electronic control unit (ECU) and the communication bus. (Note: for more car-related terms, it is recommended to take a look at Autohome:

https://car.autohome.com.cn/shuyu/detail_8_10_244.html)

 

(1) ECU (Electronic Control Unit): the electronic control unit, also known as the "trip computer" or "vehicle computer". In terms of use, it is a special-purpose microcomputer controller for automobiles. Like an ordinary computer, it is composed of a large-scale integrated circuit containing a microprocessor (CPU), memory (ROM, RAM), input/output interfaces (I/O), an analog-to-digital converter (A/D), and shaping and driver circuits. In one sentence: "the ECU is the brain of the car", and it is mainly where the control algorithms run.

(2) Communication bus (CAN: Controller Area Network): a standard bus developed by BOSCH in Germany for computer control systems and embedded industrial control networks. In the automotive field it mainly implements the communication between ECUs and mechanical components.

 

Well, the two cores of the car control platform, the ECU and the communication bus, were covered above. But how do we control the car from a program? Do we need a robot to step on the accelerator and hold the steering wheel? (What would be the difference between that and a human driver?) Obviously, such a solution is theoretically possible, but it is not the one ADS pursues. Instead, the industry usually uses drive-by-wire (commonly known as "wire control") electronic technology, in which the application program directly controls the car's throttle system, steering system and braking system.

 

(1) Throttle-by-wire (wire-controlled throttle):

The main function of the electronic throttle is to convert the angle of the accelerator pedal into a proportional voltage signal. At the same time, special positions of the pedal are wired as contact switches, so that engine working conditions such as idling, high load, and acceleration/deceleration become electrical pulse signals sent to the engine's ECU, which then optimally controls fuel supply, fuel injection and gear shifting, as shown in the figure. So we only need to manipulate the specific voltage signal from the program, which can be summarized as:

ΔV_t = α·ΔV

Note: V_t corresponds to the engine speed, not directly to the vehicle speed; many complex conversions happen in between.
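To make the proportional relation above concrete, here is a minimal Python sketch. The pedal travel, the voltage range and the constant α are invented placeholders, since the real conversion chain lives inside the ECU.

# A minimal sketch of the throttle-by-wire idea: pedal angle -> voltage signal.
# All names and constants here are hypothetical illustrations, not a real ECU API.

PEDAL_MAX_DEG = 30.0     # assumed full pedal travel
V_MIN, V_MAX = 0.5, 4.5  # assumed sensor output range in volts

def pedal_angle_to_voltage(angle_deg):
    """Map the pedal angle linearly onto the sensor voltage range."""
    angle = min(max(angle_deg, 0.0), PEDAL_MAX_DEG)
    return V_MIN + (V_MAX - V_MIN) * angle / PEDAL_MAX_DEG

def delta_engine_target(delta_voltage, alpha=100.0):
    """The proportional relation from the text: dV_t = alpha * dV.

    alpha stands in for the complex voltage -> engine-speed conversions
    that the real ECU performs; here it is just one constant.
    """
    return alpha * delta_voltage

# Example: pressing the pedal 5 degrees further
dv = pedal_angle_to_voltage(15.0) - pedal_angle_to_voltage(10.0)
print(delta_engine_target(dv))  # change in the engine-speed target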

 

(2) Steering-by-wire system

The figure below shows the structure of Infiniti's steer-by-wire system. Steer-by-wire removes the mechanical connection between the steering wheel and the wheels: a sensor measures the steering-wheel angle, the ECU (electronic control unit) converts it into a specific driving-force command, and an actuator pushes the steering gear to turn the wheels. This can be expressed as:

ΔF_t = α·Δθ_t

Note: the relationship between the driving force and the steering angle likewise involves complex intermediate transformations.

 

 

(3) Brake-by-wire system

The traditional braking system involves many related terms, including ABS, ESP, etc. By the same reasoning, there is a functional relationship between the brake-pedal angle and the braking force, realized through the cooperation of many sensor devices such as the booster pump. This article only gives a top-level overview.

For more on how to derive the transformation relations between them, see [3].
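Putting the three by-wire channels together, the application-side interface could look like the following sketch. The class name and the value ranges are assumptions for illustration; a real vehicle exposes these channels through the CAN bus and its ECUs. Conveniently, this is also the same three-channel action space (steer/throttle/brake) that the TORCS agent outputs later in this article.

# A hedged sketch of what a drive-by-wire command interface could look like
# from the application side. All names and ranges are illustrative assumptions.

from dataclasses import dataclass

@dataclass
class DriveByWireCommand:
    steer: float     # -1.0 (full left) .. 1.0 (full right)
    throttle: float  #  0.0 .. 1.0
    brake: float     #  0.0 .. 1.0

    def clamp(self) -> "DriveByWireCommand":
        """Keep every channel inside its legal range before it is sent."""
        return DriveByWireCommand(
            steer=max(-1.0, min(1.0, self.steer)),
            throttle=max(0.0, min(1.0, self.throttle)),
            brake=max(0.0, min(1.0, self.brake)),
        )

cmd = DriveByWireCommand(steer=0.1, throttle=0.3, brake=0.0).clamp()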

 

So far, this article has briefly described the three main control elements of a car. The point of all this is to cover some automotive common sense; the following begins the introduction of the equipment used in autonomous driving.

2. Sensor composition and analysis

The scene of a person driving a car: concentrating, eyes fixed on the road ahead, hands on the steering wheel, a foot ready on the brake...

 

In fact, the principles of ADS and human driving are basically similar. So first of all you need eyes: they observe the scene ahead, and two eyes together perceive distance. (In addition, ADS needs LiDAR and a GPS module; these two are mainly used for high-precision positioning and for distance and object detection.)

Well, without further ado, it's time to see what a self-driving car really looks like! [Someone else's car, snickering...]

 

 

 

(1) Camera

It is mainly used to observe the environment and the lane lines in front of the car, and it supports more complex vision tasks such as semantic segmentation and object detection.

(2) GPS module

ADS still depends heavily on high-precision three-dimensional maps. Their acquisition is basically similar to the way ordinary maps are collected, but the result is a full 3D spatial map produced by three-dimensional modeling.

(3) LiDAR

 

The main function of LiDAR is to measure the distance to real objects with millimeter-level precision and to register those measurements against the three-dimensional map.

After all, roadside infrastructure is not something you can afford to hit. As for 32-beam versus 64-beam LiDAR, you can google the details yourself; the difference mainly lies in resolution and accuracy.

So what does the LiDAR view actually look like? It only shows three-dimensional box entities: it does not know what an object is, but it does know how far away it is. (Suggestion: adding human-body modeling to recognize pedestrians would raise the safety level; after all, if a collision is unavoidable, hitting a telephone pole beats hitting a pedestrian.)

 

Next, we need to connect the car and the sensors; a picture is worth a thousand words:

Now you need a controller. The common choice is an Industrial PC (IPC), though it is somewhat expensive. Ideally you also gain access to the car's own control system, which is not easy! (Of course, Apollo can be used.)

So there is still a long way to go...

3. Car and sensor combination analysis

The ADS system is a very complex engineering system that mainly comprises three layers: perception, planning and control.

Perception acquires information about the current environment through sensors (cameras, LiDAR, GPS/IMU, radar, sonar, etc.). Planning (decision-making) predicts the next behavior and produces the control plan (including route planning, behavior planning, etc.). Control executes a series of actions on the car, such as how much to open the throttle, how hard to brake, or how far to turn. All of this serves the goal of driving safely on the road; this article focuses on the decision part.
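As a rough illustration of this three-layer loop, here is a schematic Python sketch; the function names, field names and thresholds are invented placeholders, not a real ADS stack.

# Schematic (not production) sketch of the perception -> planning -> control loop.

def perceive(raw_sensors):
    """Fuse camera / LiDAR / GPS readings into a simple world state."""
    return {"lane_offset": raw_sensors["lane_offset"],
            "obstacle_dist": raw_sensors["obstacle_dist"]}

def plan(world_state):
    """Decide the next maneuver from the perceived state."""
    if world_state["obstacle_dist"] < 10.0:
        return {"target_speed": 0.0}   # brake for the obstacle
    return {"target_speed": 15.0}      # cruise

def control(plan_out, current_speed):
    """Turn the plan into throttle/brake commands (a crude P-controller)."""
    error = plan_out["target_speed"] - current_speed
    return {"throttle": max(0.0, 0.1 * error), "brake": max(0.0, -0.1 * error)}

action = control(plan(perceive({"lane_offset": 0.2, "obstacle_dist": 50.0})), 12.0)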

 

Note: since an autonomous driving system is a very complex engineering system, control must have multiple levels of safety protection! Therefore, this article only predicts the steering angle, throttle opening and so on in a simulated test environment, models the process with reinforcement learning, and runs the simulation inside a game (which may disappoint some readers; after all, the cost of a real car is too high for individuals). Some further simulated scenarios, such as vehicles passing through intersections, will be shared separately in the future!

 

For the perception module, the camera is generally used for obstacle detection and recognition, traffic-light recognition, etc., while radar/LiDAR is mainly used for distance detection against the 3D map. This paper focuses on using reinforcement learning to control the TORCS game, learning a good relationship among several continuous variables such as speed, steering and braking.

 

TORCS game introduction:

The Open Racing Car Simulator (TORCS) is an open-source 3D racing simulation game and a popular racing game on the Linux operating system. It offers 50 vehicles and 20 tracks with simple visuals, is written in C and C++, and is released under the GPL license. The TORCS setup used in this article has about 18 input dimensions, some of which are shown below.
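A minimal interaction with the gym_torcs environment (cloned in section 4.1 below) could look like the following sketch. The exact observation fields depend on the gym_torcs version; the ones used here (angle, track, trackPos, speedX/Y/Z, rpm, wheelSpinVel) are the low-dimensional sensors commonly used with this environment.

# Minimal interaction with gym_torcs; field names depend on the repo version.

import numpy as np
from gym_torcs import TorcsEnv

env = TorcsEnv(vision=False, throttle=True)  # low-dim sensors, throttle control
ob = env.reset(relaunch=True)                # relaunch TORCS to avoid memory leaks

# Stack the sensor readings into one state vector for the agent
state = np.hstack((ob.angle, ob.track, ob.trackPos,
                   ob.speedX, ob.speedY, ob.speedZ,
                   ob.rpm, ob.wheelSpinVel / 100.0))

action = np.array([0.0, 0.5, 0.0])           # [steer, throttle, brake]
ob, reward, done, _ = env.step(action)
env.end()                                    # shut TORCS down cleanly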

Problem modeling:

The scene ahead is captured by the camera and processed by CNN convolutions; a reinforcement learning method then produces a set of policies that control the speed. The overall architecture is shown in the figure:

 

4. Reinforcement Learning Algorithms

Preliminary: this paper uses the TORCS game on the gym platform for simulation and experiments with the deep deterministic policy gradient algorithm. Why choose a deterministic policy gradient algorithm?

Reason: the in-game controls such as the throttle and the brake are continuous variables, and algorithms like the original DQN do not perform particularly well on continuous action spaces. In 2014, David Silver proved the existence of the deterministic policy gradient; DDPG implements it with deep neural networks and performs excellently on continuous and large state-space problems.

The DDPG used in this paper learns within the actor-critic (AC) framework and is off-policy: the actor explores the environment with a stochastic behavior policy (the deterministic actor plus exploration noise), while the critic network performs function approximation for the deterministic target policy.
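As a preview, the actor and critic could be sketched as below with tf.keras. The 300/600 layer sizes and the per-channel activations (tanh for steering, sigmoid for throttle and brake) follow the design described in [7]; the rest is an illustrative assumption, not the exact code of the repository used later.

# Minimal actor/critic pair for DDPG over the TORCS action space.

import tensorflow as tf
from tensorflow.keras import layers

STATE_DIM, ACTION_DIM = 29, 3  # low-dim sensors / [steer, throttle, brake]

def build_actor():
    s = layers.Input(shape=(STATE_DIM,))
    h = layers.Dense(300, activation="relu")(s)
    h = layers.Dense(600, activation="relu")(h)
    steer = layers.Dense(1, activation="tanh")(h)     # [-1, 1]
    accel = layers.Dense(1, activation="sigmoid")(h)  # [0, 1]
    brake = layers.Dense(1, activation="sigmoid")(h)  # [0, 1]
    a = layers.Concatenate()([steer, accel, brake])
    return tf.keras.Model(s, a)

def build_critic():
    s = layers.Input(shape=(STATE_DIM,))
    a = layers.Input(shape=(ACTION_DIM,))
    h = layers.Concatenate()([layers.Dense(300, activation="relu")(s), a])
    h = layers.Dense(600, activation="relu")(h)
    q = layers.Dense(1)(h)                            # Q(s, a)
    return tf.keras.Model([s, a], q)

actor, critic = build_actor(), build_critic()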

 

4.1 Environment Installation

Machine environment: Ubuntu 16.04, TensorFlow 1.3, Python 2.7

sudo apt-get install xautomation
sudo pip install numpy
sudo pip install gym
git clone https://github.com/ugo-nama-kun/gym_torcs.git
cd gym_torcs/vtorcs-RL-color/src/modules/simu/simuv2/
sudo vim simu.cpp
# comment out line 64 and add the following line in its place:
#   if (isnan((float)(car->ctrl->gear)) || isinf(((float)(car->ctrl->gear)))) car->ctrl->gear = 0;
cd ../../../..   # back to the gym_torcs/vtorcs-RL-color directory
# install dependency packages
sudo apt-get install libglib2.0-dev libgl1-mesa-dev libglu1-mesa-dev freeglut3-dev libplib-dev libopenal-dev libalut-dev libxi-dev libxmu-dev libxrender-dev libxrandr-dev libpng12-dev
./configure
# build and install the environment
make
sudo make install
sudo make datainstall
# installation finished; start the game with the command:
torcs

 

4.2 Algorithm Explanation: Deep Deterministic Policy Gradient (DDPG) Algorithm

The DDPG algorithm borrows two very important ideas from the DQN algorithm:

(1) Experience replay

(2) Target networks

Both ideas are used to break the correlations in the data that reinforcement learning collects sequentially. The pseudocode is as follows:
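Both ideas fit in a few lines. Below is an illustrative sketch: a replay buffer that samples transitions uniformly at random, and the soft target-network update θ′ ← τθ + (1−τ)θ′ with the small τ typical for DDPG.

# Sketches of the two DQN ideas reused by DDPG, written for illustration.

import random
from collections import deque

class ReplayBuffer:
    def __init__(self, capacity=100000):
        self.buffer = deque(maxlen=capacity)

    def add(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Uniform random sampling decorrelates consecutive transitions
        return random.sample(self.buffer, batch_size)

def soft_update(target_weights, online_weights, tau=0.001):
    """Slowly track the online network: theta' <- tau*theta + (1-tau)*theta'."""
    return [tau * w + (1.0 - tau) * tw
            for w, tw in zip(online_weights, target_weights)]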

4.2.1 Deterministic Policies

The concepts of deterministic and stochastic policies were introduced in the first article of this series; to recap:

(1) Deterministic policy: a_t = μ(s_t) — the policy maps each state directly to a single action.

(2) Stochastic policy: π(a|s) = P[a_t = a | s_t = s] — the policy defines a probability distribution over actions, from which the action is sampled.

In short, a deterministic policy always outputs the same action in the same state, which suits smooth continuous control such as steering and throttle.

 

4.2.2 The Actor-Critic (AC) Framework

The so-called AC framework is the Actor-Critic framework. It works like an actor dancing on a stage while a critic in the audience judges whether each movement is good or bad; based on the critic's feedback, the actor gradually improves its movements.

In the algorithm, the two roles use different policies: the Actor explores with a stochastic behavior policy, while the Critic evaluates the deterministic target policy. The overall structure is shown in the figure:

Gradient of the Actor's policy (the deterministic policy gradient, estimated over a minibatch of N samples):

∇_θ^μ J ≈ (1/N) · Σ_i ∇_a Q(s, a | θ^Q)|_{s=s_i, a=μ(s_i)} · ∇_θ^μ μ(s | θ^μ)|_{s=s_i}

How the Critic is updated (minimize the TD error against the target networks μ′ and Q′):

L(θ^Q) = (1/N) · Σ_i ( y_i − Q(s_i, a_i | θ^Q) )²,  with  y_i = r_i + γ · Q′(s_{i+1}, μ′(s_{i+1} | θ^{μ′}) | θ^{Q′})

The learning structure is as follows:

 

The Critic evaluates the actions made by the Actor according to the environment state s!
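The two updates above can be condensed into one training step. The sketch below uses tf.keras and assumes the actor/critic networks and replay buffer sketched earlier, plus two optimizers a_opt and c_opt; it illustrates the update rules and is not the repository's exact code.

# One DDPG update step, matching the two formulas above (illustrative sketch).

import numpy as np
import tensorflow as tf

GAMMA = 0.99

def train_step(batch, actor, critic, target_actor, target_critic, a_opt, c_opt):
    s, a, r, s2, done = [np.asarray(x, dtype=np.float32) for x in zip(*batch)]

    # Critic: regress Q(s,a) toward the bootstrapped target y
    y = r + GAMMA * (1.0 - done) * tf.squeeze(
        target_critic([s2, target_actor(s2)]), axis=1)
    with tf.GradientTape() as tape:
        q = tf.squeeze(critic([s, a]), axis=1)
        c_loss = tf.reduce_mean(tf.square(y - q))
    c_opt.apply_gradients(zip(tape.gradient(c_loss, critic.trainable_variables),
                              critic.trainable_variables))

    # Actor: ascend the critic's estimate of Q(s, mu(s))
    with tf.GradientTape() as tape:
        a_loss = -tf.reduce_mean(critic([s, actor(s)]))
    a_opt.apply_gradients(zip(tape.gradient(a_loss, actor.trainable_variables),
                              actor.trainable_variables))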

4.2.3 Reward function settings:

The reward used here follows [7]: encourage speed along the track axis while penalizing lateral speed and deviation from the track center:

R_t = v_x·cos(θ) − |v_x·sin(θ)| − v_x·|trackPos|

where v_x is the car's longitudinal speed, θ is the angle between the car's heading and the track axis, and trackPos is the normalized distance from the track center.
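In code form, using a gym_torcs observation ob as in the earlier sketch, the reward from [7] looks like this:

# The reward above: speedX is the longitudinal speed, angle the heading
# error to the track axis, trackPos the normalized offset from the center.

import numpy as np

def compute_reward(ob):
    return (ob.speedX * np.cos(ob.angle)               # progress along the track
            - np.abs(ob.speedX * np.sin(ob.angle))     # lateral speed penalty
            - ob.speedX * np.abs(ob.trackPos))         # off-center penalty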

4.3 Code running and analysis

4.3.1 Code running

Program running environment: TensorFlow 0.10, Keras 1.1, Python 2.7

pip install keras
pip install tensorflow
git clone https://github.com/yanpanlau/DDPG-Keras-Torcs.git
cd DDPG-Keras-Torcs
cp *.* ../gym_torcs
cd ../gym_torcs
python ddpg.py

 

The effect after training is shown below:

4.3.2 Code Analysis
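The overall shape of ddpg.py, condensed for analysis (not a line-for-line copy): an episode loop that explores with Ornstein-Uhlenbeck noise around the deterministic actor's output and trains from the replay buffer. make_state, env, buffer, the networks, optimizers and train_step refer to the earlier sketches and are assumptions about the repository's structure.

# Condensed sketch of the training loop in ddpg.py (illustrative).

import numpy as np

def ou_noise(x, mu, theta, sigma):
    """Ornstein-Uhlenbeck exploration noise for continuous actions."""
    return theta * (mu - x) + sigma * np.random.randn(*np.shape(x))

for episode in range(2000):                        # training episodes
    ob = env.reset(relaunch=(episode % 3 == 0))    # periodic relaunch (memory leak)
    state = make_state(ob)                         # stack sensors as shown earlier
    done = False
    while not done:
        action = actor.predict(state[None])[0]
        action += ou_noise(action, mu=0.0, theta=0.15, sigma=0.2)  # explore
        ob, reward, done, _ = env.step(action)
        next_state = make_state(ob)
        buffer.add(state, action, reward, next_state, done)
        if len(buffer.buffer) > 32:
            train_step(buffer.sample(32), actor, critic,
                       target_actor, target_critic, a_opt, c_opt)
        state = next_state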

 

References:

[1]. http://geek.csdn.net/news/detail/199280

[2]. http://www.fzb.me/2017-9-25-driving-by-wire-tech.html

[3]. Pedal Simulator and Braking Feel Evaluation in Brake-by-Wire System

[4]. https://blog.csdn.net/gsww404/article/details/79630234

[5]. M. A. Farhan Khan, Car Racing using Reinforcement Learning

[6]. https://lopespm.github.io/machine_learning/2016/10/06/deep-reinforcement-learning-racing-game.html

[7]. https://yanpanlau.github.io/2016/10/11/Torcs-Keras.html

[8]. https://arxiv.org/pdf/1304.1672.pdf

[9]. David Silver et al., Deterministic Policy Gradient Algorithms

 
