Autonomous Driving Simulation Popular Science, Article 4: Virtual Mileage, Time Acceleration and Large-Scale Concurrency

This article is the fourth in a popular-science series on autonomous driving simulation. The first three articles in the series are:

"Autonomous Driving Simulation Science Article 1: Scene Sources, Scene Generalization and Extraction"

"Autonomous Driving Simulation Science Article 2: Where Is the Difficulty in Sensor Simulation?"

"Autonomous Driving Simulation Popular Science Article 3: What Is the Difference Between Simulation Tests for Lower-Level and Higher-Level Autonomous Driving?"

Author | Su Qingtao

1. How should "thousands of kilometers per day" in simulation be understood?

As with real road tests, some simulation companies also emphasize "driving mileage", for example "hundreds of thousands of kilometers per day". What does this number really mean, and how does it compare with mileage driven on real roads?

Virtual mileage is the total mileage accumulated per unit time by all the simulation nodes running in parallel on a large-scale simulation platform. The mileage per unit time depends on how many nodes the platform's computing power can support and on the super-real-time factor each node achieves at a given scenario complexity.
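The definition can be written as a formula. The symbols are introduced here only for illustration and are not the article's own notation: N is the number of parallel simulation nodes, v_i the average virtual speed of node i, k_i its super-real-time factor (k_i = 1 for real-time simulation), and T the wall-clock time considered (24 h for one day).

```latex
M = \sum_{i=1}^{N} \bar{v}_i \, k_i \, T
\qquad \text{and, if all nodes are identical,} \qquad
M = N \, \bar{v} \, k \, T
```

With N = 1, v = 120 km/h, k = 1 and T = 24 h this gives 2,880 km, which matches the "nearly 3,000 kilometers per day" per-instance example given later in this section.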

Put simply, each simulation node is one vehicle, so the node count is the number of "test vehicles" the platform can run in parallel at the same time.

An Hongwei, CEO of Zhixing Zhongwei, explained: put simply, if a simulation platform has the computing power of 100 GPU servers and each server hosts 8 simulation instances, the platform can run 800 simulations in parallel. The simulation mileage then depends on the daily mileage of each instance.

How many instances a GPU server can run depends on the GPU's performance and on whether the simulation solver can run several simulations in parallel on one server.

An Hongwei said: "The simulation nodes of our cloud simulation platform support a variety of deployment methods, so they can flexibly fit the different cloud resources our customers have and can be deployed at large scale. The cloud simulation platform we are building in Xiangcheng, Suzhou, has already deployed more than 400 nodes."

Given the daily mileage of each instance, the platform's total daily simulation mileage can be roughly estimated. If one instance (one virtual car) averages 120 kilometers per hour and runs 24 hours a day, it covers nearly 3,000 kilometers per day. If a server runs 33 such instances, it accumulates almost 100,000 kilometers per day.
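The arithmetic in this section can be reproduced with a small helper. This is only a sketch: the function name is hypothetical, and it assumes every instance runs at the same average speed around the clock in real time (speedup 1.0), as in the example above.

```python
def daily_virtual_mileage(instances: int, avg_speed_kmh: float,
                          hours_per_day: float = 24.0,
                          speedup: float = 1.0) -> float:
    """Total virtual kilometers per day for identical parallel instances.

    speedup is the super-real-time factor (1.0 means real time).
    """
    return instances * avg_speed_kmh * hours_per_day * speedup


# Parallel capacity in the 100-server example: 100 servers x 8 instances each.
parallel_instances = 100 * 8

per_instance = daily_virtual_mileage(1, 120)   # "nearly 3,000 km"
per_33 = daily_virtual_mileage(33, 120)        # "almost 100,000 km"
print(parallel_instances, per_instance, per_33)  # 800 2880.0 95040.0
```

Passing a speedup above 1.0 shows how time acceleration would scale the same figures, although, as the next section explains, large speedups are hard to achieve in practice.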

However, An Hongwei notes that the "millions of kilometers per day" of simulation the industry usually talks about is not a very rigorous figure: "It has to be backed by a sound simulation test plan and a large number of scenarios, and the coverage and effectiveness of those scenarios must keep expanding. In the end, what fundamentally matters is how many effective scenarios are actually run."

2. Super real-time simulation

During the interviews, the author repeatedly asked one question: do the cars running on a simulation platform live in the same time dimension as cars in the real world? Put another way, is 1 hour on the simulation platform equal to 1 hour in the real world? Could there be a situation of "one year on earth, ten years in heaven"?

The answer: the two can be equal (real-time simulation) or not equal (super-real-time simulation). Super-real-time simulation in this sense covers two cases, "time acceleration" and "time deceleration". With time acceleration, time on the simulation platform runs faster than in the real world; with time deceleration, it runs slower.

Running faster than real time is about efficiency, but why would a simulation ever run slower than real time?

An Hongwei explains: "For example, some simulation tests require very high image-rendering accuracy. In pursuit of that accuracy, a single frame may not be rendered in real time. A simulation that runs slower than real time like this is not doing real-time closed-loop testing; it is doing offline testing."

Specifically, in real-time simulation each image is sent directly to the algorithm for recognition as soon as it is generated, and this may be completed within 100 milliseconds. In offline simulation, each image is first saved after it is generated and is later fed to the algorithm for processing under offline conditions.
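The three regimes described above can be told apart by a single ratio, called the real-time factor here (a term introduced for this sketch): simulated time advanced divided by the wall-clock time it took. The 100 ms step and the per-step timings below are hypothetical.

```python
def real_time_factor(sim_step_s: float, wall_s_per_step: float) -> float:
    """Simulated seconds advanced per wall-clock second."""
    return sim_step_s / wall_s_per_step


def classify(rtf: float, tol: float = 1e-9) -> str:
    """Map a real-time factor onto real-time, acceleration or deceleration."""
    if abs(rtf - 1.0) <= tol:
        return "real-time"
    return "time acceleration" if rtf > 1.0 else "time deceleration"


STEP_S = 0.1  # hypothetical 100 ms simulation step

print(classify(real_time_factor(STEP_S, 0.1)))   # real-time
print(classify(real_time_factor(STEP_S, 0.04)))  # time acceleration
print(classify(real_time_factor(STEP_S, 0.5)))   # time deceleration (slow, high-fidelity render)
```

In the deceleration case the renderer needs 500 ms of wall time for every 100 ms of simulated time, so the run cannot close the loop with the algorithm in real time and is treated as an offline test.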

According to An Hongwei, super-real-time simulation on a simulation platform has two prerequisites: the server's computing resources must be powerful enough, and the algorithm under test must be able to accept virtual time.

What does it mean for an algorithm to accept virtual time? An Hongwei explains that some algorithms, when combined with their hardware runtime platform, need to read the hardware's time service or a network time service and cannot read the virtual time provided by the simulation system.

A Tier 1 simulation expert said: accurate time alignment and synchronization can be achieved within the simulation system's engineering framework, PoseidonOS, and the algorithm can then be deployed on cluster servers, so that time in the simulation space is decoupled from time in the physical world. Once the two are decoupled, you can "accelerate at will".
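An algorithm "can accept virtual time" when it asks an injected clock for the current time instead of reading the host or network time service directly. The Python sketch below illustrates that idea; the class names are hypothetical and are not taken from PoseidonOS or any other framework mentioned above.

```python
import time
from typing import Protocol


class Clock(Protocol):
    def now_ms(self) -> int: ...


class HostClock:
    """Reads the host's own time service, as a hardware-bound algorithm would."""
    def now_ms(self) -> int:
        return time.time_ns() // 1_000_000


class VirtualClock:
    """Time owned by the simulator; it moves only when the simulator advances it."""
    def __init__(self, start_ms: int = 0) -> None:
        self._t = start_ms

    def now_ms(self) -> int:
        return self._t

    def advance(self, dt_ms: int) -> None:
        self._t += dt_ms


class PlannerUnderTest:
    """Hypothetical algorithm under test: it never calls time.time() itself."""
    def __init__(self, clock: Clock) -> None:
        self.clock = clock

    def step(self) -> int:
        # A real planner would compute a trajectory; here we only timestamp it.
        return self.clock.now_ms()


clock = VirtualClock()
planner = PlannerUnderTest(clock)
stamps = []
for _ in range(3):
    clock.advance(100)  # the simulator, not the wall clock, decides how fast time passes
    stamps.append(planner.step())
print(stamps)  # [100, 200, 300]
```

The loop finishes almost instantly even though 300 ms of simulated time has passed, which is the decoupling the expert describes. An algorithm that calls the hardware or network time service directly cannot be driven this way.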

So when accelerating time, can it be sped up by 2x or 3x? What does the acceleration factor depend on?

An Hongwei's answer: the server's computing resources, the complexity of the test scenario, the complexity of the algorithm, and the algorithm's runtime efficiency. In other words, in theory, for the same scenario complexity and the same algorithm, the more powerful the server's computing resources, the higher the acceleration factor that can be achieved.

What is the upper limit of the time acceleration factor? Answering this requires looking at how time acceleration works.

According to the head of simulation at an autonomous driving company, because the algorithms differ in complexity, modules such as the training module and the vehicle control module compute at different speeds. The most common way to achieve super-real-time simulation is to schedule the computation of all participating modules centrally. "Acceleration" then means that a faster module "cancels its waiting time": whether or not another module has finished computing, synchronization happens when the scheduled time arrives.

If the computation periods of the modules differ very little, there is little waiting time to cancel, so the acceleration factor will be very low. Conversely, if the computation periods differ enormously, for example one module takes 1 second and another takes 100 seconds, there is no way to "cancel the wait" at all.
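One simple way to reason about this limit is the sketch below. It assumes a lockstep scheduler in which virtual time advances only after every module has finished the current step; this is a modeling assumption, not a description of any specific product, and the module names and timings are hypothetical.

```python
def max_speedup(step_period_ms: float, module_compute_ms: dict[str, float]) -> float:
    """Upper bound on time acceleration for a lockstep scheduler.

    In real time, each step occupies the full step period of wall time.
    When accelerating, the scheduler moves on as soon as the slowest module
    finishes, so the waiting that can be "cancelled" is the slack between the
    slowest module's compute time and the step period.
    """
    slowest = max(module_compute_ms.values())
    return step_period_ms / slowest


# Hypothetical compute times for a 100 ms step.
modules = {"perception": 40.0, "planning": 20.0, "control": 5.0}
print(max_speedup(100.0, modules))  # 2.5: 60 ms of slack per step can be cancelled

busy = {"perception": 100.0, "control": 1.0}
print(max_speedup(100.0, busy))     # 1.0: no slack, so no acceleration
```

Under this model the slowest module sets the ceiling, which is consistent with the observation that factors of 2-3x are already considered high.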

As a result, the time acceleration factor is usually limited; 2-3x is already considered very high.

Many experts even said that, in practice, it is hard to genuinely accelerate time.

Yang Zijiang, founder of Shenxin Kechuang, said that once the autonomous driving algorithm has been compiled and deployed on a domain controller or industrial PC (as is the case in the HIL stage), it can only run in real time within the simulation system, and super-real-time simulation is no longer feasible.

An Hongwei also said: "Hardware-in-the-loop (HIL) simulation must by nature be real-time. There is no concept of 'super real time', and terms such as 'parallel simulation' or 'time acceleration' do not apply."

Bao Shiqiang said: "The premise of time acceleration is precise control of time and time synchronization. Perception is hard to accelerate because different sensors run at different frequencies: a camera may run at 30 Hz and a lidar at 10 Hz, for example. How do you ensure that the signals from different sensors are strictly synchronized?"
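Inside a simulator, one way to keep sensors with different rates aligned is to schedule every trigger on a single exact virtual timebase. The sketch below uses Python's `fractions.Fraction` to avoid floating-point drift; with the 30 Hz camera and 10 Hz lidar from the quote, every lidar frame coincides with every third camera frame. It only illustrates the timing arithmetic and does not solve the broader synchronization problem raised above; the function name is hypothetical.

```python
from fractions import Fraction


def frame_times(rate_hz: int, duration_s: int) -> list[Fraction]:
    """Exact trigger timestamps (in seconds) for one sensor in virtual time."""
    period = Fraction(1, rate_hz)
    return [i * period for i in range(rate_hz * duration_s)]


camera = frame_times(30, 1)  # 30 Hz camera
lidar = frame_times(10, 1)   # 10 Hz lidar

# Virtual instants at which both sensors produce a frame.
shared = sorted(set(camera) & set(lidar))
print(len(camera), len(lidar), len(shared))  # 30 10 10
```

Because the timestamps are exact rationals, 1/10 and 3/30 compare equal, so the coincident instants can be found reliably; with floating-point seconds the same comparison could miss.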

In addition, a simulation expert who has worked in Silicon Valley for many years believes that no company has truly achieved super-real-time simulation. In this expert's view, massively parallel simulation is a better way to improve simulation efficiency.

An Hongwei believes that time acceleration capability depends on the super-real-time level of each instance, the total number of instances, and the quality of the scenarios. "Actually, for cloud-based simulation, the super-real-time level of a single instance is not very important. The core is to focus on the quality of the simulation on that instance."

Simulation experts at Qingzhou Zhihang even believe the term "acceleration factor" is misleading, because there is no simple multiplicative relationship between simulated time and real-world time; the two are not directly related at all. In practice, what actually shortens computation time is using more technical means to reduce computing-power usage and improve the efficiency of timing scheduling.

In a real road test the vehicle drives continuously, and you cannot say "this is a corner case, so I'll drive it; that isn't one, so I'll skip it". On a simulation platform, engineers usually capture only the fragments related to corner cases (the "effective scenes"). Once one fragment has been processed, the clock jumps to the next time period, with no time wasted on irrelevant scenes.
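The "clock jump" can be sketched as replaying only the effective fragments of a drive and skipping everything else. The `Segment` type and the timings below are hypothetical; the point is that simulation time is spent only on the corner-case fragments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start_s: float   # position of the fragment in the original drive
    end_s: float
    effective: bool  # True if the fragment contains a corner case


def simulated_seconds(segments: list[Segment]) -> tuple[float, float]:
    """Return (seconds simulated, seconds skipped) when only effective
    fragments are replayed and the clock jumps over the rest."""
    run = skipped = 0.0
    for seg in segments:
        length = seg.end_s - seg.start_s
        if seg.effective:
            run += length
        else:
            skipped += length  # clock jumps straight to the next fragment
    return run, skipped


# Hypothetical 10-minute drive containing two 30-second corner cases.
drive = [
    Segment(0, 120, False),
    Segment(120, 150, True),
    Segment(150, 420, False),
    Segment(420, 450, True),
    Segment(450, 600, False),
]
print(simulated_seconds(drive))  # (60.0, 540.0)
```

Here 90% of the drive is never simulated, a saving that comes from scenario selection rather than from any time acceleration factor.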

Therefore, in simulation, efficiently screening out effective scenarios matters more than the time acceleration factor.

From this we can see that the benefit of time acceleration is not obvious, and increasing the virtual mileage on a simulation platform cannot rely mainly on it. The key is "multi-instance concurrency", which in practice means cloud-based simulation: increasing the number of servers and simulation instances.

3. Large-scale concurrent testing

Can the platform support high concurrency in the cloud, and at what scale? Where are the difficulties? Is simply stacking up servers enough?

That sounds plausible, but every order-of-magnitude increase in server count brings new problems:

(1) Servers are expensive. Each one costs hundreds of thousands, so 100 servers mean a direct cost in the tens of millions. The ideal solution is the public cloud, but it will still take some time for domestic OEMs to accept it;

(2) Under large-scale concurrency, the volume of raw sensor data is enormous. Storing it is very costly and transferring it is difficult: synchronizing data across servers introduces latency, which hurts execution efficiency;

(3) Each run is not a continuous traffic-flow scene but a very short segment, perhaps only 30 seconds, yet thousands of runs usually execute in parallel. If 1,000 runs have 1,000 algorithms running on 1,000 scenes, this poses a serious challenge to the architecture of the simulation platform. (CEO of a simulation company)

Regarding the last item, however, An Hongwei said: this is a basic requirement for cloud-based simulation and not a challenge for us. The cloud simulation platform in Xiangcheng District, Suzhou, solved this problem back in 2019. In addition, the scenes run on the cloud simulation platform also include continuous complex/combined scenes several kilometers long. Xiangcheng's Robo-X simulation evaluation system includes such (group) scenes, and on this basis a "takeover" test can be carried out in virtual simulation.
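The "many short scenarios in parallel" workload described in item (3) maps naturally onto a worker pool. The sketch below is a minimal Python illustration, assuming each worker hands one 30-second scenario to a separate simulator instance; `run_scenario` and `run_batch` are hypothetical names, and the thread pool only stands in for the cloud scheduler a real platform would use.

```python
from concurrent.futures import ThreadPoolExecutor

SCENARIO_LENGTH_S = 30  # short segment length mentioned in item (3)


def run_scenario(scenario_id: int) -> tuple[int, int]:
    """Hypothetical stand-in for one simulation instance running one short
    scenario. A real worker would launch or call a separate simulator process
    and wait for its verdict; here it just reports the simulated duration."""
    return scenario_id, SCENARIO_LENGTH_S


def run_batch(scenario_ids: list[int], workers: int) -> dict[int, int]:
    """Fan scenarios out to a fixed pool of workers and collect results by id."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return dict(pool.map(run_scenario, scenario_ids))


results = run_batch(list(range(1000)), workers=8)
total_virtual_s = sum(results.values())
print(len(results), total_virtual_s)  # 1000 30000
```

A thread pool fits here because each worker would mostly wait on an external simulator process; at real scale, the same fan-out/collect pattern is typically handled by a cluster job scheduler rather than a single process.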


Source: blog.csdn.net/jiuzhang_0402/article/details/128556916